April 17, 2026 · 10 min read

Decoupled AI Architecture in Developer Platforms

AI Architecture MCP Golang Gemini Azure OpenAI

When building sophisticated Internal Developer Platforms (IDPs), embedding an interactive AI chat interface is no longer a luxury—it's an expectation. In my recent work building vcollab, the real architectural challenge was not simply querying an LLM API. The core problem was maintaining an absolute separation of concerns.

How do we expose highly sensitive internal microservice tools—like fetching deployment matrices, managing developer workspaces, querying resource limits—without hardcoding rigid schemas for OpenAI or Gemini? Furthermore, how do we implement this without turning our chat facade into an insecure monolith?

The solution is a strict Decoupled Architecture utilizing the Model Context Protocol (MCP).

The Architecture Deep-Dive

Instead of cramming API keys and microservice connection details directly into a chat portal, the architecture was split into two physically distinct Golang binaries that communicate dynamically at runtime. This established strict zero-trust boundaries.

```mermaid
graph LR
  UI["React Chat UI<br/>(Browser)"] -->|JWT + WS/SSE| CC["AI Chat Client<br/>:8090<br/>(Gin / Go)"]
  CC -->|Discover + Execute| MCP["MCP Server<br/>:8091<br/>(Go SDK)"]
  CC -->|ReAct Chat API| AI["Gemini / Azure"]
  AI -->|Tool Invocation| CC
  MCP -->|REST / gRPC| WS["Workspace MS<br/>:8080"]
  MCP -->|REST / gRPC| TP["Template MS<br/>:8082"]
```

1. Component A: The AI Chat Client (Orchestrator)

The ai_chat_client serves as the central brain binding the user, the AI provider, and the underlying platform context. Built on the lightweight Gin framework in Go, it drives a carefully ordered execution flow.

  • Provider Agnosticism via Dynamic Adapters: By utilizing a pure Provider Interface paradigm, traffic routes seamlessly via google.golang.org/genai for Gemini 3 Flash, or via github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai for GPT-4. Switching providers is a single environment-variable swap (VCOL_AI_PROVIDER), preventing structural lock-in.
  • The ReAct Agent Loop: The core logic relies on a loop (Think → Act → Observe → Respond). When a prompt arrives from the React frontend, the orchestrator retrieves the latest tools from the MCP server, queries the AI, catches any tool invocations, dynamically triggers the MCP tool execution, feeds the observation back into the LLM, and streams the final synthesis down to the user via Server-Sent Events (SSE).
  • Safeguards & RBAC Defenses: Before a prompt is even touched by the LLM, the system runs the input through aggressive prompt-injection detection and context-sanitization heuristics.

2. Component B: The MCP Server (Platform Gateway)

The mcp_server runs entirely insulated. It establishes a Server-Sent Events (SSE) transport using the official github.com/modelcontextprotocol/go-sdk. It exposes functional tools (e.g., create_workspace, list_tshirt_sizes, list_builds), but it knows absolutely nothing about LLMs or their proprietary schemas.

  • Dynamic Tool Broadcasting: Rather than updating the chat orchestrator each time a microservice changes, the MCP server advertises its available platform tools via JSON-RPC. The AI client ingests these schemas and maps them dynamically to the Gemini or Azure tool-calling format.
  • Encapsulating Business Rules: A critical design decision was avoiding direct database bindings. For example, instead of allowing an AI direct SQL access to query "T-Shirt Sizes" (resource tiers for workspaces), the MCP server triggers a secure REST call to the Workspace Microservice. This guarantees the AI inherits all deployment-specific filtering and data residency safeguards baked into standard APIs.

The Auth Chain Paradox: Solving Privilege Escalation

If you give an AI Agent the keys to spin up cloud infrastructure or interact with internal databases, identity spoofing becomes the primary vector of attack. How do we ensure "Bot X" doesn't act on behalf of a hyper-privileged orchestrator token?

We solved this with User JWT Forwarding. The flow is as follows:

  1. The user authenticates at the edge via the React UI.
  2. The React UI forwards the user's JWT in its initial HTTP request to the internal ai_chat_client.
  3. When the LLM requests a tool execution—whether legitimate or hallucinated—the ai_chat_client refuses to use a god-account. Instead, it passes that exact user JWT through the MCP SDK context headers.
  4. The mcp_server receives the execution request, parses the JWT, logs the audit scope, and relays the REST payload onward (to the Template MS or Workspace MS).

By enforcing this, if an intern requests to build a Production EKS cluster, the workspace microservice rejects the token—just as it would if they clicked the button manually. The AI model operates with zero privilege escalation.

Quantifiable Improvements & Velocity

Transitioning from a monolithic AI webhook to a strictly isolated MCP system delivered measurable improvements across the platform:

| System Attribute | Traditional Monolithic AI Chatbot | Decoupled MCP Go Architecture |
| --- | --- | --- |
| Vendor lock-in | High (hardcoded JSON tool calls, native SDK bindings) | None (MCP dynamically translates JSON-RPC to any schema) |
| Security auditing | Messy (service acts via god-account API keys) | Immutable audit trail (true end-to-end user JWT forwarding) |
| Microservice disruption | Requires API refactoring for "LLM-capable" responses | Zero changes required (MCP proxies existing standard REST endpoints) |
| UI responsiveness | Stalls while waiting for the full generation | Real-time chunked streaming via WebSockets/SSE in React |

The Future of Enterprise LLMs

Building generative capabilities correctly isn't about chasing the newest, flashiest model every morning—it's about building the connective tissue so that your enterprise platform doesn't collapse under the weight of AI vendor drift.

By defining clear boundary layers, injecting standardized JSON-RPC routing through the Model Context Protocol, and rigorously forwarding standard identity tokens, we've essentially constructed a plug-and-play architecture. We could swap the AI backbone from Gemini to Anthropic tomorrow, and the underlying cloud ecosystem wouldn't skip a beat.

Ready to lock down your new Multi-Agent architecture? Check out the next part of this series: Well-Architected Multi-Agent AI Systems in Google Cloud.


Interested in integrating MCP or avoiding AI vendor lock-in in your platform engineering workflows? Get in touch or connect with me on LinkedIn.