When building sophisticated Internal Developer Platforms (IDPs), embedding an interactive AI chat
interface is no longer a luxury—it's an expectation. In my recent work building
vcollab, the true architectural challenge was not simply querying an LLM API. The core
problem was maintaining an absolute separation of concerns.
How do we expose highly sensitive internal microservice tools—such as fetching deployment matrices, managing developer workspaces, or querying resource limits—without hardcoding rigid schemas for OpenAI or Gemini? Furthermore, how do we implement this without turning our chat facade into an insecure monolith?
The solution is a strict Decoupled Architecture utilizing the Model Context Protocol (MCP).
## The Architecture Deep-Dive
Instead of cramming API keys and microservice connection details directly into a chat portal, our architecture was split into two physically distinct Go binaries communicating over well-defined protocols. This established strict zero-trust boundaries.
```mermaid
graph TD
    U["User<br/>(Browser)"] -->|JWT + WS/SSE| CC["AI Chat Client<br/>:8090<br/>(Gin / Go)"]
    CC -->|Discover + Execute| MCP["MCP Server<br/>:8091<br/>(Go SDK)"]
    CC -->|ReAct Chat API| AI["Gemini / Azure"]
    AI -->|Tool Invocation| CC
    MCP -->|REST / gRPC| WS["Workspace MS<br/>:8080"]
    MCP -->|REST / gRPC| TP["Template MS<br/>:8082"]
```
### 1. Component A: The AI Chat Client (Orchestrator)
The `ai_chat_client` serves as the central brain binding the user, the AI provider, and
the underlying platform context. Built on the lightweight Gin framework in Go, it features a unique
execution flow.
- Provider Agnosticism via Dynamic Adapters: By utilizing a pure Provider Interface paradigm, traffic routes seamlessly via `google.golang.org/genai` for Gemini 3 Flash, or via `github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai` for GPT-4. Switching is a single environment variable swap (`VCOL_AI_PROVIDER`), preventing structural lock-in.
- The ReAct Agent Loop: The core logic relies on a loop (Think → Act → Observe → Respond). When a prompt arrives from the React frontend, the orchestrator retrieves the latest tools from the MCP server, queries the AI, catches any tool invocations, dynamically triggers the MCP tool execution, feeds the observed response back into the LLM, and streams the final synthesis down to the user via Server-Sent Events (SSE).
- Safeguards & RBAC Defenses: Before a prompt is even touched by the LLM, the system subjects the input to aggressive prompt injection and context sanitization heuristics.
### 2. Component B: The MCP Server (Platform Gateway)
The `mcp_server` runs entirely insulated. It establishes a Server-Sent Events (SSE)
transport using the official `github.com/modelcontextprotocol/go-sdk`. It exposes
functional tools (e.g., `create_workspace`, `list_tshirt_sizes`,
`list_builds`), but it knows absolutely nothing about LLMs or their proprietary
schemas.
- Dynamic Tool Broadcasting: Rather than updating our Chat orchestrator each time our microservices change, the MCP Server broadcasts available platform tools via JSON-RPC. The AI client ingests these schemas and maps them dynamically for Gemini or Azure.
- Encapsulating Business Rules: A critical design decision was avoiding direct database bindings. For example, instead of allowing an AI direct SQL access to query "T-Shirt Sizes" (resource tiers for workspaces), the MCP server triggers a secure REST call to the Workspace Microservice. This guarantees the AI inherits all deployment-specific filtering and data residency safeguards baked into standard APIs.
## The Auth Chain Paradox: Solving Privilege Escalation
If you give an AI Agent the keys to spin up cloud infrastructure or interact with internal databases, identity spoofing becomes the primary vector of attack. How do we ensure "Bot X" doesn't act on behalf of a hyper-privileged orchestrator token?
We solved this securely using User JWT Forwarding. The orchestration is as follows:
- The user authenticates at the edge, communicating with the React UI.
- The React UI transmits the user's encoded JWT via the initial HTTP handshake to the internal `ai_chat_client`.
- When the LLM requests a tool execution (whether hallucinated or genuine), the `ai_chat_client` refuses to use a god-account. Instead, it passes that exact user JWT through the MCP SDK context headers.
- The `mcp_server` receives the execution request, parses the JWT, logs the audit scope, and relays the REST payload onward (to the Template MS or Workspace MS).
By enforcing this, if an intern requests to build a Production EKS cluster, the workspace microservice rejects the token—just as it would if they clicked the button manually. The AI model operates with zero privilege escalation.
## Quantifiable Improvements & Velocity
Transitioning from a monolithic AI webhook to a strictly isolated MCP system delivered measurable improvements to our platform:
| System Attribute | Traditional AI Monolithic Chatbot | Decoupled MCP Go Architecture |
|---|---|---|
| Vendor Lock-In | High (Hardcoded JSON tool calls, native SDK bindings) | None (MCP dynamically translates JSON-RPC to any schema) |
| Security Auditing | Messy (Service acts via god-account API Keys) | Immutable (True End-to-End User JWT Forwarding) |
| Microservices Disruption | Requires API refactoring for "LLM-capable" responses | Zero changes required (MCP proxies to existing standard REST endpoints) |
| UI Responsiveness | Stalls while waiting for the full generation | Real-time chunked streaming via WebSockets/SSE to the React UI |
## The Future of Enterprise LLMs
Building generative capabilities correctly isn't about using the newest, flashiest LLM tomorrow morning; it's about building the connective tissue so that your enterprise platform doesn't collapse under the weight of AI vendor drift.
By defining clear boundary layers, injecting standardized JSON-RPC routing through the Model Context Protocol, and rigorously forwarding standard identity tokens, we've essentially constructed a plug-and-play architecture. We could swap the AI backbone from Gemini to Anthropic tomorrow, and the underlying cloud ecosystem wouldn't skip a beat.
Ready to lock down your new Multi-Agent architecture? Check out the next part of this series: Well-Architected Multi-Agent AI Systems in Google Cloud.
Interested in integrating MCP or avoiding AI vendor lock-in in your platform engineering workflows? Get in touch or connect with me on LinkedIn.