MCP Is the New API Layer Nobody Is Designing Correctly
Why MCP servers should be thin, reliable protocol adapters - not application logic hosts
The Model Context Protocol is generating a lot of energy right now.
Every team with an API is building an MCP server. Every AI assistant integration now has "MCP support" on the roadmap. The ecosystem is moving fast.
And most of the implementations being built are going to cause problems.
Not because MCP is hard. Because teams are making the same mistake with MCP that teams made with GraphQL, with gRPC, with every new protocol layer that appeared before it: they're putting too much logic in the protocol adapter and not enough in the underlying API.
An MCP server should not know how memory storage works. It should not have its own error handling logic. It should not make authorization decisions. It should be a thin translation layer between the MCP protocol and your existing API — nothing more.
The teams that get this right will have MCP servers that are easy to maintain, easy to upgrade, and easy to secure. The teams that get it wrong will have MCP servers that are coupled to application logic, difficult to version, and full of security surface they didn't intend to create.
This post is about getting it right.
I – What an MCP Server Is
MCP (Model Context Protocol) is a protocol that lets LLM clients (Claude, Cursor, Continue, etc.) call tools and access resources defined by servers you build and run.
The architecture is simple:
LLM Client (Claude Desktop, Cursor)
        |
        |  MCP protocol (stdio or HTTP)
        v
MCP Server (your code)
        |
        |  HTTP calls
        v
Your API (your existing, tested, production API)
The MCP server is the middle layer. It translates between the MCP protocol (tool calls with JSON arguments) and your API (HTTP endpoints with authentication).
That's the entire job. Translation. Nothing else.
The MCP server should have no business logic. It should not implement memory storage. It should not make authorization decisions. It should have no database connections. Every piece of real logic lives in your API, where it is tested, versioned, and secured.
If your MCP server has more than a few hundred lines of code, something is wrong.
II – Tool Contract Design
An MCP server exposes tools. Tools are functions that the LLM can call. Design them carefully because they're part of your API contract.
The canonical tool set for a memory system:
{
  "tools": [
    {
      "name": "remember",
      "description": "Store a piece of information for later recall.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "content": {"type": "string", "description": "The information to store"},
          "importance": {"type": "string", "enum": ["low", "medium", "high"]}
        },
        "required": ["content"]
      }
    },
    {
      "name": "recall",
      "description": "Retrieve stored information relevant to a query.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": {"type": "string", "description": "What to search for"},
          "limit": {"type": "integer", "default": 5}
        },
        "required": ["query"]
      }
    },
    {
      "name": "context",
      "description": "Get a pre-assembled context pack for a given topic.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": {"type": "string"}
        },
        "required": ["query"]
      }
    }
  ]
}
Tool names are verbs. Descriptions are precise — they tell the model exactly when to use this tool and what it does. Input schemas are strict — use required arrays, enum values where possible, and descriptive field names.
The descriptions matter more than most engineers think. The LLM uses them to decide which tool to call and how to call it. A vague description produces vague tool usage. "Store information" is not a useful description. "Store a piece of information — a fact, preference, or note — for retrieval in future conversations" is.
III – Transport: stdio vs HTTP
There are two transport choices for MCP servers.
stdio (standard input/output). The MCP client launches your server as a subprocess. Communication happens over stdin/stdout. This is the right choice for local MCP servers running on the developer's machine.
Benefits: simple to implement, no networking, no auth management at the transport layer, natural process isolation.
Limitations: one client per server instance, no persistent connections, process startup latency on every session.
HTTP (Streamable HTTP or SSE). The MCP server runs as an HTTP service. Multiple clients can connect simultaneously. This is the right choice for hosted or shared MCP servers.
Benefits: shareable, no client-side process management, stateful connections possible.
Limitations: requires network, requires authentication at the transport layer, more infrastructure to operate.
For a SaaS product offering MCP integration: run both. A hosted HTTP server for users who want zero-configuration setup. A local stdio option for developers who want to run the MCP server on their own machine with their own API key.
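The dual-transport setup can be sketched as a single binary that picks its transport at launch. The `CTXVAULT_MCP_TRANSPORT` variable and the `selectTransport` helper are illustrative conventions, not part of any MCP SDK; the actual stdio loop and HTTP handler wiring are omitted.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// selectTransport picks the transport from an environment value.
// Hypothetical convention: "http" selects hosted mode; anything else
// defaults to stdio for local use, so the common case needs no config.
func selectTransport(mode string) string {
	if mode == "http" {
		return "http"
	}
	return "stdio"
}

func main() {
	switch selectTransport(os.Getenv("CTXVAULT_MCP_TRANSPORT")) {
	case "http":
		// Hosted mode: many clients, auth enforced at the transport layer.
		fmt.Println("serving MCP over HTTP on :8080")
		http.ListenAndServe(":8080", nil) // handler wiring omitted
	default:
		// Local mode: the client launches us and talks over stdin/stdout.
		fmt.Println("serving MCP over stdio")
		// stdio JSON-RPC loop omitted
	}
}
```

One binary, two modes, keeps the tool handlers identical across both deployments.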
IV – Authentication and Project Context
The MCP server needs credentials to call your API. Those credentials come from the environment — never from the MCP client itself.
For stdio transport:
// Claude Desktop config
{
  "mcpServers": {
    "ctxvault": {
      "command": "ctxvault-mcp",
      "env": {
        "CTXVAULT_API_KEY": "cvk_live_...",
        "CTXVAULT_PROJECT": "proj_abc123"
      }
    }
  }
}
The API key is in the environment. The MCP server reads it on startup. It is never passed as a tool argument. It is never logged. It is never included in the MCP protocol messages.
The failure mode: an MCP server that accepts the API key as a tool parameter. Now the key appears in the LLM's context (it was passed as an argument). The key appears in conversation logs. The key may be included in prompts sent to the model provider. This is credential leakage in the worst possible direction.
Credentials are environment variables. Not arguments. Not config files included in the MCP response. Environment variables.
V – Startup Timeout Tuning
The most common MCP deployment failure: startup timeout.
MCP clients wait a fixed amount of time for the server to be ready. Claude Desktop's default is around 10 seconds. If your MCP server takes longer than that to start — due to a cold database connection, a network call during initialization, or slow compilation — the client times out and reports the server as failed.
The fix is simple but often missed: do nothing during startup that can fail or be slow.
func main() {
	// Do not connect to databases here
	// Do not make HTTP calls here
	// Do not validate API keys here
	// Just start the MCP server
	server := mcp.NewServer(
		mcp.WithTool("remember", handleRemember),
		mcp.WithTool("recall", handleRecall),
	)
	server.Run() // blocks, reads from stdin
}
API key validation happens on the first tool call, not on startup. If the key is invalid, the first call returns an authentication error. The client handles it. The server didn't fail to start.
Database connections, if needed, use lazy initialization with connection pools. The first call opens connections. Subsequent calls reuse them.
Startup target: under 200ms. No network calls. No disk I/O beyond reading environment variables.
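Lazy initialization is one `sync.Once` away. A sketch, with `apiClient` standing in for whatever client type you actually use: the first tool call pays the construction cost, every later call reuses the instance, and startup touches none of it.

```go
package main

import (
	"fmt"
	"sync"
)

// apiClient is a stand-in for your real API client type.
type apiClient struct {
	baseURL string
}

var (
	clientOnce sync.Once
	client     *apiClient
)

// getClient constructs the API client on first use, not at startup, so the
// MCP handshake never waits on it. sync.Once makes this safe even if two
// tool calls race on a cold server.
func getClient() *apiClient {
	clientOnce.Do(func() {
		client = &apiClient{baseURL: "https://api.example.com"}
	})
	return client
}

func main() {
	fmt.Println("same instance:", getClient() == getClient())
}
```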
VI – Error Translation
Your API returns structured errors. The MCP protocol has its own error model. The MCP server translates between them.
func handleRecall(ctx context.Context, params RecallParams) (mcp.CallToolResult, error) {
	results, err := apiClient.Memory.Recall(ctx, params.Query, params.Limit)
	if err != nil {
		var apiErr *APIError
		if errors.As(err, &apiErr) {
			// Translate API errors to MCP tool results with error content
			return mcp.CallToolResult{
				IsError: true,
				Content: []mcp.Content{{
					Type: "text",
					Text: fmt.Sprintf("Failed to recall: %s (code: %s)", apiErr.Message, apiErr.Code),
				}},
			}, nil // Return nil error to MCP - the error is in the result content
		}
		// Unexpected error - propagate to MCP as a protocol error
		return mcp.CallToolResult{}, fmt.Errorf("unexpected error: %w", err)
	}
	return formatRecallResults(results), nil
}
There are two error paths in MCP:
- Tool result with isError: true: the tool ran but the operation failed. The LLM sees the error message and can respond to it ("I wasn't able to recall that information").
- Protocol-level error: the tool itself failed to run. The MCP client handles this at the infrastructure level.
API errors (404, 422, 403) should be tool results with isError: true. The LLM should know about them and can explain them to the user.
Unexpected errors (500, network failures) can be protocol errors. The LLM will report that the tool is unavailable.
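The classification rule reduces to a status-code check. A sketch of the `APIError` type the handler unwraps with `errors.As`; the field names are an assumption, and `isToolError` is an illustrative helper, not an MCP SDK function.

```go
package main

import "fmt"

// APIError mirrors the structured error body the upstream API returns.
// Field names here are illustrative.
type APIError struct {
	Code    string
	Message string
	Status  int
}

func (e *APIError) Error() string { return e.Message }

// isToolError decides which MCP error path an API failure takes: expected
// client-side failures (4xx) become tool results with isError: true so the
// LLM can explain them; everything else is a protocol-level error.
func isToolError(e *APIError) bool {
	return e.Status >= 400 && e.Status < 500
}

func main() {
	err := &APIError{Code: "not_found", Message: "memory not found", Status: 404}
	fmt.Println("tool-level error:", isToolError(err))
}
```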
VII – What Breaks First
Tool startup timeout/handshake failures. The server takes 12 seconds to start because it's doing a database health check. The client times out. The developer sees "MCP server failed to connect" with no useful diagnostic information. Fix: startup must be fast. Move all initialization that can fail into the first tool call.
Argument schema mismatch with backend contract. The MCP tool schema says importance accepts ["low", "medium", "high"]. Your API's actual enum is ["low", "normal", "high", "critical"]. The tool schema and API contract are out of sync. The LLM sends importance: "normal", the MCP server passes it through, the API accepts it, but the schema advertised to the LLM was wrong. Fix: generate MCP tool schemas from your API's OpenAPI spec. They should be the same source of truth.
Hidden credential exposure in process configs. A developer pastes their MCP server config into a GitHub issue for support help. The config includes the API key in plaintext. It's now in GitHub's issue database. Fix: document clearly that the API key should be set as a system-level environment variable, not in configuration files that might be shared. Provide a ctxvault configure CLI command that sets the env var without exposing the key in shell history.
MCP Tool Design Checklist
- Tool names are verbs that describe the action
- Descriptions specify when to use the tool, not just what it does
- Input schemas use required arrays explicitly
- Credentials come from environment, never from tool arguments
- Startup completes in < 200ms with no network calls
- API errors are translated to tool results with isError: true
- Tool schemas are generated from or validated against the API spec
- Server version is included in the initialization response
Timeout Recommendations
| Scenario | Recommended Timeout |
|---|---|
| Server startup | 200ms (target), 2s (hard limit) |
| Tool call (read operations) | 5s |
| Tool call (write operations) | 10s |
| API client connection timeout | 3s |
| API client read timeout | 8s |
Thin adapter. Stable API underneath. That's the architecture that ages well.