Most examples make MCP feel like a separate service.
You start with FastMCP, run it on its own port, then wire another deployment around it. For a demo, that is fine. For a real API, I do not want another process just because I added tools for an AI client.
The word I kept coming back to was surface area.
Every separate MCP server adds more surface area: another port, another health check, another auth path, another thing to remember when production is already noisy. If the API already exists, I would rather put MCP inside the API than build a second small backend next to it.
The extra service is the trap
This is the common starting point:
```python
from fastmcp import FastMCP

mcp = FastMCP("My Tools")

@mcp.tool
def get_status() -> str:
    return "All systems operational"

mcp.run(transport="sse")
```
Nothing is wrong with this by itself. The problem shows up when you already have a FastAPI app.
Now MCP wants a port. Your API already has a port. MCP wants auth. Your API already has auth. MCP wants logs, CORS, monitoring, lifecycle, deployment config. Your API already has all of that too.
That duplication is the part I try to avoid.
Mount MCP into the app you already run
FastAPI already gives you ASGI mount. That means the MCP app can live under the same process as the rest of your API:
```python
from fastapi import FastAPI
from fastmcp import FastMCP

mcp = FastMCP("API Tools")

@mcp.tool
def get_status() -> str:
    return "All systems operational"

# Create the MCP HTTP app
mcp_app = mcp.http_app(path="/")

# FastAPI shares MCP's lifecycle — no wrapper needed
app = FastAPI(lifespan=mcp_app.lifespan)

# Your API routes still work
@app.get("/health")
def health():
    return {"status": "ok"}

# Mount MCP alongside them
app.mount("/mcp", mcp_app)
```
Then run the FastAPI app like usual:
```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```
This is the version I prefer: one process, one port, one deployment.
Your API still answers `/health`. MCP answers under `/mcp`. The mental model is simple: MCP is not a sibling service. It is a mounted sub-application.
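The mounting mechanics themselves are small. Here is a rough, dependency-free sketch of what an ASGI mount does; the names are mine, and Starlette's real `Mount` handles many more cases:

```python
# Rough, dependency-free sketch of what app.mount("/mcp", mcp_app) does:
# strip the matching prefix, record it as root_path, and delegate.
# Mounting is routing inside one process, not a second service.
# (Illustrative only; Starlette's real Mount handles many more cases.)
import asyncio

def mount(prefix, sub_app, fallback):
    async def router(scope, receive, send):
        path = scope.get("path", "")
        if path.startswith(prefix):
            child = dict(scope)
            child["root_path"] = scope.get("root_path", "") + prefix
            child["path"] = path[len(prefix):] or "/"
            await sub_app(child, receive, send)
        else:
            await fallback(scope, receive, send)
    return router

def make_app(label, seen):
    # Records which app handled the request and the path it saw.
    async def app(scope, receive, send):
        seen.append((label, scope["path"]))
    return app

seen = []
app = mount("/mcp", make_app("mcp", seen), make_app("api", seen))

asyncio.run(app({"type": "http", "path": "/mcp/tools"}, None, None))
asyncio.run(app({"type": "http", "path": "/health"}, None, None))
print(seen)  # [('mcp', '/tools'), ('api', '/health')]
```

The sub-app sees a relative path while the outer app keeps owning the process, which is exactly the relationship between FastAPI and the mounted MCP app.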
Streaming does not change the shape
The first thing I wondered was whether mounting would make streaming awkward. It does not.
http_app() defaults to request-response. If the tool takes a while, the AI client waits for the full response. For anything long-running, switch the transport to streamable HTTP:
```python
# Request-response (default):
mcp_app = mcp.http_app(path="/")

# Streamable HTTP — chunks flow as they arrive:
mcp_app = mcp.http_app(path="/", transport="streamable-http")
```
The mount stays the same. The uvicorn command stays the same. The only difference is how the MCP HTTP app talks to the client.
FastMCP’s `http_app()` supports `"http"`, `"streamable-http"`, and `"sse"` through the `transport` parameter. I choose that based on the tool behavior, not based on how I deploy the server.
Keep auth boring
This is the main reason I like mounting.
Auth is already one of the most annoying places to accidentally create a second system. If FastAPI middleware validates your requests, the mounted MCP routes go through the same outer app. That means CORS, logging, rate limits, and request checks can stay where they already are.
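To see why the outer app's checks cover the mount, here is a dependency-free ASGI sketch; `require_header` and `inner_app` are illustrative names, not FastMCP APIs:

```python
# Dependency-free ASGI sketch of why outer middleware covers a mounted
# sub-app: the mount is routing inside one process, so every request,
# including anything under /mcp, passes through the outer layer first.
# require_header and inner_app are illustrative, not FastMCP APIs.
import asyncio

async def inner_app(scope, receive, send):
    # Stands in for the mounted MCP app.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"tools"})

def require_header(app, name):
    # Outer "middleware": reject any request missing the header,
    # no matter which mounted path it targets.
    async def wrapped(scope, receive, send):
        headers = dict(scope.get("headers", []))
        if name not in headers:
            await send({"type": "http.response.start", "status": 401, "headers": []})
            await send({"type": "http.response.body", "body": b"missing key"})
            return
        await app(scope, receive, send)
    return wrapped

guarded = require_header(inner_app, b"x-api-key")

async def call(app, headers):
    # Minimal in-process driver: capture what the app sends back.
    sent = []
    async def send(msg):
        sent.append(msg)
    await app({"type": "http", "path": "/mcp/", "headers": headers}, None, send)
    return sent[0]["status"]

print(asyncio.run(call(guarded, [])))                      # 401
print(asyncio.run(call(guarded, [(b"x-api-key", b"k")])))  # 200
```

The mounted app never runs unless the outer layer lets the request through, which is why a FastAPI middleware stack applies to `/mcp` for free.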
For MCP-specific credentials, I usually think about it in two buckets.
Use JWTs when the client has a signed token
FastMCP provides JWTVerifier when the AI client sends a bearer token:
```python
import os

from fastmcp import FastMCP
from fastmcp.server.auth.providers.jwt import JWTVerifier

auth = JWTVerifier(
    jwks_uri=os.environ["JWKS_URI"],
    issuer=os.environ["JWT_ISSUER"],
    audience=os.environ["JWT_AUDIENCE"],
)

mcp = FastMCP("API Tools", auth=auth)
mcp_app = mcp.http_app(path="/", transport="streamable-http")
app.mount("/mcp", mcp_app)
```
The client sends `Authorization: Bearer <jwt>`. FastMCP validates the token before any tool runs.
This is a good fit when your MCP client is already part of an identity flow with issuer, audience, and JWKS.
Use headers when your app already owns the key
If users already have API keys in your database, I would rather reuse that than invent an MCP-only token model.
Inside a tool, read the request headers with `get_http_headers()`:

```python
from fastmcp.server.dependencies import get_http_headers

def list_projects() -> list[dict]:
    """List projects for the authenticated API key owner."""
    headers = get_http_headers(include_all=True)
    # authenticate_api_key, get_db, serialize, and project_services
    # are the application's existing helpers, not FastMCP APIs.
    user = authenticate_api_key(headers.get("x-api-key"))
    with get_db() as db:
        return [serialize(p) for p in project_services.list_all(db, user)]

mcp.tool(list_projects)
```
That keeps the tool close to the rest of the application model: same users, same database, same API keys.
`get_http_headers()` works whether the request came through ChatGPT, Claude, or your own test client. If no HTTP request is active, it safely returns an empty dict, which makes local testing less awkward.
Separate tool sets without separate services
Once the mount pattern is in place, splitting tools by domain becomes cheap.
For example, I can mount different MCP servers under different paths:
```python
crm_mcp = FastMCP("CRM Tools")
slack_mcp = FastMCP("Slack Tools")

app.mount("/mcp/crm", crm_mcp.http_app(path="/", transport="streamable-http"))
app.mount("/mcp/slack", slack_mcp.http_app(path="/", transport="streamable-http"))
```
The AI client connects to the server it needs. The FastAPI app still owns the process.
Without mounting, this pattern turns into multiple small services very quickly. With mounting, it stays as routing.
The file split I usually want
I do not like keeping all of this in main.py once the tool list grows. The app shell and the tool registration are different concerns.
This is the shape I usually reach for:
```python
# api/main.py — the app shell: middleware, mounts, health checks
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from api.mcp_server import create_mcp_http_app
from api.routers import router as api_router

mcp_app = create_mcp_http_app()

app = FastAPI(lifespan=mcp_app.lifespan)
app.add_middleware(CORSMiddleware, allow_origins=["*"])
app.include_router(api_router, prefix="/api/v1")
app.mount("/mcp", mcp_app)
```

```python
# api/mcp_server.py — tools only, no deployment concerns
from fastmcp import FastMCP

def create_mcp_server() -> FastMCP:
    mcp = FastMCP("My API Tools")
    mcp.tool(list_projects)
    mcp.tool(create_project)
    mcp.tool(search_documents)
    mcp.tool(upload_assets)
    # ... add tools as your API grows
    return mcp

def create_mcp_http_app():
    return create_mcp_server().http_app(
        path="/",
        transport="streamable-http",
    )
```
The important part is that main.py only decides where the MCP app lives. The MCP module decides what tools exist.
There are a few details in this split that matter:
- `lifespan=mcp_app.lifespan` — one lifecycle. Database pools and background tasks start and stop together. No dangling connections.
- Factory function — `create_mcp_server()` builds the server. `main.py` only mounts it. Tools are testable in isolation.
- `http_app(path="/")` — the MCP app’s internal path is relative. Mount at `/mcp` and the streamable endpoint resolves to `/mcp/`. Mount at `/tools` and it becomes `/tools/`.
- Middleware before mount — CORS, logging, rate limiting applied once at the FastAPI level. MCP inherits everything.
Test the mounted endpoint
After the server starts, I check the mounted endpoint directly:
```bash
curl -X POST http://localhost:8000/mcp/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
```
If the app expects an API key, include it:
```bash
curl -X POST http://localhost:8000/mcp/ \
  -H "x-api-key: your-key" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
```
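The same probe can be scripted with only the Python standard library. `build_probe` is a helper name introduced here, not a FastMCP API, and the URL assumes the local uvicorn server from earlier:

```python
# Stdlib-only version of the curl probe. build_probe is an illustrative
# helper, not a FastMCP API; the URL assumes the local uvicorn server.
import json
import urllib.request

def build_probe(url, api_key=None):
    """Build the JSON-RPC tools/list request that the curl command sends."""
    payload = {"jsonrpc": "2.0", "method": "tools/list", "id": 1}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")

req = build_probe("http://localhost:8000/mcp/", api_key="your-key")
# With the server running, send it:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

A response listing your tools means the mount, the transport, and the auth path are all wired through the one process.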
For Claude Desktop, the config points at the deployed route:
```json
{
  "mcpServers": {
    "my-tools": {
      "type": "http",
      "url": "https://your-domain.com/mcp/",
      "headers": { "x-api-key": "your-key" }
    }
  }
}
```
That is the whole point of the pattern.
The AI tools become another interface to the same application. They do not need a parallel deployment story, a parallel auth story, or a parallel monitoring story.
Mounting keeps the surface area small.
MCP infrastructure that doesn’t multiply your ops burden. I build FastAPI + FastMCP integrations that mount into your existing stack — no new services, no new headaches. See how.