High-Level Design
MCP Privileged Access Service
Version: 1.1 Date: 2026-03-28 Status: Production-ready
Table of Contents
- What is MCP? A Primer
- Purpose & Scope
- System Context
- Architecture Principles
- Component Overview
- Authentication & Authorization Model
- The Secret Handle Pattern
- Data Flow — Key Use Cases
- Deployment Architecture
- Technology Choices
- Security Architecture Summary
- Future Roadmap
0. What is MCP? A Primer
Learning note: This section explains the MCP concept from first principles before we dive into this specific service. Skip to Section 1 if you are already familiar with the protocol.
The core problem MCP solves
A large language model (LLM) like Claude is, at its heart, a text-in / text-out system. On its own it cannot do anything in the world — it can only describe what it would do. The challenge is: how do you give an AI assistant the ability to take real actions (read a file, query a database, run a command) in a controlled, auditable, and standardised way?
Before MCP, every team solved this differently. Some embedded shell calls directly in prompts; others built bespoke REST wrappers. There was no common contract between the AI and the tools it called.
MCP (Model Context Protocol) is Anthropic's open standard that defines exactly that contract.
The three primitives
MCP defines three building blocks that a server can expose to a model:
| Primitive | What it is | Analogy |
|---|---|---|
| Tool | A callable function the model can invoke | An API endpoint / RPC call |
| Resource | A piece of data the model can read | A file or database row |
| Prompt | A reusable prompt template | A macro or named query |
Learning note: This service uses only Tools. Tools are the most important primitive for agentic use cases — cases where the model takes actions, not just answers questions. Resources and Prompts are useful but less common in automation pipelines.
How a tool call works end-to-end
┌──────────────────────────────────────────────────────────────────────┐
│ USER "Check disk usage on web01" │
└────────────────────────────┬─────────────────────────────────────────┘
│ user message
▼
┌──────────────────────────────────────────────────────────────────────┐
│ CLAUDE (the model) │
│ Reads the list of available tools (JSON Schema descriptions). │
│ Decides: "I need ssh_execute to answer this." │
│ Emits a tool_use block in its response: │
│ { "name": "ssh_execute", │
│ "input": { "host": "web01", "command": "df -h", ... } } │
└────────────────────────────┬─────────────────────────────────────────┘
│ tool_use request (JSON-RPC over HTTP/SSE)
▼
┌──────────────────────────────────────────────────────────────────────┐
│ MCP SERVER (this service) │
│ Receives the JSON-RPC call. │
│ Executes the Python function ssh_execute(...). │
│ Returns a tool_result: { "content": "Filesystem Size Used…" } │
└────────────────────────────┬─────────────────────────────────────────┘
│ tool_result (text)
▼
┌──────────────────────────────────────────────────────────────────────┐
│ CLAUDE (the model) │
│ Incorporates the tool result into its context. │
│ Generates a final human-readable answer. │
└────────────────────────────┬─────────────────────────────────────────┘
│ assistant message
▼
┌──────────────────────────────────────────────────────────────────────┐
│ USER "web01: 18G used of 50G (36%)" │
└──────────────────────────────────────────────────────────────────────┘
Learning note: The model never executes code itself. It only emits a structured request saying "please call this tool with these arguments." The MCP server is the only thing that touches real infrastructure. This separation is fundamental to safety — you can audit, rate-limit, and authorise every action at the server layer without modifying the model.
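To make the request/response shape concrete, here is a minimal sketch of the server side of a tool call using only the standard library. The `handle_request` dispatcher, the `TOOLS` registry, and the canned `ssh_execute` stand-in are hypothetical names for illustration; a real MCP server (such as FastMCP) handles this framing for you, and its error handling is richer than shown.

```python
import json

def ssh_execute(host: str, command: str) -> str:
    # Stand-in for a real SSH call; returns canned text for illustration only.
    return f"[{host}] ran: {command}"

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS = {"ssh_execute": ssh_execute}

def _error(req_id, code: int, message: str) -> str:
    body = {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}
    return json.dumps(body)

def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 `tools/call` request and return the response."""
    req = json.loads(raw)
    req_id = req.get("id")
    if req.get("method") != "tools/call":
        return _error(req_id, -32601, "Method not found")
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return _error(req_id, -32602, "Unknown tool")
    text = tool(**params.get("arguments", {}))
    # Tool results are returned as a list of content items.
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})

request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "ssh_execute",
               "arguments": {"host": "web01", "command": "df -h"}},
})
response = json.loads(handle_request(request))
```

The model only ever produces the `params` portion; everything that touches infrastructure happens inside the function the registry points to.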
Transport: SSE over HTTP
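MCP defines more than one transport (stdio for local processes and HTTP-based transports for remote servers). This service uses HTTP with Server-Sent Events (SSE): the client (Claude Code) opens a persistent HTTP connection to the server, and the server streams JSON-RPC messages back as SSE events.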
Learning note: Why SSE and not WebSockets? SSE is unidirectional (server → client) and works over plain HTTP/1.1 with no protocol upgrade. This makes it firewall-friendly and easy to put behind standard reverse proxies like nginx. The request direction (client → server) still uses normal HTTP POST.
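As an illustration of what travels over that stream, the sketch below frames a JSON-RPC message as an SSE event and parses it back. The helper names `format_sse` and `parse_sse` are hypothetical, and the `message` event name is an assumption; the `event:` / `data:` line format and the blank-line terminator come from the SSE wire format itself.

```python
import json

def format_sse(payload: dict, event: str = "message") -> str:
    """Encode one JSON message as a single Server-Sent Events frame."""
    data = json.dumps(payload, separators=(",", ":"))
    # A frame is a set of "field: value" lines terminated by a blank line.
    return f"event: {event}\ndata: {data}\n\n"

def parse_sse(frame: str) -> tuple[str, dict]:
    """Decode a single SSE frame back into (event name, JSON payload)."""
    event = "message"
    data_lines = []
    for line in frame.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    return event, json.loads("\n".join(data_lines))

frame = format_sse({"jsonrpc": "2.0", "id": 1, "result": {"ok": True}})
event_name, payload = parse_sse(frame)
```

Because each frame is plain text over a long-lived HTTP response, ordinary reverse proxies can forward it as long as response buffering is disabled.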
FastMCP: the Python framework
Raw MCP requires implementing a JSON-RPC server, describing tools in JSON Schema, and handling SSE streams. FastMCP (part of the official MCP Python SDK) removes that boilerplate:
from mcp.server.fastmcp import FastMCP

# One FastMCP instance represents one MCP server.
mcp = FastMCP("my-server")

# The decorator registers the function as a tool; the type hints define its schema.
@mcp.tool(description="Add two numbers")
async def add(a: int, b: int) -> int:
    return a + b
FastMCP introspects the Python type annotations and generates the JSON Schema automatically. The @mcp.tool() decorator registers the function — the function itself is just a normal async def.
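The idea of deriving a schema from annotations can be sketched with the standard library. This is a simplified illustration, not FastMCP's actual implementation: the `tool_schema` helper and the `_JSON_TYPES` mapping are hypothetical, and only a few scalar types are handled.

```python
import inspect

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(func) -> dict:
    """Build a tool description from a function's signature and docstring."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

async def add(a: int, b: int = 0) -> int:
    """Add two numbers"""
    return a + b

schema = tool_schema(add)
```

Parameters without a default end up in `required`, which is how the model learns which arguments it must supply.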
Learning note: This is exactly how all four MCP servers in this service are built. The tool functions (get_credential, ssh_execute, ps_execute, db_query) are plain Python async functions. The MCP protocol wrapping is invisible to the implementation code. You can call them directly in unit tests without any MCP machinery at all, which is why the test suite can be so simple.
Context injection
FastMCP injects a Context object as the ctx parameter of every tool. You do not pass it yourself — the framework supplies it automatically when the tool is called over the MCP protocol.
async def ssh_execute(host: str, command: str, ctx: Context, ...) -> str:
    await ctx.info(f"Connecting to {host}")  # progress notification to the caller
    await ctx.error("Something went wrong")  # error notification
ctx.info() and ctx.error() send notifications back to the client during tool execution, before the final result is returned. A client such as Claude Code can use them to show a status message like "Connecting to web01..." while a long-running command is in progress.
Learning note: In unit tests, ctx is a MagicMock. The tests assert on ctx.info.call_args and ctx.error.call_args to verify the right status messages were emitted, without any real MCP transport being involved.
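A test in this style might look like the sketch below. The `ssh_execute` body here is a hypothetical stand-in, not the service's implementation, and `make_ctx` is an illustrative helper. Because the tool awaits `ctx.info(...)`, the sketch assumes the mocked methods are `AsyncMock` attributes on a `MagicMock`, since a bare `MagicMock` call is not awaitable.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def ssh_execute(host: str, command: str, ctx) -> str:
    """Hypothetical stand-in for the real tool: reports progress, returns text."""
    await ctx.info(f"Connecting to {host}")
    return f"ran {command!r} on {host}"

def make_ctx() -> MagicMock:
    """Build a fake Context whose notification methods can be awaited."""
    ctx = MagicMock()
    ctx.info = AsyncMock()
    ctx.error = AsyncMock()
    return ctx

ctx = make_ctx()
# The tool is called directly; no MCP transport or server is involved.
result = asyncio.run(ssh_execute("web01", "df -h", ctx))

# Assertions a unit test would make about the emitted status messages.
ctx.info.assert_awaited_once_with("Connecting to web01")
ctx.error.assert_not_awaited()
```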
Multi-server composition
A single Python process can host multiple independent MCP servers, each mounted at a different URL path on a shared FastAPI application:
FastAPI app
├── /mcp/cyberark ← FastMCP("cyberark") — get_credential, list_safes
├── /mcp/ssh ← FastMCP("ssh") — ssh_execute
├── /mcp/powershell ← FastMCP("powershell")— ps_execute
└── /mcp/database ← FastMCP("database") — db_query
Claude Code is configured with four separate MCP server entries, each pointing to one of these paths. From Claude's perspective they appear as four separate servers, but they share a single process, a single secret store, and a single audit log stream.
Learning note: Mounting multiple FastMCP instances on one FastAPI app via app.mount(path, mcp.sse_app()) is the standard pattern for building multi-capability MCP services. The alternative, one process per server, would require inter-process communication to share the secret store, which adds complexity with no security benefit.
1. Purpose & Scope
The MCP Privileged Access Service enables Claude (Anthropic's AI assistant) to execute privileged operations on enterprise infrastructure — Linux servers via SSH, Windows servers via PowerShell/WinRM, and databases — using credentials managed by CyberArk Privileged Access Management (PAM).
The fundamental security guarantee:
The AI model (Claude) never sees the actual password at any point in the workflow. Credentials are fetched from CyberArk, held in RAM behind an opaque token, and used directly for the target connection — all within the service boundary.
Scope includes:
- Retrieving credentials from CyberArk Central Credential Provider (CCP)
- Executing shell commands on Linux/Unix hosts via SSH
- Executing PowerShell scripts on Windows hosts via WinRM
- Running SQL queries on PostgreSQL, MySQL, and SQL Server databases
- Structured audit logging of all privileged operations
- API key authentication for Claude Code clients
Scope excludes:
- User interface or dashboard
- Credential rotation or lifecycle management (handled by CyberArk)
- Session recording (handled by CyberArk PSM if required)
- Multi-tenancy (single-tenant service per deployment)
2. System Context
┌──────────────────────────────────────────────────────────────────────┐
│ OPERATOR / SECURITY TEAM │
│ • Provisions CyberArk safes & AppID │
│ • Issues MCP API keys to Claude Code clients │
│ • Reviews structured audit logs │
└──────────────────────┬───────────────────────────────────────────────┘
│ configure
▼
┌──────────────────────────────────────────────────────────────────────┐
│ CLAUDE CODE (client) │
│ Claude Desktop / VS Code / CLI │
│ - Sends MCP tool calls over HTTPS with X-API-Key header │
│ - Receives tool results (output, exit codes, query rows) │
│ - NEVER receives actual passwords │
└──────────────────────┬───────────────────────────────────────────────┘
│ HTTPS + API Key (JSON-RPC / MCP protocol)
▼
┌──────────────────────────────────────────────────────────────────────┐
│ MCP PRIVILEGED ACCESS SERVICE (this system) │
│ ┌─────────────┐ ┌──────┐ ┌──────────┐ ┌────────┐ ┌─────────┐ │
│ │ CyberArk │ │ SSH │ │PowerShell│ │Database│ │ Auth + │ │
│ │ MCP │ │ MCP │ │ MCP │ │ MCP │ │ Audit │ │
│ └──────┬──────┘ └──┬───┘ └────┬─────┘ └───┬────┘ └─────────┘ │
│ │ │ │ │ │
│ └────────────┴────────────┴─────────────┘ │
│ Secret Store (RAM) │
└───┬──────────────┬──────────────┬──────────────┬─────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ CyberArk │ │Linux/Unix│ │ Windows  │ │PostgreSQL│
│   CCP    │ │  Hosts   │ │  Hosts   │ │  MySQL   │
│ (HTTPS)  │ │  (SSH)   │ │ (WinRM)  │ │SQL Server│
└──────────┘ └──────────┘ └──────────┘ └──────────┘
3. Architecture Principles
P1 — Zero password exposure to the LLM
Passwords flow from CyberArk → RAM → target connection. At no stage does a password appear in an MCP tool response, log message, or error message. This is enforced in code, not by policy alone.
P2 — Short-lived, single-use credential handles
A credential fetched from CyberArk is wrapped in a cryptographically random handle (secret:// + 32-char hex). The handle:
- Expires after a configurable TTL (default 5 minutes)
- Is invalidated on first use (default HANDLE_SINGLE_USE=true)
- Lives only in process RAM; never written to disk or network
P3 — Full audit trail
Every credential fetch, handle resolution, SSH execution, PowerShell execution, and database query is recorded in structured JSON logs. Passwords and output data are never included in audit events.
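One way to guarantee that passwords and output never reach the log is an explicit field allowlist. The sketch below uses the standard library's `json` module to stay self-contained (the service itself uses structlog); the field names in `_ALLOWED_FIELDS` and the helpers `build_audit_record` and `audit_line` are assumptions for illustration.

```python
import json
import time

# Hypothetical allowlist: only these keys may ever appear in an audit event.
_ALLOWED_FIELDS = {"host", "username", "handle_prefix", "exit_code", "duration_ms"}

def build_audit_record(event: str, **fields) -> dict:
    """Keep allowlisted fields only; anything else is silently dropped."""
    record = {"event": event, "timestamp": time.time()}
    for key, value in fields.items():
        if key in _ALLOWED_FIELDS:
            record[key] = value
    return record

def audit_line(event: str, **fields) -> str:
    """Serialise one audit event as a single JSON line."""
    return json.dumps(build_audit_record(event, **fields), sort_keys=True)

# A caller that passes sensitive values by mistake still produces a clean event.
line = audit_line(
    "ssh_execute",
    host="web01",
    username="root",
    exit_code=0,
    password="example-password",
    stdout="Filesystem Size Used",
)
```

An allowlist fails safe: a newly added sensitive field is excluded by default, whereas a denylist would leak it until someone remembers to update the list.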
P4 — Defence in depth
Multiple independent security layers:
- Network: service only reachable from permitted IP ranges (firewall / VPC)
- Transport: HTTPS with valid TLS certificate
- Application: API key authentication on every request
- Credential: CyberArk AppID + IP allowlist (or mTLS)
- Handle: short TTL + single use
- Code: SecretStr wrapper prevents accidental password serialisation
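The code-level layer can be illustrated with a small stand-in for Pydantic's SecretStr. `MaskedSecret` below is a hypothetical class written with the standard library only; it mirrors the behaviour that matters here (masked `str`/`repr`, explicit `get_secret_value()` to unwrap) but is not Pydantic's implementation.

```python
import json

class MaskedSecret:
    """Holds a secret and masks it in str(), repr(), and f-strings."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The only way to read the secret; easy to grep for in code review.
        return self._value

    def __str__(self) -> str:
        return "**********"

    def __repr__(self) -> str:
        return "MaskedSecret('**********')"

secret = MaskedSecret("example-password")
log_message = f"connecting with {secret}"

# json.dumps does not know how to encode the wrapper, so an accidental
# attempt to serialise it fails loudly instead of leaking the value.
try:
    json.dumps({"password": secret})
    serialised = True
except TypeError:
    serialised = False
```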
P5 — Stateless compute, stateful secrets in RAM only
No database, no disk state. The secret store is an in-memory dict with an asyncio lock. Service restart invalidates all handles (safe failure mode — operators must re-fetch).
P6 — Explicit over implicit
Every configuration value is explicit (settings.*), every dependency is injected at startup (lifespan), and every module imports only what it needs. No global mutable state except the two intentional singletons (secret_store, cyberark_client).
4. Component Overview
4.1 Foundation Layer
| Component | File | Role |
|---|---|---|
| Configuration | config.py | Single pydantic-settings model; reads from env / .env file |
| Secret Store | secret_store.py | In-RAM handle store with TTL, single-use, and background sweeper |
| Auth Middleware | auth.py | Starlette middleware; validates API key on all /mcp/* routes |
| Audit Logger | audit.py | Structured structlog events; one function per audit event type |
| Service Entry Point | main.py | FastAPI app assembly, lifespan wiring, MCP server mounting |
4.2 MCP Servers
| Server | Mount Path | Tool(s) | Protocol | Auth to target |
|---|---|---|---|---|
| CyberArk | /mcp/cyberark | get_credential, list_safes | HTTPS REST | IP allowlist / mTLS |
| SSH | /mcp/ssh | ssh_execute | SSH (asyncssh) | Password from handle |
| PowerShell | /mcp/powershell | ps_execute | WinRM (pypsrp) | Password from handle |
| Database | /mcp/database | db_query | asyncpg / aiomysql / pyodbc | Password from handle |
Each MCP server is an independent FastMCP instance mounted as a sub-application on the shared FastAPI app. They share only two objects: secret_store (to resolve handles) and settings (configuration).
5. Authentication & Authorization Model
Client → Service (inbound)
Claude Code client
│
│ HTTP request to /mcp/<server>/...
│ Header: X-API-Key: <key>
│ OR
│ Header: Authorization: Bearer <key>
│
▼
ApiKeyMiddleware
│
├── Path starts with /mcp/ ?
│ NO → pass through (health check, etc.)
│ YES → validate key against settings.mcp_api_keys
│ INVALID → 401 + audit log
│ VALID → continue to MCP handler
Multiple API keys are supported (comma-separated MCP_API_KEYS). Keys are rotated by adding the new key and removing the old one. Today this requires a restart; a key-reload mechanism that avoids it is on the roadmap.
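The decision tree above can be sketched as three small functions. This is an illustrative sketch with hypothetical names (`parse_api_keys`, `extract_key`, `is_authorised`), not the service's ApiKeyMiddleware; the constant-time comparison via `hmac.compare_digest` is an assumption about good practice rather than something the design states.

```python
import hmac
from typing import Mapping, Optional

def parse_api_keys(raw: str) -> list[str]:
    """Split a comma-separated MCP_API_KEYS value, ignoring blanks."""
    return [key.strip() for key in raw.split(",") if key.strip()]

def extract_key(headers: Mapping[str, str]) -> Optional[str]:
    """Read the key from X-API-Key, or from Authorization: Bearer <key>."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"].strip()
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None

def is_authorised(path: str, headers: Mapping[str, str], valid_keys: list[str]) -> bool:
    """Protect /mcp/* only; other paths (such as /health) pass through."""
    if not path.startswith("/mcp/"):
        return True
    presented = extract_key(headers)
    if presented is None:
        return False
    # Compare against every key so timing does not reveal which one matched.
    matches = [hmac.compare_digest(presented.encode(), key.encode())
               for key in valid_keys]
    return any(matches)

keys = parse_api_keys("key-one, key-two,")
```

A caller returning `False` here would translate into the 401 response and audit event shown in the diagram.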
Service → CyberArk (outbound)
Mode 1: IP Allowlist (current default)
- The service makes HTTPS GET requests to the CCP REST API
- CyberArk trusts the caller based on source IP
- The AppID (CYBERARK_APP_ID) identifies the application in CyberArk policy
Mode 2: mTLS (future)
- A PFX certificate file is loaded at startup
- The TLS client certificate is attached to every CCP request
- CyberArk validates the certificate in addition to (or instead of) IP
Service → Target Systems (outbound)
| Target | Auth method | Credentials from |
|---|---|---|
| SSH hosts | Password (key-based auth is on the roadmap) | Secret handle → asyncssh.connect(password=...) |
| WinRM hosts | NTLM / Basic | Secret handle → WSMan(password=...) |
| Databases | Native DB auth | Secret handle → driver connect call |
6. The Secret Handle Pattern
This is the central security innovation of the service. It solves the problem: How does an AI model invoke privileged operations without ever knowing the password?
Step 1 — Credential fetch
──────────────────────────
Claude calls: get_credential(safe="PROD-LINUX", object_name="svc_root")
Service:
1. Calls CyberArk CCP REST API
2. Receives { "UserName": "root", "Content": "P@ssword123", ... }
3. Calls secret_store.store("root", "P@ssword123")
→ stores in RAM as _Entry with a random 32-char hex handle_id
→ returns handle = "secret://a3f9c2e1b8d7..."
4. Returns to Claude: "Handle: secret://a3f9c2e1... TTL: 300s"
PASSWORD IS NEVER IN THIS RETURN VALUE
Step 2 — Privileged operation
──────────────────────────────
Claude calls: ssh_execute(host="server01", command="df -h",
secret_handle="secret://a3f9c2e1b8d7...")
Service:
1. Calls secret_store.resolve("secret://a3f9c2e1b8d7...")
→ checks TTL and single-use flag
→ if valid: returns ("root", "P@ssword123") and deletes handle
2. Calls asyncssh.connect("server01", username="root", password="P@ssword123")
3. Runs command, collects output
4. Deletes password variable (del password)
5. Returns: "Exit code: 0\nstdout:\n/dev/sda1 50G 10G 40G 20% /"
PASSWORD IS NEVER IN THIS RETURN VALUE
Step 3 — Handle is gone
────────────────────────
If Claude tries to reuse the same handle:
secret_store.resolve(...) raises KeyError("Handle already consumed")
→ Claude must call get_credential again for the next operation
Handle lifecycle state machine:
store()
CREATED ────────────────► ACTIVE
│
┌────────┴────────┐
│ │
resolve() TTL expired
(single_use=True) (sweeper task)
│ │
▼ ▼
CONSUMED EXPIRED
(deleted) (deleted)
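The lifecycle above can be sketched as a small in-memory store. This is a minimal illustration under stated assumptions, not the service's secret_store.py: the class layout, the `KeyError` messages, and the use of `time.monotonic()` are choices made for the example, and the background sweeper is omitted (expiry is checked lazily on `resolve`).

```python
import asyncio
import secrets
import time
from dataclasses import dataclass

@dataclass
class _Entry:
    username: str
    password: str
    expires_at: float  # monotonic clock deadline

class SecretStore:
    """In-RAM handle store with a TTL and optional single-use semantics."""

    def __init__(self, ttl_seconds: float = 300.0, single_use: bool = True) -> None:
        self._entries: dict[str, _Entry] = {}
        self._lock = asyncio.Lock()
        self._ttl = ttl_seconds
        self._single_use = single_use

    async def store(self, username: str, password: str) -> str:
        # token_hex(16) yields 32 hex characters from a CSPRNG.
        handle = "secret://" + secrets.token_hex(16)
        async with self._lock:
            self._entries[handle] = _Entry(
                username, password, time.monotonic() + self._ttl
            )
        return handle

    async def resolve(self, handle: str) -> tuple[str, str]:
        async with self._lock:
            entry = self._entries.get(handle)
            if entry is None:
                raise KeyError("Handle unknown or already consumed")
            if time.monotonic() >= entry.expires_at:
                del self._entries[handle]  # ACTIVE -> EXPIRED
                raise KeyError("Handle expired")
            if self._single_use:
                del self._entries[handle]  # ACTIVE -> CONSUMED
            return entry.username, entry.password

async def demo() -> tuple[str, tuple[str, str], bool]:
    store = SecretStore(ttl_seconds=300)
    handle = await store.store("root", "example-password")
    first = await store.resolve(handle)
    try:
        await store.resolve(handle)
        reused = True
    except KeyError:
        reused = False
    return handle, first, reused
```

Holding the lock across the check and the delete is what makes single use reliable: two concurrent `resolve` calls for the same handle cannot both succeed.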
7. Data Flow — Key Use Cases
7.1 SSH Command Execution
Claude CyberArk MCP SecretStore SSH MCP Linux Host
│ │ │ │ │
│ get_credential │ │ │ │
│────────────────►│ │ │ │
│ │ GET CCP REST │ │ │
│ │──────────────────────────────────────────────►│(CyberArk)
│ │◄─────────────────────────────────────────────-│
│ │ store(user,pw)│ │ │
│ │──────────────►│ │ │
│ │◄── handle ────│ │ │
│◄── handle ──────│ │ │ │
│ │ │ │ │
│ ssh_execute(handle) │ │ │
│─────────────────────────────────────────────────► │
│ │ │ resolve(handle│ │
│ │ │◄──────────────│ │
│ │ │──(user,pw)───►│ │
│ │ │ │ SSH connect │
│ │ │ │──────────────►│
│ │ │ │◄── output ───-│
│ │ │ │ del password │
│◄─────────────────────────────────────────────────output────────│
7.2 Database Query
Identical flow to SSH, substituting db_query for ssh_execute and the target database driver for asyncssh.
7.3 PowerShell Execution
The WinRM flow differs in one aspect: pypsrp is synchronous, so the call is offloaded to a thread-pool executor while the asyncio event loop continues serving other requests.
asyncio event loop Thread pool executor
────────────────── ────────────────────
resolve handle
await run_in_executor(None, _run_ps_sync, ...) ──────► _run_ps_sync()
[event loop free to handle other requests] WSMan()
RunspacePool()
ps.invoke()
◄────────────────────────────────────────────── return output
del password
return result
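The offloading pattern in the diagram can be sketched with the standard library alone. Here `_run_ps_sync` is a stand-in that sleeps instead of calling pypsrp, and the `ticker` task exists only to show that the event loop keeps running while the blocking call is in progress; both are illustrative assumptions.

```python
import asyncio
import time

def _run_ps_sync(script: str) -> str:
    """Stand-in for the blocking pypsrp call; sleeps to simulate network I/O."""
    time.sleep(0.1)
    return f"output of: {script}"

async def ps_execute(script: str) -> str:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, _run_ps_sync, script)

async def demo() -> tuple[str, int]:
    ticks = 0

    async def ticker() -> None:
        # Counts how often the event loop gets to run during the blocking call.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(ticker())
    output = await ps_execute("Get-Date")
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return output, ticks

output, ticks = asyncio.run(demo())
```

If `_run_ps_sync` were called directly inside the coroutine, `ticks` would stay at zero for the duration of the call, which is exactly the stall the executor avoids.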
8. Deployment Architecture
Recommended (Docker on a hardened VM)
┌──────────────────────────────────────────────┐
│ Hardened VM (e.g., Ubuntu 22.04) │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Docker container │ │
│ │ Image: mcp-privileged:1.0 │ │
│ │ User: mcpuser (non-root) │ │
│ │ Port: 8443 (internal) │ │
│ └──────────────┬──────────────────────┘ │
│ │ │
│ ┌──────────────▼──────────────────────┐ │
│ │ Reverse proxy (nginx / Caddy) │ │
│ │ TLS termination │ │
│ │ Port: 443 (external) │ │
│ └─────────────────────────────────────┘ │
└──────────────────────────────────────────────┘
│
│ Firewall: only Claude Code source IPs allowed
Network segmentation requirements
| Connection | Inbound to | Source | Port |
|---|---|---|---|
| Claude Code → Service | Service host | Claude Code client IPs | 443 (HTTPS) |
| Service → CyberArk CCP | CyberArk | Service host IP | 443 (HTTPS) |
| Service → SSH targets | Linux hosts | Service host IP | 22 (or custom) |
| Service → WinRM targets | Windows hosts | Service host IP | 5985/5986 |
| Service → Databases | DB servers | Service host IP | 5432/3306/1433 |
Health check
GET /health returns {"status": "ok"} with no authentication. Suitable for load balancer and container health probes.
9. Technology Choices
| Technology | Choice | Rationale |
|---|---|---|
| Web framework | FastAPI | Async-native, excellent OpenAPI support, Starlette middleware |
| MCP framework | FastMCP (mcp[server]) | Official Python MCP SDK; SSE transport via sse_app() |
| HTTP client | httpx | Async, connection pooling, easy mock transport for tests |
| SSH | asyncssh | Pure-Python async SSH2; no subprocess dependency |
| WinRM | pypsrp | Python PowerShell Remoting Protocol; most complete WinRM library |
| PostgreSQL | asyncpg | Fastest async Postgres driver; native protocol |
| MySQL | aiomysql | Async MySQL driver |
| SQL Server | pyodbc | Standard ODBC; requires Microsoft ODBC Driver 18 on host |
| Config | pydantic-settings | Type-safe config; reads from env + .env; validates at startup |
| Logging | structlog | Structured JSON output; easy log shipping; context vars |
| Crypto | cryptography | PFX parsing for mTLS; well-maintained |
| Runtime | Python 3.11 | asyncio improvements, tomllib, ExceptionGroup, slots dataclasses |
| Container | Docker (multi-stage) | Small runtime image; non-root user; no build tools in production |
10. Security Architecture Summary
| Control | Implementation | Protects Against |
|---|---|---|
| TLS in transit | HTTPS everywhere (CCP, service) | Eavesdropping, MITM |
| API key auth | ApiKeyMiddleware on all /mcp/* | Unauthorised tool calls |
| CyberArk AppID | Registered in CyberArk policy | Unauthorised credential access |
| IP allowlist (CyberArk) | CyberArk trusted-net config | Rogue callers to CCP |
| mTLS (future) | PFX cert on CCP requests | Stronger caller identity |
| Secret handle | Opaque token, not password | Password exposure to LLM |
| Single-use handle | handle_single_use=True | Credential replay |
| TTL on handle | Default 300s | Handle leakage window |
| RAM-only storage | SecretStore dict, no disk I/O | Credential at-rest exposure |
| SecretStr wrapper | Pydantic SecretStr | Accidental log/repr of password |
| del password | Explicit deletion after use | Password in heap dumps |
| Audit log (no password) | structlog, explicit field list | Credential in log files |
| Non-root container | USER mcpuser in Dockerfile | Container escape impact |
| Output limits | 50 KB per stream, 1000 DB rows | Context flooding / DoS |
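The output limit in the last row can be sketched as a byte-aware truncation helper. The name `truncate_output`, the interpretation of 50 KB as 50 × 1024 bytes, and the wording of the marker line are assumptions for illustration.

```python
MAX_STREAM_BYTES = 50 * 1024  # assumed reading of "50 KB per stream"

def truncate_output(text: str, limit: int = MAX_STREAM_BYTES) -> str:
    """Cap a stream at `limit` UTF-8 bytes and say how much was dropped."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    kept = raw[:limit].decode("utf-8", errors="ignore")
    omitted = len(raw) - len(kept.encode("utf-8"))
    return f"{kept}\n[output truncated: {omitted} bytes omitted]"

short = truncate_output("ok")
long = truncate_output("a" * 60000)
```

Measuring in bytes rather than characters keeps the cap meaningful for non-ASCII output, and the marker line tells the model that the result is incomplete instead of letting it reason over a silently clipped stream.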
11. Future Roadmap
| Item | Priority | Description |
|---|---|---|
| mTLS for CyberArk | High | Config already present; needs PFX cert provisioning |
| API key rotation without restart | Medium | Watch env file or use a config reload endpoint |
| SSH key-based auth | Medium | Support asyncssh with private key from CyberArk |
| Kerberos for WinRM | Medium | Currently NTLM; add Kerberos for domain environments |
| Connection pooling (SSH) | Low | Reuse SSH connections for repeated calls to same host |
| Multi-tenant API keys | Low | Map API keys to CyberArk AppIDs for key-per-team isolation |
| Metrics endpoint | Low | Prometheus /metrics for connection counts, handle stats |
| Session recording integration | Low | Forward SSH output to CyberArk PSM or a SIEM |