
High-Level Design

MCP Privileged Access Service

Version: 1.1 · Date: 2026-03-28 · Status: Production-ready


Table of Contents

  0. What is MCP? A Primer
  1. Purpose & Scope
  2. System Context
  3. Architecture Principles
  4. Component Overview
  5. Authentication & Authorization Model
  6. The Secret Handle Pattern
  7. Data Flow — Key Use Cases
  8. Deployment Architecture
  9. Technology Choices
  10. Security Architecture Summary
  11. Future Roadmap

0. What is MCP? A Primer

Learning note: This section explains the MCP concept from first principles before we dive into this specific service. Skip to Section 1 if you are already familiar with the protocol.

The core problem MCP solves

A large language model (LLM) like Claude is, at its heart, a text-in / text-out system. On its own it cannot do anything in the world — it can only describe what it would do. The challenge is: how do you give an AI assistant the ability to take real actions (read a file, query a database, run a command) in a controlled, auditable, and standardised way?

Before MCP, every team solved this differently. Some embedded shell calls directly in prompts; others built bespoke REST wrappers. There was no common contract between the AI and the tools it called.

MCP (Model Context Protocol) is Anthropic's open standard that defines exactly that contract.


The three primitives

MCP defines three building blocks that a server can expose to a model:

| Primitive | What it is | Analogy |
|---|---|---|
| Tool | A callable function the model can invoke | An API endpoint / RPC call |
| Resource | A piece of data the model can read | A file or database row |
| Prompt | A reusable prompt template | A macro or named query |

Learning note: This service uses only Tools. Tools are the most important primitive for agentic use cases — cases where the model takes actions, not just answers questions. Resources and Prompts are useful but less common in automation pipelines.


How a tool call works end-to-end

┌──────────────────────────────────────────────────────────────────────┐
│  USER   "Check disk usage on web01"                                  │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ user message
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  CLAUDE (the model)                                                  │
│  Reads the list of available tools (JSON Schema descriptions).       │
│  Decides: "I need ssh_execute to answer this."                       │
│  Emits a tool_use block in its response:                             │
│    { "name": "ssh_execute",                                          │
│      "input": { "host": "web01", "command": "df -h", ... } }         │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ tool_use request (JSON-RPC over HTTP/SSE)
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  MCP SERVER  (this service)                                          │
│  Receives the JSON-RPC call.                                         │
│  Executes the Python function ssh_execute(...).                      │
│  Returns a tool_result: { "content": "Filesystem  Size  Used…" }     │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ tool_result (text)
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  CLAUDE (the model)                                                  │
│  Incorporates the tool result into its context.                      │
│  Generates a final human-readable answer.                            │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ assistant message
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  USER   "web01: 18G used of 50G (36%)"                               │
└──────────────────────────────────────────────────────────────────────┘

Learning note: The model never executes code itself. It only emits a structured request saying "please call this tool with these arguments." The MCP server is the only thing that touches real infrastructure. This separation is fundamental to safety — you can audit, rate-limit, and authorise every action at the server layer without modifying the model.
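On the wire, the tool call in the diagram is a JSON-RPC 2.0 request with method tools/call. The reply carries a result whose content is a list of typed items. The sketch below builds both messages by hand, assuming the message shapes defined in the MCP specification. Note that MCP names the tool parameters arguments, whereas the model's tool_use block calls them input. handle_request and fake_ssh_execute are hypothetical stand-ins for what FastMCP and the real tool do.

```python
import json

def make_tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request as an MCP client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

def handle_request(raw: str, tools: dict) -> str:
    """Toy server-side dispatcher: look up the tool and wrap its text output.

    Hypothetical stand-in for the framework's routing (no auth, no validation).
    """
    req = json.loads(raw)
    params = req["params"]
    output = tools[params["name"]](**params["arguments"])
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req["id"],  # the response echoes the request id
        "result": {"content": [{"type": "text", "text": output}], "isError": False},
    })

def fake_ssh_execute(host: str, command: str) -> str:
    # Hypothetical stub in place of a real SSH call.
    return f"[{host}] would run: {command}"

request = make_tools_call(1, "ssh_execute", {"host": "web01", "command": "df -h"})
response = json.loads(handle_request(request, {"ssh_execute": fake_ssh_execute}))
print(response["result"]["content"][0]["text"])  # [web01] would run: df -h
```

Echoing the id is what lets the client match each response to its request when several tool calls are in flight at once.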


Transport: SSE over HTTP

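MCP defines two standard transports: stdio for servers launched as local subprocesses, and an HTTP-based transport for remote servers. This service uses the HTTP + Server-Sent Events (SSE) variant. The client (Claude Code) opens a long-lived HTTP connection to the server, and the server streams JSON-RPC messages back as SSE events. (Newer revisions of the MCP specification replace HTTP + SSE with the Streamable HTTP transport.)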

Learning note: Why SSE and not WebSockets? SSE is unidirectional (server → client) and works over plain HTTP/1.1 with no protocol upgrade. This makes it firewall-friendly and easy to put behind standard reverse proxies like nginx. The request direction (client → server) still uses normal HTTP POST.
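The client reads the SSE connection as a plain-text stream of events. Below is a minimal parser for that framing, written from the general SSE rules:

- Each field is a "field: value" line.
- A blank line ends an event.
- Lines starting with ":" are comments.

The parser ignores the id and retry fields. In the HTTP + SSE transport, the server first sends an endpoint event telling the client where to POST its requests. The path in the sample stream is an illustrative assumption.

```python
def parse_sse(stream: str) -> list[tuple[str, str]]:
    """Split a Server-Sent Events stream into (event, data) pairs.

    Simplified framing rules: 'field: value' lines, a blank line dispatches the
    event, multiple data lines are joined with '\\n', ':' lines are comments,
    and the event name defaults to 'message'. id/retry fields are ignored.
    """
    events = []
    event, data = "message", []
    for line in stream.splitlines():
        if line == "":
            if data:  # dispatch only events that carried data
                events.append((event, "\n".join(data)))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip exactly one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    return events

raw = (
    "event: endpoint\n"
    "data: /messages?session_id=abc123\n"   # illustrative POST target
    "\n"
    ": keep-alive\n"
    "\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, "result": {}}\n'
    "\n"
)
for name, payload in parse_sse(raw):
    print(name, payload)
```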


FastMCP: the Python framework

Implementing MCP by hand means writing a JSON-RPC server, describing tools in JSON Schema, and handling SSE streams. FastMCP, the high-level server API in the official MCP Python SDK, removes that boilerplate:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-server")

@mcp.tool(description="Add two numbers")
async def add(a: int, b: int) -> int:
    return a + b

FastMCP introspects the Python type annotations and generates the JSON Schema automatically. The @mcp.tool() decorator registers the function — the function itself is just a normal async def.
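The idea behind the schema generation can be shown with the standard library alone. This is a simplified illustration, not FastMCP's implementation, which builds on pydantic and supports far more types. Each annotated parameter becomes a JSON Schema property, and parameters without a default are marked required. ssh_like is a hypothetical signature used only for this example.

```python
import inspect

# Simplified mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(func) -> dict:
    """Derive a JSON Schema 'object' for a function's parameters (toy version)."""
    props, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        props[name] = {"type": _JSON_TYPES[param.annotation]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> the caller must supply it
    return {"type": "object", "properties": props, "required": required}

async def add(a: int, b: int) -> int:
    return a + b

async def ssh_like(host: str, command: str, timeout: float = 30.0) -> str:  # hypothetical
    ...

print(tool_schema(add))
# {'type': 'object', 'properties': {'a': {'type': 'integer'}, 'b': {'type': 'integer'}}, 'required': ['a', 'b']}
```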

Learning note: This is exactly how all four MCP servers in this service are built. The tool functions (get_credential, ssh_execute, ps_execute, db_query) are plain Python async functions. The MCP protocol wrapping is invisible to the implementation code. You can call them directly in unit tests without any MCP machinery at all — which is why the test suite can be so simple.


Context injection

FastMCP injects a Context object into any tool that declares a parameter annotated with Context (named ctx by convention). You do not pass it yourself; the framework supplies it automatically when the tool is called over the MCP protocol.

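from mcp.server.fastmcp import Context

async def ssh_execute(host: str, command: str, secret_handle: str, ctx: Context) -> str:
    await ctx.info(f"Connecting to {host}")   # info-level log notification to the caller
    await ctx.error("Something went wrong")   # error-level log notification
    return "Exit code: 0"                     # placeholder; the real tool returns command output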

ctx.info() and ctx.error() send log notifications back to the client during tool execution, before the final result is returned. This is how a client such as Claude Code can show a status message like "Connecting to web01..." while a long-running command is in progress.

Learning note: In unit tests, ctx is a MagicMock. The tests assert on ctx.info.call_args and ctx.error.call_args to verify the right status messages were emitted — without any real MCP transport being involved.
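A test in that style might look like the sketch below. The ssh_execute here is a simplified stand-in, not the service's tool. Because the tool awaits ctx.info and ctx.error, those attributes must be awaitable, and awaiting the return value of a plain MagicMock raises TypeError. This sketch therefore assigns AsyncMock to them explicitly. Building the mock with MagicMock(spec=Context) is another way to get AsyncMock methods.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def ssh_execute(host: str, command: str, ctx) -> str:
    # Simplified stand-in: the real tool also resolves a handle and connects.
    await ctx.info(f"Connecting to {host}")
    if not command.strip():
        await ctx.error("Empty command")
        return "Error: empty command"
    return f"Exit code: 0\nstdout:\n(ran {command!r} on {host})"

def make_ctx() -> MagicMock:
    ctx = MagicMock()
    ctx.info = AsyncMock()    # awaited by the tool, so it must be an AsyncMock
    ctx.error = AsyncMock()
    return ctx

# Call the tool function directly: no MCP transport involved.
ctx = make_ctx()
result = asyncio.run(ssh_execute("web01", "df -h", ctx))
assert ctx.info.call_args.args == ("Connecting to web01",)
ctx.error.assert_not_called()
```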


Multi-server composition

A single Python process can host multiple independent MCP servers, each mounted at a different URL path on a shared FastAPI application:

FastAPI app
├── /mcp/cyberark   ← FastMCP("cyberark")  — get_credential, list_safes
├── /mcp/ssh        ← FastMCP("ssh")       — ssh_execute
├── /mcp/powershell ← FastMCP("powershell") — ps_execute
└── /mcp/database   ← FastMCP("database")  — db_query

Claude Code is configured with four separate MCP server entries, each pointing to one of these paths. From Claude's perspective they appear as four separate servers, but they share a single process, a single secret store, and a single audit log stream.

Learning note: Mounting multiple FastMCP instances on one FastAPI app via app.mount(path, mcp.sse_app()) is a common pattern for building multi-capability MCP services. The alternative, one process per server, would require inter-process communication to share the secret store, which adds complexity with no security benefit.


1. Purpose & Scope

The MCP Privileged Access Service enables Claude (Anthropic's AI assistant) to execute privileged operations on enterprise infrastructure — Linux servers via SSH, Windows servers via PowerShell/WinRM, and databases — using credentials managed by CyberArk Privileged Access Management (PAM).

The fundamental security guarantee:

The AI model (Claude) never sees the actual password at any point in the workflow. Credentials are fetched from CyberArk, held in RAM behind an opaque token, and used directly for the target connection — all within the service boundary.

Scope includes:

  • Retrieving credentials from CyberArk Central Credential Provider (CCP)
  • Executing shell commands on Linux/Unix hosts via SSH
  • Executing PowerShell scripts on Windows hosts via WinRM
  • Running SQL queries on PostgreSQL, MySQL, and SQL Server databases
  • Structured audit logging of all privileged operations
  • API key authentication for Claude Code clients

Scope excludes:

  • User interface or dashboard
  • Credential rotation or lifecycle management (handled by CyberArk)
  • Session recording (handled by CyberArk PSM if required)
  • Multi-tenancy (single-tenant service per deployment)

2. System Context

┌──────────────────────────────────────────────────────────────────────┐
│  OPERATOR / SECURITY TEAM                                            │
│  • Provisions CyberArk safes & AppID                                 │
│  • Issues MCP API keys to Claude Code clients                        │
│  • Reviews structured audit logs                                     │
└──────────────────────┬───────────────────────────────────────────────┘
                       │ configure
                       ▼
┌──────────────────────────────────────────────────────────────────────┐
│  CLAUDE CODE (client)                                                │
│  Claude Desktop / VS Code / CLI                                      │
│  - Sends MCP tool calls over HTTPS with X-API-Key header             │
│  - Receives tool results (output, exit codes, query rows)            │
│  - NEVER receives actual passwords                                   │
└──────────────────────┬───────────────────────────────────────────────┘
                       │ HTTPS + API Key   (JSON-RPC / MCP protocol)
                       ▼
┌──────────────────────────────────────────────────────────────────────┐
│  MCP PRIVILEGED ACCESS SERVICE          (this system)               │
│  ┌─────────────┐  ┌──────┐  ┌──────────┐  ┌────────┐  ┌─────────┐ │
│  │  CyberArk   │  │ SSH  │  │PowerShell│  │Database│  │  Auth + │ │
│  │    MCP      │  │ MCP  │  │   MCP    │  │  MCP   │  │  Audit  │ │
│  └──────┬──────┘  └──┬───┘  └────┬─────┘  └───┬────┘  └─────────┘ │
│         │            │            │             │                    │
│         └────────────┴────────────┴─────────────┘                   │
│                         Secret Store (RAM)                           │
└───┬──────────────┬──────────────┬──────────────┬─────────────────────┘
    │              │              │              │
    ▼              ▼              ▼              ▼
┌─────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│CyberArk │  │Linux/Unix│  │ Windows  │  │PostgreSQL│
│   CCP   │  │  Hosts   │  │  Hosts   │  │ MySQL    │
│ (HTTPS) │  │  (SSH)   │  │ (WinRM)  │  │SQL Server│
└─────────┘  └──────────┘  └──────────┘  └──────────┘

3. Architecture Principles

P1 — Zero password exposure to the LLM

Passwords flow from CyberArk → RAM → target connection. At no stage does a password appear in an MCP tool response, log message, or error message. This is enforced in code, not by policy alone.

P2 — Short-lived, single-use credential handles

A credential fetched from CyberArk is wrapped in a cryptographically random handle (secret:// + 32-char hex). The handle:

  • Expires after a configurable TTL (default 5 minutes)
  • Is invalidated on first use (default HANDLE_SINGLE_USE=true)
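  • Maps to a credential that lives only in process RAM and is never written to disk or sent over the network; only the opaque handle string itself leaves the service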
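A handle with this shape can be produced with Python's secrets module, which draws from the operating system's CSPRNG. token_hex(16) yields 16 random bytes as 32 hex characters. The sketch also adds a cheap shape check that a tool could run before touching the store. Both function names are hypothetical, and this document does not say how the service generates or validates handles.

```python
import re
import secrets

HANDLE_PREFIX = "secret://"
_HANDLE_RE = re.compile(r"secret://[0-9a-f]{32}")

def new_handle() -> str:
    # 16 random bytes from the OS CSPRNG -> 32 lowercase hex characters.
    return HANDLE_PREFIX + secrets.token_hex(16)

def is_well_formed(handle: str) -> bool:
    # Shape check only: a well-formed handle may still be unknown, expired or consumed.
    return _HANDLE_RE.fullmatch(handle) is not None

print(is_well_formed(new_handle()))  # True
```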

P3 — Full audit trail

Every credential fetch, handle resolution, SSH execution, PowerShell execution, and database query is recorded in structured JSON logs. Passwords and output data are never included in audit events.
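The sketch below shows one way to get "one function per audit event type" with an explicit field list, using only the standard library (the service itself uses structlog). The function and field names, including client_id, are hypothetical. The design point is that the signatures have no parameter for a password or command output, so neither can reach the log through these functions.

```python
import io
import json
import sys
from datetime import datetime, timezone
from typing import TextIO

def _emit(event: str, stream: TextIO, **fields: object) -> dict:
    """Write one JSON audit record per line and return it (handy for tests)."""
    record = {"event": event, "timestamp": datetime.now(timezone.utc).isoformat(), **fields}
    stream.write(json.dumps(record) + "\n")
    return record

def audit_credential_fetched(safe: str, object_name: str, username: str,
                             client_id: str, stream: TextIO = sys.stdout) -> dict:
    # No password parameter exists, so a password cannot be logged through here.
    return _emit("credential_fetched", stream, safe=safe, object_name=object_name,
                 username=username, client_id=client_id)

def audit_ssh_executed(host: str, username: str, command: str, exit_code: int,
                       stream: TextIO = sys.stdout) -> dict:
    # Records what ran and how it ended; stdout/stderr are deliberately not parameters.
    return _emit("ssh_executed", stream, host=host, username=username,
                 command=command, exit_code=exit_code)

buf = io.StringIO()
audit_ssh_executed("web01", "root", "df -h", 0, stream=buf)
print(buf.getvalue())
```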

P4 — Defence in depth

Multiple independent security layers:

  1. Network: service only reachable from permitted IP ranges (firewall / VPC)
  2. Transport: HTTPS with valid TLS certificate
  3. Application: API key authentication on every request
  4. Credential: CyberArk AppID + IP allowlist (or mTLS)
  5. Handle: short TTL + single use
  6. Code: SecretStr wrapper prevents accidental password serialisation
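Layer 6 relies on pydantic's SecretStr. The class below is a stdlib stand-in that shows the mechanism; it is not pydantic's code. The plaintext is only reachable through an explicit get_secret_value() call, while repr(), str() and f-strings print a mask. Because the class is not a str subclass, json.dumps also refuses it rather than serialising the value.

```python
import json

class RedactedStr:
    """Stand-in illustrating the idea behind pydantic's SecretStr."""
    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The only way to read the plaintext; easy to grep for in code review.
        return self._value

    def __repr__(self) -> str:
        return "RedactedStr('**********')"

    def __str__(self) -> str:
        return "**********"

creds = {"user": "root", "password": RedactedStr("P@ssword123")}
print(creds)  # {'user': 'root', 'password': RedactedStr('**********')}

try:
    json.dumps(creds)
except TypeError:
    print("refused to serialise")  # not JSON serializable
```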

P5 — Stateless compute, stateful secrets in RAM only

No database, no disk state. The secret store is an in-memory dict guarded by an asyncio lock. A service restart invalidates all handles, which is a safe failure mode: clients simply call get_credential again.

P6 — Explicit over implicit

Every configuration value is explicit (settings.*), every dependency is injected at startup (lifespan), and every module imports only what it needs. No global mutable state except the two intentional singletons (secret_store, cyberark_client).


4. Component Overview

4.1 Foundation Layer

| Component | File | Role |
|---|---|---|
| Configuration | config.py | Single pydantic-settings model; reads from env / .env file |
| Secret Store | secret_store.py | In-RAM handle store with TTL, single-use, and background sweeper |
| Auth Middleware | auth.py | Starlette middleware; validates API key on all /mcp/* routes |
| Audit Logger | audit.py | Structured structlog events; one function per audit event type |
| Service Entry Point | main.py | FastAPI app assembly, lifespan wiring, MCP server mounting |
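As a rough picture of what the single settings model provides, here is a standard-library sketch of loading and validating configuration at startup. MCP_API_KEYS, CYBERARK_APP_ID and HANDLE_SINGLE_USE are the variable names used in this document. HANDLE_TTL_SECONDS is a hypothetical name for the TTL setting. The real config.py uses pydantic-settings, which also reads the .env file.

```python
import os
from dataclasses import dataclass
from typing import Mapping

def _parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")

@dataclass(frozen=True)
class Settings:
    cyberark_app_id: str
    mcp_api_keys: tuple[str, ...]
    handle_ttl_seconds: int = 300
    handle_single_use: bool = True

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Comma-separated keys; blanks and surrounding whitespace are dropped.
        keys = tuple(k.strip() for k in env.get("MCP_API_KEYS", "").split(",") if k.strip())
        if not keys:
            raise ValueError("MCP_API_KEYS must contain at least one key")
        app_id = env.get("CYBERARK_APP_ID", "").strip()
        if not app_id:
            raise ValueError("CYBERARK_APP_ID is required")
        return cls(
            cyberark_app_id=app_id,
            mcp_api_keys=keys,
            handle_ttl_seconds=int(env.get("HANDLE_TTL_SECONDS", "300")),  # hypothetical name
            handle_single_use=_parse_bool(env.get("HANDLE_SINGLE_USE", "true")),
        )
```

Failing fast in from_env means a misconfigured deployment refuses to start rather than running without API keys.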

4.2 MCP Servers

| Server | Mount path | Tool(s) | Protocol | Auth to target |
|---|---|---|---|---|
| CyberArk | /mcp/cyberark | get_credential, list_safes | HTTPS REST | IP allowlist / mTLS |
| SSH | /mcp/ssh | ssh_execute | SSH (asyncssh) | Password from handle |
| PowerShell | /mcp/powershell | ps_execute | WinRM (pypsrp) | Password from handle |
| Database | /mcp/database | db_query | asyncpg / aiomysql / pyodbc | Password from handle |

Each MCP server is an independent FastMCP instance mounted as a sub-application on the shared FastAPI app. They share only two objects: secret_store (to resolve handles) and settings (configuration).


5. Authentication & Authorization Model

Client → Service (inbound)

Claude Code client
       │
       │  HTTP request to /mcp/<server>/...
       │  Header: X-API-Key: <key>
       │    OR
       │  Header: Authorization: Bearer <key>
       │
       ▼
  ApiKeyMiddleware
       │
       ├── Path starts with /mcp/ ?
       │     NO  → pass through (health check, etc.)
       │     YES → validate key against settings.mcp_api_keys
       │               INVALID → 401 + audit log
       │               VALID   → continue to MCP handler
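The decision logic in the flow above fits in two small functions. This is a sketch with hypothetical names, not the service's auth.py. It accepts either header form and lets non-/mcp/ paths through. It compares keys with hmac.compare_digest, so comparison time does not depend on how many leading characters match. The real middleware also writes an audit event on rejection, which is omitted here.

```python
import hmac
from typing import Mapping, Optional

def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the key from X-API-Key, or from 'Authorization: Bearer <key>'."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if "x-api-key" in lowered:
        return lowered["x-api-key"].strip()
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None

def check_request(path: str, headers: Mapping[str, str], valid_keys: tuple) -> int:
    """Return 200 to continue to the handler, or 401 to reject."""
    if not path.startswith("/mcp/"):
        return 200  # health check and other non-MCP routes pass through
    key = extract_api_key(headers)
    if key is None:
        return 401
    ok = any(hmac.compare_digest(key.encode(), valid.encode()) for valid in valid_keys)
    return 200 if ok else 401

print(check_request("/mcp/ssh/sse", {"Authorization": "Bearer k-123"}, ("k-123",)))  # 200
```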

Multiple API keys are supported (comma-separated in MCP_API_KEYS). To rotate keys, add the new keys to MCP_API_KEYS and remove the old ones; this currently requires a service restart. Rotation without restart is on the roadmap (Section 11).

Service → CyberArk (outbound)

Mode 1: IP Allowlist (current default)

  • The service makes HTTPS GET requests to the CCP REST API
  • CyberArk trusts the caller based on source IP
  • The AppID (CYBERARK_APP_ID) identifies the application in CyberArk policy

Mode 2: mTLS (future)

  • A PFX certificate file is loaded at startup
  • The TLS client certificate is attached to every CCP request
  • CyberArk validates the certificate in addition to (or instead of) IP
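A CCP lookup is a single GET whose query string carries the AppID, safe and object name. The JSON response holds the username in UserName and the password in Content, as in Step 1 of Section 6. The /AIMWebService/api/Accounts path below is the commonly documented CCP REST endpoint, and its use here is an assumption about this deployment. The host name and AppID are placeholders. The HTTP call itself is left out; in the service it is made with httpx, plus a client certificate in Mode 2.

```python
import json
from urllib.parse import urlencode

def ccp_url(base_url: str, app_id: str, safe: str, object_name: str) -> str:
    # Commonly documented CCP REST path; confirm against your CyberArk version.
    query = urlencode({"AppID": app_id, "Safe": safe, "Object": object_name})
    return f"{base_url.rstrip('/')}/AIMWebService/api/Accounts?{query}"

def parse_ccp_response(body: str) -> tuple[str, str]:
    """Extract (username, password) from a CCP JSON response body."""
    data = json.loads(body)
    return data["UserName"], data["Content"]  # 'Content' holds the password

url = ccp_url("https://ccp.example.internal", "MCP_App", "PROD-LINUX", "svc_root")
print(url)
# https://ccp.example.internal/AIMWebService/api/Accounts?AppID=MCP_App&Safe=PROD-LINUX&Object=svc_root
```

The parsed password should go straight into the secret store; only the resulting handle is returned to the caller.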

Service → Target Systems (outbound)

| Target | Auth method | Credentials from |
|---|---|---|
| SSH hosts | Password (key-based auth is on the roadmap, see Section 11) | Secret handle → asyncssh.connect(password=...) |
| WinRM hosts | NTLM / Basic | Secret handle → WSMan(password=...) |
| Databases | Native DB auth | Secret handle → driver connect call |

6. The Secret Handle Pattern

This is the central security innovation of the service. It answers the question: how does an AI model invoke privileged operations without ever knowing the password?

Step 1 — Credential fetch
──────────────────────────
Claude calls:  get_credential(safe="PROD-LINUX", object_name="svc_root")

Service:
  1. Calls CyberArk CCP REST API
  2. Receives { "UserName": "root", "Content": "P@ssword123", ... }
  3. Calls secret_store.store("root", "P@ssword123")
     → stores in RAM as _Entry with a random 32-char hex handle_id
     → returns handle = "secret://a3f9c2e1b8d7..."
  4. Returns to Claude: "Handle: secret://a3f9c2e1... TTL: 300s"
     PASSWORD IS NEVER IN THIS RETURN VALUE

Step 2 — Privileged operation
──────────────────────────────
Claude calls:  ssh_execute(host="server01", command="df -h",
                           secret_handle="secret://a3f9c2e1b8d7...")

Service:
  1. Calls secret_store.resolve("secret://a3f9c2e1b8d7...")
     → checks TTL and single-use flag
     → if valid: returns ("root", "P@ssword123") and deletes handle
  2. Calls asyncssh.connect("server01", username="root", password="P@ssword123")
  3. Runs command, collects output
  4. Deletes password variable (del password)
  5. Returns: "Exit code: 0\nstdout:\n/dev/sda1  50G  10G  40G  20% /"
     PASSWORD IS NEVER IN THIS RETURN VALUE

Step 3 — Handle is gone
────────────────────────
If Claude tries to reuse the same handle:
  secret_store.resolve(...) raises KeyError("Handle already consumed")
  → Claude must call get_credential again for the next operation

Handle lifecycle state machine:

              store()
CREATED ────────────────► ACTIVE
                            │
                   ┌────────┴────────┐
                   │                 │
               resolve()          TTL expired
           (single_use=True)    (sweeper task)
                   │                 │
                   ▼                 ▼
               CONSUMED           EXPIRED
               (deleted)          (deleted)
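The lifecycle above can be sketched as a small class. This is an illustration, not the contents of secret_store.py, and it rests on two assumptions: asyncio.Lock provides mutual exclusion, and time.monotonic() drives expiry so that wall-clock changes cannot extend a TTL. _Entry, store() and resolve() mirror the names used in this section. sweep() is a hypothetical name for the work the background sweeper does. An expired handle is rejected at resolve() time even if the sweeper has not yet removed it.

```python
import asyncio
import secrets
import time
from dataclasses import dataclass

@dataclass
class _Entry:
    username: str
    password: str
    expires_at: float  # time.monotonic() deadline

class SecretStore:
    """Minimal in-RAM handle store: TTL, optional single use, periodic sweep."""

    def __init__(self, ttl_seconds: float = 300.0, single_use: bool = True) -> None:
        self._ttl = ttl_seconds
        self._single_use = single_use
        self._entries: dict[str, _Entry] = {}
        self._lock = asyncio.Lock()

    async def store(self, username: str, password: str) -> str:
        handle_id = secrets.token_hex(16)  # 32 hex chars
        async with self._lock:
            self._entries[handle_id] = _Entry(username, password, time.monotonic() + self._ttl)
        return f"secret://{handle_id}"

    async def resolve(self, handle: str) -> tuple[str, str]:
        handle_id = handle.removeprefix("secret://")
        async with self._lock:
            entry = self._entries.get(handle_id)
            if entry is None:
                raise KeyError("Unknown, expired or already consumed handle")
            if time.monotonic() >= entry.expires_at:
                del self._entries[handle_id]  # ACTIVE -> EXPIRED
                raise KeyError("Handle expired")
            if self._single_use:
                del self._entries[handle_id]  # ACTIVE -> CONSUMED
            return entry.username, entry.password

    async def sweep(self) -> int:
        """Delete expired entries; a background task would call this periodically."""
        now = time.monotonic()
        async with self._lock:
            expired = [h for h, e in self._entries.items() if now >= e.expires_at]
            for h in expired:
                del self._entries[h]
        return len(expired)

async def demo() -> tuple[tuple[str, str], bool, int]:
    store = SecretStore(ttl_seconds=300, single_use=True)
    handle = await store.store("root", "P@ssword123")
    first = await store.resolve(handle)       # handle is now consumed
    try:
        await store.resolve(handle)           # replay attempt
        replay_rejected = False
    except KeyError:
        replay_rejected = True
    short = SecretStore(ttl_seconds=0.0)
    await short.store("root", "P@ssword123")  # expires immediately
    swept = await short.sweep()
    return first, replay_rejected, swept

first, replay_rejected, swept = asyncio.run(demo())
print(first, replay_rejected, swept)  # ('root', 'P@ssword123') True 1
```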

7. Data Flow — Key Use Cases

7.1 SSH Command Execution

Claude          CyberArk MCP    SecretStore     SSH MCP        Linux Host
  │                 │               │               │               │
  │ get_credential  │               │               │               │
  │────────────────►│               │               │               │
  │                 │ GET CCP REST  │               │               │
  │                 │──────────────────────────────────────────────►│(CyberArk)
  │                 │◄─────────────────────────────────────────────-│
  │                 │ store(user,pw)│               │               │
  │                 │──────────────►│               │               │
  │                 │◄── handle ────│               │               │
  │◄── handle ──────│               │               │               │
  │                 │               │               │               │
  │ ssh_execute(handle)             │               │               │
  │─────────────────────────────────────────────────►              │
  │                 │               │ resolve(handle│               │
  │                 │               │◄──────────────│               │
  │                 │               │──(user,pw)───►│               │
  │                 │               │               │ SSH connect   │
  │                 │               │               │──────────────►│
  │                 │               │               │◄── output ───-│
  │                 │               │               │ del password  │
  │◄─────────────────────────────────────────────────output────────│

7.2 Database Query

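Identical flow to SSH, substituting db_query for ssh_execute and the target database driver for asyncssh. One caveat: pyodbc (used for SQL Server) is a synchronous driver. To avoid blocking the event loop, its calls need to be offloaded to a thread pool, as in the PowerShell flow in 7.3.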

7.3 PowerShell Execution

The WinRM flow differs in one aspect: pypsrp is synchronous, so the call is offloaded to a thread-pool executor while the asyncio event loop continues serving other requests.

  asyncio event loop                   Thread pool executor
  ──────────────────                   ────────────────────
  resolve handle
  await loop.run_in_executor(None, _run_ps_sync, ...) ─► _run_ps_sync()
  [event loop free to handle other requests]              WSMan()
                                                          RunspacePool()
                                                          ps.invoke()
  ◄────────────────────────────────────────────── return output
  del password
  return result
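The offloading pattern can be demonstrated without a Windows host. In this sketch, _run_ps_sync is a hypothetical blocking stand-in that calls time.sleep instead of WSMan/RunspacePool. A heartbeat task counts how often the event loop gets to run while the call sits in the thread pool. Passing None as the executor selects the loop's default ThreadPoolExecutor.

```python
import asyncio
import time

def _run_ps_sync(host: str, script: str, username: str, password: str) -> str:
    """Blocking stand-in for the pypsrp call (WSMan + RunspacePool + invoke)."""
    time.sleep(0.2)  # simulate a slow remote PowerShell invocation
    return f"[{host}] {script}: OK"

async def ps_execute(host: str, script: str, username: str, password: str) -> str:
    loop = asyncio.get_running_loop()
    # The blocking call runs in a worker thread; the event loop stays free.
    output = await loop.run_in_executor(None, _run_ps_sync, host, script, username, password)
    del password  # drops this frame's reference; it does not erase the string from memory
    return output

async def demo() -> tuple[str, int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    result = await ps_execute("win01", "Get-Service", "svc", "pw")
    hb.cancel()
    return result, ticks

result, ticks = asyncio.run(demo())
print(result)  # [win01] Get-Service: OK
print(ticks)   # well above 1: the loop kept running during the blocking call
```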

8. Deployment Architecture

  ┌──────────────────────────────────────────────┐
  │  Hardened VM (e.g., Ubuntu 22.04)            │
  │                                              │
  │  ┌─────────────────────────────────────┐     │
  │  │  Docker container                   │     │
  │  │  Image: mcp-privileged:1.0          │     │
  │  │  User: mcpuser (non-root)           │     │
  │  │  Port: 8443 (internal)              │     │
  │  └──────────────┬──────────────────────┘     │
  │                 │                            │
  │  ┌──────────────▼──────────────────────┐     │
  │  │  Reverse proxy (nginx / Caddy)      │     │
  │  │  TLS termination                    │     │
  │  │  Port: 443 (external)               │     │
  │  └─────────────────────────────────────┘     │
  └──────────────────────────────────────────────┘
          │
          │ Firewall: only Claude Code source IPs allowed

Network segmentation requirements

| Connection | Destination | Source | Port |
|---|---|---|---|
| Claude Code → Service | Service host | Claude Code client IPs | 443 (HTTPS) |
| Service → CyberArk CCP | CyberArk | Service host IP | 443 (HTTPS) |
| Service → SSH targets | Linux hosts | Service host IP | 22 (or custom) |
| Service → WinRM targets | Windows hosts | Service host IP | 5985 (HTTP) / 5986 (HTTPS) |
| Service → Databases | DB servers | Service host IP | 5432 (PostgreSQL) / 3306 (MySQL) / 1433 (SQL Server) |

Health check

GET /health returns {"status": "ok"} with no authentication. Suitable for load balancer and container health probes.


9. Technology Choices

| Area | Choice | Rationale |
|---|---|---|
| Web framework | FastAPI | Async-native, excellent OpenAPI support, Starlette middleware |
| MCP framework | FastMCP (mcp[server]) | Official Python MCP SDK; supports the HTTP + SSE transport used here as well as Streamable HTTP |
| HTTP client | httpx | Async, connection pooling, easy mock transport for tests |
| SSH | asyncssh | Pure-Python async SSH2; no subprocess dependency |
| WinRM | pypsrp | Python PowerShell Remoting Protocol client; most complete WinRM library |
| PostgreSQL | asyncpg | High-performance async Postgres driver; implements the native protocol |
| MySQL | aiomysql | Async MySQL driver |
| SQL Server | pyodbc | Standard ODBC (synchronous API); requires Microsoft ODBC Driver 18 on host |
| Config | pydantic-settings | Type-safe config; reads from env + .env; validates at startup |
| Logging | structlog | Structured JSON output; easy log shipping; context vars |
| Crypto | cryptography | PFX parsing for mTLS; well-maintained |
| Runtime | Python 3.11 | asyncio.TaskGroup and asyncio.timeout, tomllib, ExceptionGroup, faster CPython |
| Container | Docker (multi-stage) | Small runtime image; non-root user; no build tools in production |

10. Security Architecture Summary

| Control | Implementation | Protects against |
|---|---|---|
| TLS in transit | HTTPS everywhere (CCP, service) | Eavesdropping, MITM |
| API key auth | ApiKeyMiddleware on all /mcp/* | Unauthorised tool calls |
| CyberArk AppID | Registered in CyberArk policy | Unauthorised credential access |
| IP allowlist (CyberArk) | CyberArk trusted-net config | Rogue callers to CCP |
| mTLS (future) | PFX cert on CCP requests | Caller impersonation (stronger caller identity) |
| Secret handle | Opaque token, not password | Password exposure to LLM |
| Single-use handle | handle_single_use=True | Credential replay |
| TTL on handle | Default 300s | Misuse of a leaked handle (bounds the exposure window) |
| RAM-only storage | SecretStore dict, no disk I/O | Credential at-rest exposure |
| SecretStr wrapper | Pydantic SecretStr | Accidental log/repr of password |
| del password | Explicit deletion after use | Password lingering in heap dumps (best effort: del drops the reference, but Python does not zero the freed string) |
| Audit log (no password) | structlog, explicit field list | Credential in log files |
| Non-root container | USER mcpuser in Dockerfile | Container escape impact |
| Output limits | 50 KB per stream, 1000 DB rows | Context flooding / DoS |
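The output limits in the last row might be applied as shown below. This sketch assumes that "50 KB" means 50 × 1024 bytes of UTF-8, and it uses a hypothetical truncation marker. format_result reproduces the "Exit code / stdout" layout shown in Section 6. For databases, it is cheaper in practice to fetch at most limit + 1 rows from the cursor than to slice a fully fetched result.

```python
MAX_STREAM_BYTES = 50 * 1024  # assumed interpretation of "50 KB per stream"
MAX_DB_ROWS = 1000

def truncate_stream(text: str, limit: int = MAX_STREAM_BYTES) -> str:
    """Cap a stdout/stderr stream at `limit` UTF-8 bytes, marking the cut."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # errors="ignore" drops a multi-byte character that the cut splits in half.
    kept = raw[:limit].decode("utf-8", errors="ignore")
    return kept + f"\n[truncated: {len(raw) - limit} more bytes]"

def cap_rows(rows: list, limit: int = MAX_DB_ROWS) -> tuple[list, bool]:
    """Return at most `limit` rows plus a flag saying whether anything was dropped."""
    return rows[:limit], len(rows) > limit

def format_result(exit_code: int, stdout: str, stderr: str) -> str:
    parts = [f"Exit code: {exit_code}", "stdout:", truncate_stream(stdout)]
    if stderr:
        parts += ["stderr:", truncate_stream(stderr)]
    return "\n".join(parts)

print(format_result(0, "/dev/sda1  50G  10G  40G  20% /", ""))
```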

11. Future Roadmap

| Item | Priority | Description |
|---|---|---|
| mTLS for CyberArk | High | Config already present; needs PFX cert provisioning |
| API key rotation without restart | Medium | Watch env file or use a config reload endpoint |
| SSH key-based auth | Medium | Support asyncssh with private key from CyberArk |
| Kerberos for WinRM | Medium | Currently NTLM / Basic; add Kerberos for domain environments |
| Connection pooling (SSH) | Low | Reuse SSH connections for repeated calls to same host |
| Multi-tenant API keys | Low | Map API keys to CyberArk AppIDs for key-per-team isolation |
| Metrics endpoint | Low | Prometheus /metrics for connection counts, handle stats |
| Session recording integration | Low | Forward SSH output to CyberArk PSM or a SIEM |