lutz/MCP_CyberArk

Lutz Finsterle fd49e28d05 Initial commit

2026-03-29 19:51:51 +02:00

High-Level Design

MCP Privileged Access Service

Version: 1.1 Date: 2026-03-28 Status: Production-ready

What is MCP? A Primer
Purpose & Scope
System Context
Architecture Principles
Component Overview
Authentication & Authorization Model
The Secret Handle Pattern
Data Flow — Key Use Cases
Deployment Architecture
Technology Choices
Security Architecture Summary
Future Roadmap

0. What is MCP? A Primer

Learning note: This section explains the MCP concept from first principles before we dive into this specific service. Skip to Section 1 if you are already familiar with the protocol.

The core problem MCP solves

A large language model (LLM) like Claude is, at its heart, a text-in / text-out system. On its own it cannot do anything in the world — it can only describe what it would do. The challenge is: how do you give an AI assistant the ability to take real actions (read a file, query a database, run a command) in a controlled, auditable, and standardised way?

Before MCP, every team solved this differently. Some embedded shell calls directly in prompts; others built bespoke REST wrappers. There was no common contract between the AI and the tools it called.

MCP (Model Context Protocol) is Anthropic's open standard that defines exactly that contract.

The three primitives

MCP defines three building blocks that a server can expose to a model:

Primitive	What it is	Analogy
Tool	A callable function the model can invoke	An API endpoint / RPC call
Resource	A piece of data the model can read	A file or database row
Prompt	A reusable prompt template	A macro or named query

Learning note: This service uses only Tools. Tools are the most important primitive for agentic use cases — cases where the model takes actions, not just answers questions. Resources and Prompts are useful but less common in automation pipelines.

How a tool call works end-to-end

┌──────────────────────────────────────────────────────────────────────┐
│  USER   "Check disk usage on web01"                                  │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ user message
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  CLAUDE (the model)                                                  │
│  Reads the list of available tools (JSON Schema descriptions).       │
│  Decides: "I need ssh_execute to answer this."                       │
│  Emits a tool_use block in its response:                             │
│    { "name": "ssh_execute",                                          │
│      "input": { "host": "web01", "command": "df -h", ... } }        │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ tool_use request (JSON-RPC over HTTP/SSE)
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  MCP SERVER  (this service)                                          │
│  Receives the JSON-RPC call.                                         │
│  Executes the Python function ssh_execute(...).                      │
│  Returns a tool_result: { "content": "Filesystem  Size  Used…" }    │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ tool_result (text)
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  CLAUDE (the model)                                                  │
│  Incorporates the tool result into its context.                      │
│  Generates a final human-readable answer.                            │
└────────────────────────────┬─────────────────────────────────────────┘
                             │ assistant message
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│  USER   "web01: 18G used of 50G (36%)"                               │
└──────────────────────────────────────────────────────────────────────┘

Learning note: The model never executes code itself. It only emits a structured request saying "please call this tool with these arguments." The MCP server is the only thing that touches real infrastructure. This separation is fundamental to safety — you can audit, rate-limit, and authorise every action at the server layer without modifying the model.
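
Learning note: On the wire, the tool_use request and tool_result in the diagram travel as JSON-RPC 2.0 messages. The sketch below shows roughly what that pair looks like for the example above. The id value and the output text are illustrative, and the remaining arguments are elided just as in the diagram.

```python
import json

# Request the client sends to invoke a tool (MCP "tools/call" method).
request = {
    "jsonrpc": "2.0",
    "id": 1,  # illustrative request id
    "method": "tools/call",
    "params": {
        "name": "ssh_execute",
        # Only the arguments shown in the diagram; the rest are elided.
        "arguments": {"host": "web01", "command": "df -h"},
    },
}

# Response carrying the tool result as a list of content items.
response = {
    "jsonrpc": "2.0",
    "id": 1,  # must match the request id
    "result": {
        "content": [{"type": "text", "text": "Filesystem  Size  Used ..."}],
        "isError": False,
    },
}

wire = json.dumps(request)  # what actually goes over HTTP
print(wire)
```

The model only ever produces the name and arguments; everything else is added by the client and server.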

Transport: SSE over HTTP

This service uses MCP's HTTP transport based on Server-Sent Events (SSE); servers that run as local processes more commonly use stdio. The client (Claude Code) opens a persistent HTTP connection to the server, and the server streams JSON-RPC messages back as SSE events.

Learning note: Why SSE and not WebSockets? SSE is unidirectional (server → client) and works over plain HTTP/1.1 with no protocol upgrade. This makes it firewall-friendly and easy to put behind standard reverse proxies like nginx. The request direction (client → server) still uses normal HTTP POST.
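
Learning note: An SSE stream is plain text: each event is a group of "field: value" lines terminated by a blank line. The following is a minimal sketch of a parser for that framing, assuming "\n" line endings and ignoring the id and retry fields. The event name and payload in the sample are illustrative, not captured from this service.

```python
import json

def parse_sse(stream: str) -> list[dict]:
    """Split a raw SSE stream into events with 'event' and 'data' keys."""
    events = []
    for block in stream.strip().split("\n\n"):  # a blank line ends an event
        event_name = "message"                  # default event type in SSE
        data_lines = []
        for line in block.splitlines():
            if line.startswith(":"):            # comment / keep-alive line
                continue
            field, _, value = line.partition(":")
            value = value.removeprefix(" ")     # one optional leading space
            if field == "event":
                event_name = value
            elif field == "data":
                data_lines.append(value)
        events.append({"event": event_name, "data": "\n".join(data_lines)})
    return events

raw = (
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"ok": true}}\n'
    "\n"
)
events = parse_sse(raw)
payload = json.loads(events[0]["data"])  # the JSON-RPC message itself
```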

FastMCP: the Python framework

Raw MCP requires implementing a JSON-RPC server, describing tools in JSON Schema, and handling SSE streams. FastMCP (the high-level server API in the official MCP Python SDK) removes all of that boilerplate:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-server")

@mcp.tool(description="Add two numbers")
async def add(a: int, b: int) -> int:
    return a + b

FastMCP introspects the Python type annotations and generates the JSON Schema automatically. The @mcp.tool() decorator registers the function — the function itself is just a normal async def.

Learning note: This is exactly how all four MCP servers in this service are built. The tool functions (get_credential, ssh_execute, ps_execute, db_query) are plain Python async functions. The MCP protocol wrapping is invisible to the implementation code. You can call them directly in unit tests without any MCP machinery at all — which is why the test suite can be so simple.
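
Learning note: To make the "introspects the type annotations" step less magical, here is a rough sketch of how a JSON Schema can be derived from a function signature using only the standard library. This is not FastMCP's actual implementation; schema_from_signature and JSON_TYPES are hypothetical names, and only a few primitive types are handled. It also shows the tool being called directly, with no MCP machinery.

```python
import asyncio
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def schema_from_signature(func) -> dict:
    """Build a minimal JSON Schema object from a function's annotations."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # the return type is not an input
    params = inspect.signature(func).parameters
    return {
        "type": "object",
        "properties": {name: {"type": JSON_TYPES[tp]} for name, tp in hints.items()},
        # Parameters without a default value are required.
        "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
    }

async def add(a: int, b: int) -> int:
    return a + b

schema = schema_from_signature(add)
result = asyncio.run(add(2, 3))  # direct call, as a unit test would do
```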

Context injection

FastMCP injects a Context object into any tool that declares a parameter annotated with Context (named ctx here). You do not pass it yourself; the framework supplies it automatically when the tool is called over the MCP protocol.

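from mcp.server.fastmcp import Context

async def ssh_execute(host: str, command: str, ctx: Context) -> str:  # other parameters elided
    await ctx.info(f"Connecting to {host}")   # info-level log notification to the caller
    await ctx.error("Something went wrong")   # error-level log notification
    return "command output"                   # placeholder result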

ctx.info() and ctx.error() send log notifications back to the client during tool execution, before the final result is returned. A client such as Claude Code can surface these messages (for example "Connecting to web01...") while a long-running command is in progress.

Learning note: In unit tests, ctx is a MagicMock. The tests assert on ctx.info.call_args and ctx.error.call_args to verify the right status messages were emitted — without any real MCP transport being involved.
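
Learning note: A minimal sketch of such a test follows. Because ctx.info and ctx.error are awaited, the sketch assumes those two methods are AsyncMock instances attached to a MagicMock; whether the real test suite wires the mock exactly this way is an assumption. ping_host is a hypothetical stand-in for a real tool.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def ping_host(host: str, ctx) -> str:
    """Hypothetical stand-in for a real tool such as ssh_execute."""
    await ctx.info(f"Connecting to {host}")
    return f"{host}: ok"

def make_ctx() -> MagicMock:
    """Fake Context whose notification methods can be awaited."""
    ctx = MagicMock()
    ctx.info = AsyncMock()
    ctx.error = AsyncMock()
    return ctx

ctx = make_ctx()
result = asyncio.run(ping_host("web01", ctx))

# Verify the status message without any MCP transport involved.
ctx.info.assert_awaited_once_with("Connecting to web01")
ctx.error.assert_not_awaited()
```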

Multi-server composition

A single Python process can host multiple independent MCP servers, each mounted at a different URL path on a shared FastAPI application:

FastAPI app
├── /mcp/cyberark   ← FastMCP("cyberark")  — get_credential, list_safes
├── /mcp/ssh        ← FastMCP("ssh")       — ssh_execute
├── /mcp/powershell ← FastMCP("powershell")— ps_execute
└── /mcp/database   ← FastMCP("database")  — db_query

Claude Code is configured with four separate MCP server entries, each pointing to one of these paths. From Claude's perspective they appear as four separate servers, but they share a single process, a single secret store, and a single audit log stream.

Learning note: Mounting multiple FastMCP instances on one FastAPI app via app.mount(path, mcp.sse_app()) is the standard pattern for building multi-capability MCP services. The alternative — one process per server — would require inter-process communication to share the secret store, which adds complexity with no security benefit.

1. Purpose & Scope

The MCP Privileged Access Service enables Claude (Anthropic's AI assistant) to execute privileged operations on enterprise infrastructure — Linux servers via SSH, Windows servers via PowerShell/WinRM, and databases — using credentials managed by CyberArk Privileged Access Management (PAM).

The fundamental security guarantee:

The AI model (Claude) never sees the actual password at any point in the workflow. Credentials are fetched from CyberArk, held in RAM behind an opaque token, and used directly for the target connection — all within the service boundary.

Scope includes:

Retrieving credentials from CyberArk Central Credential Provider (CCP)
Executing shell commands on Linux/Unix hosts via SSH
Executing PowerShell scripts on Windows hosts via WinRM
Running SQL queries on PostgreSQL, MySQL, and SQL Server databases
Structured audit logging of all privileged operations
API key authentication for Claude Code clients

Scope excludes:

User interface or dashboard
Credential rotation or lifecycle management (handled by CyberArk)
Session recording (handled by CyberArk PSM if required)
Multi-tenancy (single-tenant service per deployment)

2. System Context

┌──────────────────────────────────────────────────────────────────────┐
│  OPERATOR / SECURITY TEAM                                            │
│  • Provisions CyberArk safes & AppID                                 │
│  • Issues MCP API keys to Claude Code clients                        │
│  • Reviews structured audit logs                                     │
└──────────────────────┬───────────────────────────────────────────────┘
                       │ configure
                       ▼
┌──────────────────────────────────────────────────────────────────────┐
│  CLAUDE CODE (client)                                                │
│  Claude Desktop / VS Code / CLI                                      │
│  - Sends MCP tool calls over HTTPS with X-API-Key header             │
│  - Receives tool results (output, exit codes, query rows)            │
│  - NEVER receives actual passwords                                   │
└──────────────────────┬───────────────────────────────────────────────┘
                       │ HTTPS + API Key   (JSON-RPC / MCP protocol)
                       ▼
┌──────────────────────────────────────────────────────────────────────┐
│  MCP PRIVILEGED ACCESS SERVICE          (this system)               │
│  ┌─────────────┐  ┌──────┐  ┌──────────┐  ┌────────┐  ┌─────────┐ │
│  │  CyberArk   │  │ SSH  │  │PowerShell│  │Database│  │  Auth + │ │
│  │    MCP      │  │ MCP  │  │   MCP    │  │  MCP   │  │  Audit  │ │
│  └──────┬──────┘  └──┬───┘  └────┬─────┘  └───┬────┘  └─────────┘ │
│         │            │            │             │                    │
│         └────────────┴────────────┴─────────────┘                   │
│                         Secret Store (RAM)                           │
└───┬──────────────┬──────────────┬──────────────┬─────────────────────┘
    │              │              │              │
    ▼              ▼              ▼              ▼
┌────────┐   ┌──────────┐  ┌──────────┐  ┌──────────┐
│CyberArk│   │Linux/Unix│  │ Windows  │  │PostgreSQL│
│  CCP   │   │  Hosts   │  │  Hosts   │  │ MySQL    │
│(HTTPS) │   │  (SSH)   │  │ (WinRM)  │  │SQL Server│
└────────┘   └──────────┘  └──────────┘  └──────────┘

3. Architecture Principles

P1 — Zero password exposure to the LLM

Passwords flow from CyberArk → RAM → target connection. At no stage does a password appear in an MCP tool response, log message, or error message. This is enforced in code, not by policy alone.

P2 — Short-lived, single-use credential handles

A credential fetched from CyberArk is wrapped in a cryptographically random handle (secret:// + 32-char hex). The handle:

Expires after a configurable TTL (default 5 minutes)
Is invalidated on first use (default HANDLE_SINGLE_USE=true)
Lives only in process RAM — never written to disk or network
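
Learning note: A handle of this shape can be produced with the standard library's secrets module. The sketch below assumes the 32 hex characters come from 16 random bytes; new_handle and is_well_formed are hypothetical helper names, not necessarily those used in secret_store.py.

```python
import secrets

HANDLE_PREFIX = "secret://"
HEX_DIGITS = "0123456789abcdef"

def new_handle() -> str:
    # token_hex(16) draws 16 bytes from the OS CSPRNG -> 32 hex characters.
    return HANDLE_PREFIX + secrets.token_hex(16)

def is_well_formed(handle: str) -> bool:
    """Cheap syntactic check before looking a handle up in the store."""
    if not handle.startswith(HANDLE_PREFIX):
        return False
    body = handle[len(HANDLE_PREFIX):]
    return len(body) == 32 and all(c in HEX_DIGITS for c in body)

handle = new_handle()
```

The handle carries no information about the credential; it is only a lookup key into process memory.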

P3 — Full audit trail

Every credential fetch, handle resolution, SSH execution, PowerShell execution, and database query is recorded in structured JSON logs. Passwords and output data are never included in audit events.
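
Learning note: One way to guarantee that secrets never reach the log is an explicit allowlist of fields, so anything not named is dropped. The sketch below uses only the standard library as a stand-in for structlog; the field names and the audit_event helper are assumptions for illustration.

```python
import json
import time

# Hypothetical allowlist: only these keys may appear in an audit record.
ALLOWED_FIELDS = {"host", "username", "safe", "object_name", "exit_code"}

def audit_event(event: str, **fields) -> str:
    """Return one JSON log line containing only allowlisted fields."""
    record = {"event": event, "timestamp": time.time()}
    for key, value in fields.items():
        if key in ALLOWED_FIELDS:  # passwords and output are silently dropped
            record[key] = value
    return json.dumps(record, sort_keys=True)

line = audit_event(
    "ssh_execute",
    host="web01",
    username="root",
    exit_code=0,
    password="P@ssword123",  # never makes it into the record
    stdout="...",            # output data is excluded as well
)
```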

P4 — Defence in depth

Multiple independent security layers:

Network: service only reachable from permitted IP ranges (firewall / VPC)
Transport: HTTPS with valid TLS certificate
Application: API key authentication on every request
Credential: CyberArk AppID + IP allowlist (or mTLS)
Handle: short TTL + single use
Code: SecretStr wrapper prevents accidental password serialisation
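
Learning note: The idea behind the SecretStr layer is that printing or formatting the object yields a mask, and the real value is only available through an explicit accessor. The class below is a hypothetical stand-in that mimics that behaviour with no dependencies; the service itself uses Pydantic's SecretStr.

```python
class MaskedSecret:
    """Hypothetical stand-in mimicking the masking behaviour of pydantic.SecretStr."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The only way to obtain the real value; easy to grep for in review.
        return self._value

    def __repr__(self) -> str:
        return "MaskedSecret('**********')"

    def __str__(self) -> str:
        return "**********"

pw = MaskedSecret("hunter2")
log_line = f"connecting with password={pw}"  # formatting uses __str__
```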

P5 — Stateless compute, stateful secrets in RAM only

No database, no disk state. The secret store is an in-memory dict with an asyncio lock. Service restart invalidates all handles (safe failure mode — operators must re-fetch).

P6 — Explicit over implicit

Every configuration value is explicit (settings.*), every dependency is injected at startup (lifespan), and every module imports only what it needs. No global mutable state except the two intentional singletons (secret_store, cyberark_client).

4. Component Overview

4.1 Foundation Layer

Component	File	Role
Configuration	`config.py`	Single pydantic-settings model; reads from env / `.env` file
Secret Store	`secret_store.py`	In-RAM handle store with TTL, single-use, and background sweeper
Auth Middleware	`auth.py`	Starlette middleware; validates API key on all `/mcp/*` routes
Audit Logger	`audit.py`	Structured structlog events; one function per audit event type
Service Entry Point	`main.py`	FastAPI app assembly, lifespan wiring, MCP server mounting

4.2 MCP Servers

Server	Mount Path	Tool(s)	Protocol	Auth to target
CyberArk	`/mcp/cyberark`	`get_credential`, `list_safes`	HTTPS REST	IP allowlist / mTLS
SSH	`/mcp/ssh`	`ssh_execute`	SSH (asyncssh)	Password from handle
PowerShell	`/mcp/powershell`	`ps_execute`	WinRM (pypsrp)	Password from handle
Database	`/mcp/database`	`db_query`	asyncpg / aiomysql / pyodbc	Password from handle

Each MCP server is an independent FastMCP instance mounted as a sub-application on the shared FastAPI app. They share only two objects: secret_store (to resolve handles) and settings (configuration).

5. Authentication & Authorization Model

Client → Service (inbound)

Claude Code client
       │
       │  HTTP request to /mcp/<server>/...
       │  Header: X-API-Key: <key>
       │    OR
       │  Header: Authorization: Bearer <key>
       │
       ▼
  ApiKeyMiddleware
       │
       ├── Path starts with /mcp/ ?
       │     NO  → pass through (health check, etc.)
       │     YES → validate key against settings.mcp_api_keys
       │               INVALID → 401 + audit log
       │               VALID   → continue to MCP handler

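Multiple API keys are supported via a comma-separated MCP_API_KEYS value. To rotate a key, add the new key, migrate clients, then remove the old one. Because configuration is read at startup, each change currently takes effect on restart; restart-free reload is a roadmap item (Section 11).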
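
Learning note: The decision tree above can be sketched as three small pure functions. This is an illustration, not the contents of auth.py: the function names are hypothetical, and the constant-time comparison via hmac.compare_digest is an assumption about good practice rather than a confirmed implementation detail.

```python
import hmac
from typing import Optional

def parse_api_keys(raw: str) -> list[str]:
    """Split a comma-separated MCP_API_KEYS value, ignoring blanks."""
    return [key.strip() for key in raw.split(",") if key.strip()]

def extract_key(headers: dict[str, str]) -> Optional[str]:
    """Read the key from X-API-Key or from Authorization: Bearer."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"]
    auth = lowered.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return None

def is_authorized(path: str, headers: dict[str, str], valid_keys: list[str]) -> bool:
    if not path.startswith("/mcp/"):
        return True  # health check and similar routes pass through
    presented = extract_key(headers)
    if presented is None:
        return False
    # Compare against every key so timing does not reveal which one matched.
    matches = [hmac.compare_digest(presented.encode(), key.encode()) for key in valid_keys]
    return any(matches)

keys = parse_api_keys("alpha, beta")
```

In the real middleware a False result would map to the 401 response and audit log entry shown in the diagram.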

Service → CyberArk (outbound)

Mode 1: IP Allowlist (current default)

The service makes HTTPS GET requests to the CCP REST API
CyberArk trusts the caller based on source IP
The AppID (CYBERARK_APP_ID) identifies the application in CyberArk policy

Mode 2: mTLS (future)

A PFX certificate file is loaded at startup
The TLS client certificate is attached to every CCP request
CyberArk validates the certificate in addition to (or instead of) IP

Service → Target Systems (outbound)

Target	Auth method	Credentials from
SSH hosts	Password or key	Secret handle → `asyncssh.connect(password=...)`
WinRM hosts	NTLM / Basic	Secret handle → `WSMan(password=...)`
Databases	Native DB auth	Secret handle → driver connect call

6. The Secret Handle Pattern

This is the central security innovation of the service. It solves the problem: How does an AI model invoke privileged operations without ever knowing the password?

Step 1 — Credential fetch
──────────────────────────
Claude calls:  get_credential(safe="PROD-LINUX", object_name="svc_root")

Service:
  1. Calls CyberArk CCP REST API
  2. Receives { "UserName": "root", "Content": "P@ssword123", ... }
  3. Calls secret_store.store("root", "P@ssword123")
     → stores in RAM as _Entry with a random 32-char hex handle_id
     → returns handle = "secret://a3f9c2e1b8d7..."
  4. Returns to Claude: "Handle: secret://a3f9c2e1... TTL: 300s"
     PASSWORD IS NEVER IN THIS RETURN VALUE

Step 2 — Privileged operation
──────────────────────────────
Claude calls:  ssh_execute(host="server01", command="df -h",
                           secret_handle="secret://a3f9c2e1b8d7...")

Service:
  1. Calls secret_store.resolve("secret://a3f9c2e1b8d7...")
     → checks TTL and single-use flag
     → if valid: returns ("root", "P@ssword123") and deletes handle
  2. Calls asyncssh.connect("server01", username="root", password="P@ssword123")
  3. Runs command, collects output
  4. Deletes password variable (del password)
  5. Returns: "Exit code: 0\nstdout:\n/dev/sda1  50G  10G  40G  20% /"
     PASSWORD IS NEVER IN THIS RETURN VALUE

Step 3 — Handle is gone
────────────────────────
If Claude tries to reuse the same handle:
  secret_store.resolve(...) raises KeyError("Handle already consumed")
  → Claude must call get_credential again for the next operation

Handle lifecycle state machine:

              store()
CREATED ────────────────► ACTIVE
                            │
                   ┌────────┴────────┐
                   │                 │
               resolve()          TTL expired
           (single_use=True)    (sweeper task)
                   │                 │
                   ▼                 ▼
               CONSUMED           EXPIRED
               (deleted)          (deleted)
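
Learning note: The lifecycle above fits in a small class. The following is a minimal sketch under stated assumptions, not the code in secret_store.py: it keeps a set of consumed handle strings (which contain no secrets) so that reuse can be reported as "Handle already consumed", and it exposes the sweeper as a method to be called periodically rather than as a background task.

```python
import asyncio
import secrets
import time
from dataclasses import dataclass

@dataclass
class _Entry:
    username: str
    password: str
    expires_at: float  # monotonic deadline

class SecretStore:
    def __init__(self, ttl_seconds: float = 300.0, single_use: bool = True) -> None:
        self._ttl = ttl_seconds
        self._single_use = single_use
        self._entries: dict[str, _Entry] = {}
        self._consumed: set[str] = set()  # handle ids only; grows until restart
        self._lock = asyncio.Lock()

    async def store(self, username: str, password: str) -> str:
        handle = "secret://" + secrets.token_hex(16)
        async with self._lock:
            self._entries[handle] = _Entry(username, password, time.monotonic() + self._ttl)
        return handle

    async def resolve(self, handle: str) -> tuple[str, str]:
        async with self._lock:
            if handle in self._consumed:
                raise KeyError("Handle already consumed")
            entry = self._entries.get(handle)
            if entry is None:
                raise KeyError("Unknown handle")
            if time.monotonic() >= entry.expires_at:
                del self._entries[handle]          # ACTIVE -> EXPIRED
                raise KeyError("Handle expired")
            if self._single_use:
                del self._entries[handle]          # ACTIVE -> CONSUMED
                self._consumed.add(handle)
            return entry.username, entry.password

    async def sweep(self) -> int:
        """Delete expired entries; returns how many were removed."""
        now = time.monotonic()
        async with self._lock:
            expired = [h for h, e in self._entries.items() if now >= e.expires_at]
            for h in expired:
                del self._entries[h]
        return len(expired)

async def demo():
    store = SecretStore(ttl_seconds=300)
    handle = await store.store("root", "example-password")
    first = await store.resolve(handle)
    try:
        await store.resolve(handle)
        reused = True
    except KeyError:
        reused = False
    return handle, first, reused

handle, first, reused = asyncio.run(demo())
```

The lock matters because store, resolve, and the sweeper can interleave at any await point; without it, two concurrent resolve calls could both succeed on a single-use handle.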

7. Data Flow — Key Use Cases

7.1 SSH Command Execution

Claude          CyberArk MCP    SecretStore     SSH MCP        Linux Host
  │                 │               │               │               │
  │ get_credential  │               │               │               │
  │────────────────►│               │               │               │
  │                 │ GET CCP REST  │               │               │
  │                 │──────────────────────────────────────────────►│(CyberArk)
  │                 │◄─────────────────────────────────────────────-│
  │                 │ store(user,pw)│               │               │
  │                 │──────────────►│               │               │
  │                 │◄── handle ────│               │               │
  │◄── handle ──────│               │               │               │
  │                 │               │               │               │
  │ ssh_execute(handle)             │               │               │
  │─────────────────────────────────────────────────►              │
  │                 │               │ resolve(handle│               │
  │                 │               │◄──────────────│               │
  │                 │               │──(user,pw)───►│               │
  │                 │               │               │ SSH connect   │
  │                 │               │               │──────────────►│
  │                 │               │               │◄── output ───-│
  │                 │               │               │ del password  │
  │◄─────────────────────────────────────────────────output────────│

7.2 Database Query

Identical flow to SSH, substituting db_query for ssh_execute and the target database driver for asyncssh.

7.3 PowerShell Execution

The WinRM flow differs in one aspect: pypsrp is synchronous, so the call is offloaded to a thread-pool executor while the asyncio event loop continues serving other requests.

  asyncio event loop                   Thread pool executor
  ──────────────────                   ────────────────────
  resolve handle
  await run_in_executor(None, _run_ps_sync, ...) ──────► _run_ps_sync()
  [event loop free to handle other requests]              WSMan()
                                                          RunspacePool()
                                                          ps.invoke()
  ◄────────────────────────────────────────────── return output
  del password
  return result
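
Learning note: The offloading pattern can be tried without a Windows host. In this sketch run_blocking is a hypothetical stand-in for the synchronous pypsrp call (it just sleeps), and the heartbeat coroutine demonstrates that the event loop keeps running while the worker thread is busy.

```python
import asyncio
import time

def run_blocking(script: str) -> str:
    """Hypothetical stand-in for a synchronous WinRM call."""
    time.sleep(0.2)  # blocks this worker thread, not the event loop
    return f"ran: {script}"

async def ps_execute_sketch(script: str) -> str:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, run_blocking, script)

async def heartbeat(ticks: list) -> None:
    """Other work that proceeds while the blocking call is in flight."""
    for i in range(3):
        await asyncio.sleep(0.05)
        ticks.append(i)

async def main():
    ticks: list = []
    output, _ = await asyncio.gather(ps_execute_sketch("Get-Date"), heartbeat(ticks))
    return output, ticks

output, ticks = asyncio.run(main())
```

Calling run_blocking directly inside the coroutine would instead freeze every other request for the duration of the remote command.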

8. Deployment Architecture

Recommended (Docker on a hardened VM)

  ┌──────────────────────────────────────────────┐
  │  Hardened VM (e.g., Ubuntu 22.04)            │
  │                                              │
  │  ┌─────────────────────────────────────┐     │
  │  │  Docker container                   │     │
  │  │  Image: mcp-privileged:1.0          │     │
  │  │  User: mcpuser (non-root)           │     │
  │  │  Port: 8443 (internal)              │     │
  │  └──────────────┬──────────────────────┘     │
  │                 │                             │
  │  ┌──────────────▼──────────────────────┐     │
  │  │  Reverse proxy (nginx / Caddy)      │     │
  │  │  TLS termination                    │     │
  │  │  Port: 443 (external)               │     │
  │  └─────────────────────────────────────┘     │
  └──────────────────────────────────────────────┘
          │
          │ Firewall: only Claude Code source IPs allowed

Network segmentation requirements

Connection	Inbound to	Source	Port
Claude Code → Service	Service host	Claude Code client IPs	443 (HTTPS)
Service → CyberArk CCP	CyberArk	Service host IP	443 (HTTPS)
Service → SSH targets	Linux hosts	Service host IP	22 (or custom)
Service → WinRM targets	Windows hosts	Service host IP	5985/5986
Service → Databases	DB servers	Service host IP	5432/3306/1433

Health check

GET /health returns {"status": "ok"} with no authentication. Suitable for load balancer and container health probes.

9. Technology Choices

Technology	Choice	Rationale
Web framework	FastAPI	Async-native, excellent OpenAPI support, Starlette middleware
MCP framework	FastMCP (mcp[server])	Official Python MCP SDK; HTTP + SSE transport (mounted via `sse_app()`)
HTTP client	httpx	Async, connection pooling, easy mock transport for tests
SSH	asyncssh	Pure-Python async SSH2; no subprocess dependency
WinRM	pypsrp	Python PowerShell Remoting Protocol; most complete WinRM library
PostgreSQL	asyncpg	High-performance async Postgres driver; speaks the native binary protocol
MySQL	aiomysql	Async MySQL driver
SQL Server	pyodbc	Standard ODBC; requires Microsoft ODBC Driver 18 on host
Config	pydantic-settings	Type-safe config; reads from env + `.env`; validates at startup
Logging	structlog	Structured JSON output; easy log shipping; context vars
Crypto	cryptography	PFX parsing for mTLS; well-maintained
Runtime	Python 3.11	asyncio improvements, `tomllib`, `ExceptionGroup`; slots dataclasses (available since 3.10)
Container	Docker (multi-stage)	Small runtime image; non-root user; no build tools in production

10. Security Architecture Summary

Control	Implementation	Protects Against
TLS in transit	HTTPS everywhere (CCP, service)	Eavesdropping, MITM
API key auth	`ApiKeyMiddleware` on all `/mcp/*`	Unauthorised tool calls
CyberArk AppID	Registered in CyberArk policy	Unauthorised credential access
IP allowlist (CyberArk)	CyberArk trusted-net config	Rogue callers to CCP
mTLS (future)	PFX cert on CCP requests	Stronger caller identity
Secret handle	Opaque token, not password	Password exposure to LLM
Single-use handle	`handle_single_use=True`	Credential replay
TTL on handle	Default 300s	Handle leakage window
RAM-only storage	`SecretStore` dict, no disk I/O	Credential at-rest exposure
`SecretStr` wrapper	Pydantic `SecretStr`	Accidental log/repr of password
`del password`	Explicit deletion after use	Lingering password references (best effort: `del` drops the name binding but does not zero the memory)
Audit log (no password)	structlog, explicit field list	Credential in log files
Non-root container	`USER mcpuser` in Dockerfile	Container escape impact
Output limits	50 KB per stream, 1000 DB rows	Context flooding / DoS
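
Learning note: The output limits in the last row can be enforced with two small helpers. This is a sketch under assumptions: "50 KB" is taken to mean 50 * 1024 bytes of UTF-8, the truncation marker text is invented for illustration, and the helper names are hypothetical.

```python
MAX_STREAM_BYTES = 50 * 1024  # assumed interpretation of "50 KB per stream"
MAX_DB_ROWS = 1000

def truncate_stream(text: str, limit: int = MAX_STREAM_BYTES) -> str:
    """Cap stdout/stderr so a huge output cannot flood the model's context."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    clipped = raw[:limit].decode("utf-8", errors="ignore")
    return clipped + f"\n[output truncated at {limit} bytes]"

def cap_rows(rows: list, limit: int = MAX_DB_ROWS) -> tuple[list, bool]:
    """Return at most `limit` rows plus a flag saying whether any were dropped."""
    return rows[:limit], len(rows) > limit

short = truncate_stream("ok")
long_output = truncate_stream("a" * 60000)
rows, was_capped = cap_rows(list(range(1500)))
```

Reporting the truncation explicitly lets the model know the output is incomplete instead of silently reasoning over a partial result.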

11. Future Roadmap

Item	Priority	Description
mTLS for CyberArk	High	Config already present; needs PFX cert provisioning
API key rotation without restart	Medium	Watch env file or use a config reload endpoint
SSH key-based auth	Medium	Support `asyncssh` with private key from CyberArk
Kerberos for WinRM	Medium	Currently NTLM / Basic; add Kerberos for domain environments
Connection pooling (SSH)	Low	Reuse SSH connections for repeated calls to same host
Multi-tenant API keys	Low	Map API keys to CyberArk AppIDs for key-per-team isolation
Metrics endpoint	Low	Prometheus `/metrics` for connection counts, handle stats
Session recording integration	Low	Forward SSH output to CyberArk PSM or a SIEM
