Operations Manual

MCP Privileged Access Service

Version: 1.0
Date: 2026-03-28
Audience: System administrators, security engineers, DevOps teams


Table of Contents

  1. Prerequisites
  2. CyberArk Prerequisites
  3. Installation — Bare Metal / VM
  4. Installation — Docker
  5. Configuration Walkthrough
  6. SSH Host Key Setup
  7. Windows WinRM Setup
  8. SQL Server ODBC Driver Setup
  9. Claude Code Integration
  10. Usage Examples
  11. Monitoring & Log Events
  12. Troubleshooting Guide
  13. Security Hardening Checklist
  14. Backup & Recovery
  15. Upgrade Procedure

1. Prerequisites

System requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| OS | Ubuntu 22.04 / RHEL 9 | Ubuntu 22.04 LTS |
| CPU | 1 vCPU | 2 vCPU |
| RAM | 512 MB | 1 GB |
| Disk | 2 GB | 5 GB (for logs) |
| Python | 3.11 | 3.11 |
| Network | See firewall rules below | |

Network access required (outbound from service host)

| Destination | Port | Protocol | Purpose |
|-------------|------|----------|---------|
| CyberArk CCP | 443 | HTTPS | Credential retrieval |
| Linux target hosts | 22 | SSH | ssh_execute tool |
| Windows target hosts | 5985 or 5986 | HTTP/HTTPS | ps_execute tool (WinRM) |
| PostgreSQL servers | 5432 | TCP | db_query (postgres) |
| MySQL servers | 3306 | TCP | db_query (mysql) |
| SQL Server | 1433 | TCP | db_query (mssql) |

Network access required (inbound to service host)

| Source | Port | Protocol | Purpose |
|--------|------|----------|---------|
| Claude Code clients | 443 | HTTPS | MCP tool calls |
| Load balancer / monitoring | 8443 | HTTP | Health check (if no TLS termination) |
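Before installing, it can help to confirm that the outbound ports listed above are actually reachable from the service host. The following is a minimal sketch, not part of the service: the function names and the host names in TARGETS are placeholders chosen for illustration, and only the port numbers come from the tables above.

```python
import socket

# Placeholder targets: replace the host names with your own.
# The ports are the ones listed in the outbound access table.
TARGETS = [
    ("cyberark.internal", 443),
    ("linux01.internal", 22),
    ("win01.internal", 5985),
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_targets(targets, timeout: float = 3.0) -> dict:
    """Map 'host:port' to a reachability flag for every target."""
    return {f"{h}:{p}": can_connect(h, p, timeout) for h, p in targets}
```

Calling check_targets(TARGETS) on the service host returns a dictionary such as {"cyberark.internal:443": True, ...}. A False value only shows that the TCP handshake failed; it does not distinguish a firewall block from a stopped service.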

2. CyberArk Prerequisites

Before deploying the service, complete the following in CyberArk.

2.1 Create an Application ID

  1. In PVWA, navigate to Applications → Add Application
  2. Set the application name to MCP-Privileged-Service (or your chosen value)
  3. Under Authentication, add the service host's IP address to the Allowed Machines list
  4. Save

2.2 Grant access to Safes

For each Safe containing credentials the service needs to retrieve:

  1. Navigate to the Safe → Members → Add Member
  2. Add MCP-Privileged-Service (the Application ID)
  3. Grant permissions: Retrieve accounts (minimum)
  4. Do NOT grant: Add, Update, Delete, Manage — principle of least privilege

2.3 Verify CCP is reachable

From the service host:

curl -k "https://cyberark.internal/AIMWebService/api/Accounts?AppID=MCP-Privileged-Service&Safe=TEST&Object=TEST-obj"

Expected responses:

  • HTTP 200 — credential returned (safe and object exist)
  • HTTP 404 APPAP007E — AppID valid but object not found (CCP is reachable and trusted)
  • HTTP 403 APPAP006E — IP not in allowlist (add the service host IP to CyberArk)
  • Connection refused — CCP URL is wrong or firewall is blocking
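The response list above can be turned into a small lookup for scripted checks. This is a sketch under assumptions: the function name diagnose_ccp is hypothetical, and the JSON field name ErrorCode is assumed for the CCP error body; verify it against a real error response from your CCP before relying on it. The diagnoses are copied from the list above.

```python
import json

# Diagnoses copied from the expected-responses list,
# keyed by (HTTP status, CyberArk error code).
KNOWN_RESPONSES = {
    (404, "APPAP007E"): "AppID valid but object not found (CCP is reachable and trusted)",
    (403, "APPAP006E"): "IP not in allowlist (add the service host IP to CyberArk)",
}


def diagnose_ccp(status: int, body: str) -> str:
    """Translate a CCP HTTP status and response body into a short diagnosis."""
    if status == 200:
        return "credential returned (safe and object exist)"
    try:
        # Assumption: error bodies are JSON objects with an 'ErrorCode' field.
        code = json.loads(body).get("ErrorCode", "")
    except (ValueError, AttributeError):
        code = ""
    fallback = f"unrecognised response: HTTP {status} {code}".strip()
    return KNOWN_RESPONSES.get((status, code), fallback)
```

A refused connection never reaches this function, because no HTTP response exists in that case; handle it where the request is made.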

2.4 (Future) mTLS — Export client certificate

  1. In CyberArk, generate or import a client certificate for the AppID
  2. Export as PFX with a strong password
  3. Copy the PFX file to the service host at a path like /app/certs/mcp.pfx
  4. Set chmod 400 /app/certs/mcp.pfx
  5. Set CYBERARK_CERT_PFX_PATH=/app/certs/mcp.pfx and CYBERARK_CERT_PFX_PASSWORD=<password> in .env

3. Installation — Bare Metal / VM

3.1 System packages

sudo apt-get update
sudo apt-get install -y python3.11 python3.11-venv python3.11-dev \
    unixodbc unixodbc-dev ca-certificates

For SQL Server support (optional):

# Add Microsoft repository
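# (apt-key is deprecated on Ubuntu 22.04; install the key into trusted.gpg.d instead)
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc \
    | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc > /dev/null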
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list \
    | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18

3.2 Create service user

sudo useradd --system --no-create-home --shell /usr/sbin/nologin mcpuser
sudo mkdir -p /opt/mcp-privileged /opt/mcp-privileged/certs
# -R so that the certs subdirectory is also owned by the service user
sudo chown -R mcpuser:mcpuser /opt/mcp-privileged

3.3 Install the package

cd /opt/mcp-privileged

# Create and activate virtualenv
python3.11 -m venv .venv
source .venv/bin/activate

# Clone or copy source
# (assuming source is in /tmp/MCP_CyberArk)
pip install /tmp/MCP_CyberArk

# Verify
mcp-privileged --help

3.4 Configure

cp /tmp/MCP_CyberArk/.env.example /opt/mcp-privileged/.env
chmod 600 /opt/mcp-privileged/.env
nano /opt/mcp-privileged/.env
# Edit values — see Section 5

3.5 Configure SSH known_hosts

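# Pre-populate known_hosts for all SSH target hosts.
# mcpuser was created without a home directory, so create ~/.ssh first.
sudo install -d -m 700 -o mcpuser -g mcpuser /home/mcpuser/.ssh
ssh-keyscan linux01.internal linux02.internal 2>/dev/null \
    | sudo -u mcpuser tee -a /home/mcpuser/.ssh/known_hosts > /dev/null
# Or set SSH_KNOWN_HOSTS=/etc/mcp/ssh_known_hosts and populate that file (see Section 6)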

3.6 Create systemd service

sudo tee /etc/systemd/system/mcp-privileged.service > /dev/null <<'EOF'
[Unit]
Description=MCP Privileged Access Service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=mcpuser
Group=mcpuser
WorkingDirectory=/opt/mcp-privileged
EnvironmentFile=/opt/mcp-privileged/.env
ExecStart=/opt/mcp-privileged/.venv/bin/mcp-privileged
Restart=on-failure
RestartSec=5s

# Security hardening
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ReadWritePaths=/opt/mcp-privileged
CapabilityBoundingSet=

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable mcp-privileged
sudo systemctl start mcp-privileged
sudo systemctl status mcp-privileged

3.7 Reverse proxy (nginx)

# /etc/nginx/sites-available/mcp-privileged
server {
    listen 443 ssl;
    server_name mcp.yourcompany.internal;

    ssl_certificate     /etc/ssl/certs/mcp.crt;
    ssl_certificate_key /etc/ssl/private/mcp.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Restrict to Claude Code client IPs (replace with real IPs)
    allow 10.0.0.0/24;
    deny  all;

    location / {
        proxy_pass         http://127.0.0.1:8443;
        proxy_set_header   X-Forwarded-For $remote_addr;
        proxy_set_header   Host $host;
        proxy_read_timeout 120s;
    }
}

Enable the site and reload nginx:

sudo ln -s /etc/nginx/sites-available/mcp-privileged /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

4. Installation — Docker

4.1 Build the image

cd /path/to/MCP_CyberArk
docker build -t mcp-privileged:1.0 .

4.2 Create .env file

cp .env.example .env
chmod 600 .env
# Edit .env with your values

4.3 Run with Docker Compose

# Service only
docker compose up -d mcp-privileged

# Service + test databases (for integration testing)
docker compose --profile db up -d

4.4 Run with Docker (direct)

docker run -d \
  --name mcp-privileged \
  --restart unless-stopped \
  -p 8443:8443 \
  --env-file .env \
  -v "$(pwd)/certs:/app/certs:ro" \
  mcp-privileged:1.0

4.5 View logs

docker logs -f mcp-privileged

5. Configuration Walkthrough

Copy .env.example to .env and set each value:

Mandatory values

# API keys — comma-separated, no spaces around commas
# Generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
MCP_API_KEYS=abc123def456...,xyz789uvw012...

# CyberArk CCP URL — the full REST endpoint
CYBERARK_CCP_URL=https://cyberark.yourcompany.internal/AIMWebService/api/Accounts

# AppID registered in CyberArk (must match exactly — case-sensitive)
CYBERARK_APP_ID=MCP-Privileged-Service

TLS verification

# Option 1: Use system CAs (default — works if CyberArk cert is signed by a trusted CA)
CYBERARK_VERIFY_SSL=true

# Option 2: Custom CA bundle (common for internal PKI)
CYBERARK_VERIFY_SSL=/etc/ssl/certs/internal-ca-bundle.crt

# Option 3: Disable (NEVER in production — dev/lab only)
CYBERARK_VERIFY_SSL=false
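The three options above mean that one setting carries either a boolean or a file path. Common Python HTTP clients accept exactly that shape for their verify argument (True, False, or a CA bundle path). The sketch below shows one way such a value could be interpreted; the function name is hypothetical and this is not necessarily how the service parses the setting.

```python
from typing import Union


def parse_verify_ssl(raw: str) -> Union[bool, str]:
    """Interpret CYBERARK_VERIFY_SSL as True, False, or a CA bundle path."""
    value = raw.strip()
    if value.lower() == "true":
        return True   # Option 1: verify against the system CA store
    if value.lower() == "false":
        return False  # Option 3: no verification (dev/lab only)
    return value      # Option 2: treat anything else as a CA bundle path
```

Treating every other value as a path keeps the failure mode safe: a typo such as "flase" becomes a missing-file error instead of silently disabling verification.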

Handle security

# How long a handle stays valid (seconds). Shorter = more secure.
# Operations that take < 30s: keep at 120-300s
# Long-running database imports: consider up to 600s
HANDLE_TTL_SECONDS=300

# Single-use enforces that each get_credential call is for one operation only.
# Set to false only if Claude needs the same credential for multiple parallel calls.
HANDLE_SINGLE_USE=true
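To make the interaction between HANDLE_TTL_SECONDS and HANDLE_SINGLE_USE concrete, here is a minimal in-memory sketch of a handle store. It is an illustration only, not the service's implementation: the class and method names are hypothetical, and the handle format simply imitates the secret:// handles shown in this manual. The injectable clock defaults to time.monotonic, which the troubleshooting section names as the TTL time source.

```python
import secrets
import time
from typing import Callable, Dict, Tuple


class HandleStore:
    """Map opaque handles to secrets, with a TTL and optional single use."""

    def __init__(self, ttl_seconds: float = 300, single_use: bool = True,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._single_use = single_use
        self._clock = clock
        # handle -> (expiry on the monotonic clock, secret)
        self._items: Dict[str, Tuple[float, str]] = {}

    def issue(self, secret: str) -> str:
        """Store a secret and return a fresh handle for it."""
        handle = "secret://" + secrets.token_hex(8)
        self._items[handle] = (self._clock() + self._ttl, secret)
        return handle

    def resolve(self, handle: str) -> str:
        """Return the secret, enforcing expiry and single use."""
        if handle not in self._items:
            # Simplification: unknown and consumed handles look the same.
            raise KeyError("Handle already consumed")
        expiry, secret = self._items[handle]
        if self._clock() >= expiry:
            del self._items[handle]
            raise KeyError("Handle expired")
        if self._single_use:
            del self._items[handle]
        return secret
```

With single_use=True, a second resolve() on the same handle raises KeyError, which is why a workflow that runs two commands needs two handles. With single_use=False the handle stays valid until its TTL elapses.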

WinRM authentication

# ntlm  — works for domain accounts, most common
# basic — works for local accounts but requires HTTPS (use_ssl=true in the tool call)
WINRM_AUTH=ntlm

SSH known hosts

# Use the service user's known_hosts file (default)
SSH_KNOWN_HOSTS=~/.ssh/known_hosts

# Use a shared known_hosts for the whole service
SSH_KNOWN_HOSTS=/etc/mcp/ssh_known_hosts

# Disable host key checking (dev/lab ONLY — logs a warning on every connection)
SSH_KNOWN_HOSTS=disable

Logging

# Production: use json for log shipping to SIEM
LOG_FORMAT=json
LOG_LEVEL=INFO

# Development: use console for human-readable output
LOG_FORMAT=console
LOG_LEVEL=DEBUG

6. SSH Host Key Setup

The service verifies SSH host keys against a known_hosts file. New hosts must be added before Claude can connect.

Add a single host

# As the mcpuser (or root, then chown)
ssh-keyscan -H linux01.internal >> ~/.ssh/known_hosts

Add multiple hosts from a list

ssh-keyscan -H -f hosts.txt >> ~/.ssh/known_hosts

Where hosts.txt contains one hostname per line.

Using a shared known_hosts file

# Create shared file
sudo mkdir -p /etc/mcp
# Pipe through sudo tee: a plain > redirect would run as your own user, not root
ssh-keyscan -H linux01.internal linux02.internal db01.internal \
    | sudo tee /etc/mcp/ssh_known_hosts > /dev/null
sudo chown mcpuser:mcpuser /etc/mcp/ssh_known_hosts
sudo chmod 440 /etc/mcp/ssh_known_hosts

Then set SSH_KNOWN_HOSTS=/etc/mcp/ssh_known_hosts in .env.

Verify a host key

ssh-keygen -F linux01.internal -f ~/.ssh/known_hosts
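Because ssh-keyscan -H writes hashed host names, a plain text search of known_hosts will not find a host. OpenSSH stores each hashed entry as |1|salt|hash, where the hash is HMAC-SHA1 of the host name keyed with the salt, both base64-encoded. The sketch below checks a file's contents for a host in the same spirit as ssh-keygen -F; the function names are hypothetical, and it ignores marker lines such as @cert-authority as well as wildcard patterns.

```python
import base64
import hashlib
import hmac


def hashed_entry_matches(host_field: str, hostname: str) -> bool:
    """Check one hashed host field ('|1|salt|hash') against a host name."""
    parts = host_field.split("|")
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False
    salt = base64.b64decode(parts[2])
    expected = base64.b64decode(parts[3])
    actual = hmac.new(salt, hostname.encode(), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)


def host_in_known_hosts(text: str, hostname: str) -> bool:
    """Return True if hostname appears in known_hosts text, hashed or plain."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field = line.split()[0]
        if field.startswith("|1|"):
            if hashed_entry_matches(field, hostname):
                return True
        elif hostname in field.split(","):
            return True
    return False
```

For a host on a non-default SSH port, OpenSSH hashes the form [host]:port, so pass that string as hostname.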

7. Windows WinRM Setup

7.1 Enable WinRM on Windows hosts

Run on each Windows target host (as Administrator):

# Enable WinRM with default settings (HTTP, port 5985)
Enable-PSRemoting -Force

# Enable Basic authentication on the WinRM service
# (only needed when WINRM_AUTH=basic; NTLM requires no change here)
Set-Item WSMan:\localhost\Service\Auth\Basic -Value $true

# Allow specific IP in firewall (replace 10.0.0.5 with your service host IP)
New-NetFirewallRule -Name "WinRM-MCP" -DisplayName "WinRM for MCP Service" `
    -Protocol TCP -LocalPort 5985 `
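    -RemoteAddress 10.0.0.5 -Action Allow

7.2 Enable WinRM over HTTPS

For encrypted transport (required for Basic authentication), add an HTTPS listener on port 5986.

# On the Windows host: create an HTTPS listener with a certificate
# (assumes cert is in the Local Machine store)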
$cert = Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.Subject -like "*win01*" }
New-WSManInstance winrm/config/Listener `
    -SelectorSet @{Transport="HTTPS"; Address="*"} `
    -ValueSet @{CertificateThumbprint=$cert.Thumbprint}

# Open HTTPS WinRM port in firewall
New-NetFirewallRule -Name "WinRM-HTTPS-MCP" `
    -Protocol TCP -LocalPort 5986 `
    -RemoteAddress 10.0.0.5 -Action Allow

Then use port=5986 and use_ssl=true in ps_execute tool calls.

7.3 Test WinRM from the service host

# Test HTTP WinRM connectivity (requires Python + pypsrp)
python3 -c "
from pypsrp.wsman import WSMan
from pypsrp.powershell import PowerShell, RunspacePool
wsman = WSMan('win01.internal', port=5985, username='domain\\\\svc_user',
              password='P@ssword', ssl=False, auth='ntlm')
with RunspacePool(wsman) as pool:
    ps = PowerShell(pool)
    ps.add_script('hostname')
    out = ps.invoke()
    print(out)
"

8. SQL Server ODBC Driver Setup

Required for db_query with db_type=mssql.

Ubuntu 22.04

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list \
    | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev

Verify ODBC driver

odbcinst -q -d -n "ODBC Driver 18 for SQL Server"
# Should print the driver configuration

Test SQL Server connectivity

python3 -c "
import pyodbc
conn = pyodbc.connect('DRIVER={ODBC Driver 18 for SQL Server};'
                      'SERVER=sql.internal,1433;DATABASE=master;'
                      'UID=sa;PWD=P@ssword;')
cur = conn.cursor()
cur.execute('SELECT @@VERSION')
print(cur.fetchone()[0])
"

9. Claude Code Integration

9.1 Configure MCP servers in Claude Code

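Add the four servers to your Claude Code MCP configuration (for example a project-level .mcp.json file, or register each server with claude mcp add; the exact file location depends on your Claude Code version and configuration scope):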

{
  "mcpServers": {
    "cyberark": {
      "type": "http",
      "url": "https://mcp.yourcompany.internal/mcp/cyberark",
      "headers": {
        "X-API-Key": "your-api-key-here"
      }
    },
    "ssh": {
      "type": "http",
      "url": "https://mcp.yourcompany.internal/mcp/ssh",
      "headers": {
        "X-API-Key": "your-api-key-here"
      }
    },
    "powershell": {
      "type": "http",
      "url": "https://mcp.yourcompany.internal/mcp/powershell",
      "headers": {
        "X-API-Key": "your-api-key-here"
      }
    },
    "database": {
      "type": "http",
      "url": "https://mcp.yourcompany.internal/mcp/database",
      "headers": {
        "X-API-Key": "your-api-key-here"
      }
    }
  }
}
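The four entries above differ only in the server name, so the file can be generated rather than edited by hand, which avoids copy-paste mistakes in the URLs. The following is a small sketch; the function name build_mcp_config is hypothetical, and the endpoint layout (/mcp/<name>) and header name are taken from the configuration shown above.

```python
import json

# Server names as they appear in the configuration above.
SERVER_NAMES = ["cyberark", "ssh", "powershell", "database"]


def build_mcp_config(base_url: str, api_key: str) -> dict:
    """Build the mcpServers block for all four endpoints."""
    base = base_url.rstrip("/")
    return {
        "mcpServers": {
            name: {
                "type": "http",
                "url": f"{base}/mcp/{name}",
                "headers": {"X-API-Key": api_key},
            }
            for name in SERVER_NAMES
        }
    }


# Print the JSON so it can be pasted into the Claude Code configuration.
config = build_mcp_config("https://mcp.yourcompany.internal", "your-api-key-here")
print(json.dumps(config, indent=2))
```

Avoid committing the generated file to version control, since it contains the API key.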

9.2 Verify connectivity

In the Claude Code chat:

Check if the MCP servers are connected

Claude should report all four MCP servers (cyberark, ssh, powershell, database) as available tools.

9.3 Test with a simple operation

Using the PROD-LINUX safe, get the credential for svc_root on linux01.internal,
then run the command "whoami && uptime" on that host.

Claude should:

  1. Call get_credential(safe="PROD-LINUX", object_name="svc_root")
  2. Receive a handle
  3. Call ssh_execute(host="linux01.internal", command="whoami && uptime", secret_handle="secret://...")
  4. Return the output

10. Usage Examples

Example 1: Check disk space on a Linux server

User prompt to Claude:

Get the root credential from the PROD-LINUX safe (object name: linux-root),
then check disk usage on server01.internal.

What Claude does:

  1. get_credential(safe="PROD-LINUX", object_name="linux-root") → Returns:

    Handle: secret://abc123...
    Username: root
    Address: server01.internal

  2. ssh_execute(host="server01.internal", command="df -h", secret_handle="secret://abc123...") → Returns:

    Host: server01.internal
    Command: df -h
    Exit code: 0
    
    --- stdout ---
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda1        50G   12G   38G  24% /
    /dev/sdb1       200G   80G  120G  40% /data
    

Example 2: Run a PowerShell script on Windows

User prompt to Claude:

Get the domain admin credential from WIN-SAFE (object: domain-admin),
then list all services on win-server01.internal that are stopped.

What Claude does:

  1. get_credential(safe="WIN-SAFE", object_name="domain-admin")

  2. ps_execute(host="win-server01.internal", script="Get-Service | Where-Object {$_.Status -eq 'Stopped'} | Select-Object Name, DisplayName", secret_handle="secret://...") → Returns:

    Host: win-server01.internal
    Script length: 89 chars
    Had errors: False
    
    --- output ---
    Name            DisplayName
    ----            -----------
    wuauserv        Windows Update
    XblGameSave     Xbox Game Bar Saving Service
    

Example 3: Query a database

User prompt to Claude:

Get the db_reader credential from DB-SAFE (object: pg-reader),
then count the orders placed in the last 24 hours in the prod PostgreSQL database
on pg.internal, database name: orders.

What Claude does:

  1. get_credential(safe="DB-SAFE", object_name="pg-reader")

  2. db_query(host="pg.internal", database="orders", db_type="postgres", secret_handle="secret://...", query="SELECT COUNT(*) as orders_24h FROM orders WHERE created_at > NOW() - INTERVAL '24 hours'") → Returns:

    Host: pg.internal
    Database: orders (postgres)
    Query length: 84 chars
    Rows returned: 1
    Elapsed: 8ms
    
    orders_24h
    ----------
        1247
    

Example 4: Multi-step workflow

User prompt to Claude:

I need to patch the Apache web servers in the PROD-LINUX safe.
For each of web01, web02, and web03:
1. Get the svc_admin credential
2. Run "sudo apt-get install --only-upgrade apache2 -y" on each host
3. Then check "apache2 -v" to confirm the version

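Note: Because HANDLE_SINGLE_USE=true, each handle is consumed by the first ssh_execute call that uses it. Claude must therefore call get_credential before every ssh_execute call: twice per server in this workflow (upgrade, then version check), or once per server if both commands are combined into a single ssh_execute call.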


11. Monitoring & Log Events

Log format (JSON)

{
  "event": "credential_fetched",
  "logger": "audit",
  "level": "info",
  "timestamp": "2026-03-28T10:30:00.123Z",
  "app_id": "MCP-Privileged-Service",
  "safe": "PROD-LINUX",
  "object_name": "linux-root",
  "handle_id": "a3f9c2e1b8d74f2c",
  "ttl_seconds": 300,
  "client_ip": "10.0.0.50"
}

Key events to alert on

| Event | Condition | Suggested alert |
|-------|-----------|-----------------|
| auth_failure | reason=invalid_or_missing_api_key | Any single occurrence |
| auth_failure | Rate > 5/minute from same IP | Possible brute-force |
| cyberark_error | error_code=APPAP006E | CyberArk allowlist may be wrong |
| cyberark_error | Rate > 10/hour | Possible misconfiguration |
| handle_expired | reason=already_consumed + high rate | Handle replay attempt |
| ssh_executed | exit_code != 0 | Command failure: review |
| ps_executed | had_errors=true | Script error: review |
| Health check | No response within 10s | Service down |
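Rate-based conditions such as "more than 5 auth_failure events per minute from the same IP" are normally implemented in the SIEM, but the logic is simple enough to prototype against a log file. The sketch below is an illustration with hypothetical function names. It assumes log lines arrive in chronological order and that auth_failure events carry the same timestamp and client_ip fields as the sample event above.

```python
import json
from collections import defaultdict, deque
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def brute_force_ips(lines, threshold: int = 5,
                    window: timedelta = timedelta(minutes=1)) -> set:
    """Return IPs with more than `threshold` auth_failure events in any window."""
    recent = defaultdict(deque)  # client_ip -> timestamps inside the window
    flagged = set()
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # skip non-JSON lines
        if not isinstance(rec, dict) or rec.get("event") != "auth_failure":
            continue
        ip = rec.get("client_ip")
        ts = parse_ts(rec["timestamp"])
        queue = recent[ip]
        queue.append(ts)
        # Drop events that fell out of the sliding window.
        while queue and ts - queue[0] > window:
            queue.popleft()
        if len(queue) > threshold:
            flagged.add(ip)
    return flagged
```

Feeding the function the lines of a captured log file returns the set of offending client IPs. The same sliding-window pattern applies to the cyberark_error rate rule with a one-hour window and a threshold of 10.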

Log shipping

The service writes JSON logs to stdout. Use your standard log shipper:

Filebeat:

- type: container
  paths:
    - /var/lib/docker/containers/*/*.log
  processors:
    - decode_json_fields:
        fields: ["message"]
        target: ""

Splunk universal forwarder: Configure to tail the stdout log file or Docker container logs.

Grafana Loki + promtail:

scrape_configs:
  - job_name: mcp-privileged
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        regex: mcp-privileged
        action: keep

12. Troubleshooting Guide

Service fails to start

Symptom: systemctl status mcp-privileged shows failed or immediate exit.

Check 1: Configuration validation

cd /opt/mcp-privileged && source .venv/bin/activate
python3 -c "from mcp_privileged.config import settings; print('Config OK')"

If this fails, the error message shows which setting is invalid.

Check 2: PFX file (if mTLS is configured)

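# The variable lives in .env, not in your shell, so read it from there first
PFX_PATH=$(sudo grep '^CYBERARK_CERT_PFX_PATH=' /opt/mcp-privileged/.env | cut -d= -f2-)
sudo -u mcpuser test -r "$PFX_PATH" && echo "readable by mcpuser"
# Must exist and be readable by mcpuser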

Check 3: Port in use

ss -tlnp | grep 8443

401 Unauthorized from Claude Code

Cause: API key mismatch between Claude Code settings and MCP_API_KEYS.

Verify:

# Check which keys are configured (this prints them in clear text; they are obfuscated only in logs)
grep MCP_API_KEYS /opt/mcp-privileged/.env

# Test with curl
curl -H "X-API-Key: your-key" https://mcp.yourcompany.internal/health
# Should return: {"status": "ok"}

CyberArk error APPAP006E (authentication failure)

Cause: The service host's IP is not in the CyberArk allowlist for the AppID.

Check: What IP does CyberArk see?

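# From the service host, show the source IP used to reach the CCP host
# (replace cyberark.internal with your CCP hostname)
ip route get "$(getent hosts cyberark.internal | awk '{print $1; exit}')"
# If traffic passes through NAT, CyberArk sees the NAT gateway address instead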

Fix: In PVWA → Applications → MCP-Privileged-Service → Allowed Machines → Add the IP.


CyberArk error APPAP007E (object not found)

Cause: The safe or object_name passed to get_credential does not exist in CyberArk.

Check:

  • Spelling and case of Safe name (CyberArk is case-sensitive)
  • Object name — this is the Account name (Name field), not the address or username
  • The AppID has Retrieve permission on the Safe

SSH connection fails: "Host key verification failed"

Cause: The target host's SSH fingerprint is not in the known_hosts file.

Fix:

ssh-keyscan -H linux01.internal >> ~/.ssh/known_hosts
# Or for the service user (tee runs as mcpuser; a plain >> redirect would run as your own user):
ssh-keyscan -H linux01.internal | sudo -u mcpuser tee -a ~mcpuser/.ssh/known_hosts > /dev/null

Quick diagnostic (dev only): Temporarily set SSH_KNOWN_HOSTS=disable to confirm the issue is host key related, then fix properly.


SSH connection fails: "Permission denied"

Cause: Wrong username/password, or password auth is disabled on the target host.

Check:

  1. Verify the credential in CyberArk PVWA (test retrieval)
  2. Confirm the target host allows password authentication: PasswordAuthentication yes in /etc/ssh/sshd_config
  3. Confirm the account is not locked: passwd -S <username> on the target

WinRM connection fails

Symptom: ps_execute returns a WinRM connection error.

Check 1: WinRM is running on the target

# On the Windows host
Get-Service WinRM
winrm enumerate winrm/config/listener

Check 2: Firewall allows the connection

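# On the Windows host: confirm the listener is up locally
Test-NetConnection -ComputerName localhost -Port 5985
# From the service host (shell): confirm the firewall lets the connection through
nc -zv win01.internal 5985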

Check 3: Auth method matches

  • NTLM: works for domain accounts and most setups
  • Basic: requires WINRM_AUTH=basic in .env AND use_ssl=true in the tool call (Basic auth over HTTP is rejected by WinRM by default)

Database connection fails

PostgreSQL:

# Test from service host
psql -h pg.internal -U db_user -d mydb -c "SELECT 1"

MySQL:

mysql -h mysql.internal -u db_user -p -e "SELECT 1"

SQL Server (ODBC):

# -k passes a full connection string instead of a DSN name
isql -v -k "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql.internal,1433;DATABASE=master;UID=sa;PWD=P@ssword"

If pyodbc fails with ImportError: libodbc.so.2: cannot open shared object file:

sudo apt-get install -y unixodbc

Handle expired / already consumed

Symptom: Tool returns KeyError: Handle expired or Handle already consumed.

Causes:

  • The TTL elapsed between get_credential and the tool call → increase HANDLE_TTL_SECONDS
  • HANDLE_SINGLE_USE=true and Claude tried to reuse the handle → normal behaviour; Claude should call get_credential again
  • Not a cause: clock skew on the service host. The TTL uses time.monotonic(), so wall-clock adjustments do not affect handle expiry.

13. Security Hardening Checklist

Use this checklist before production deployment.

Network

  • Service host is in a restricted network segment (not accessible from general office network)
  • Firewall rules allow only approved Claude Code client IPs to reach port 443
  • Service host can only reach: CyberArk CCP, target SSH hosts, WinRM hosts, DB servers — no internet
  • Reverse proxy handles TLS termination with a valid internal CA certificate

Service configuration

  • MCP_API_KEYS is set to strong random keys (minimum 32 chars each)
  • Default key changeme is NOT present in MCP_API_KEYS
  • HANDLE_SINGLE_USE=true (default)
  • HANDLE_TTL_SECONDS ≤ 300 (5 minutes)
  • CYBERARK_VERIFY_SSL is not set to false
  • SSH_KNOWN_HOSTS is not set to disable
  • LOG_FORMAT=json (for log shipping)
  • .env file has chmod 600 and is owned by the service user
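The service configuration items above can be checked mechanically before each deployment. The sketch below parses the text of a .env file and reports violations of this checklist. The function names are hypothetical, unset values are skipped or treated as the documented default where one is stated, and file permissions and ownership are not checked here.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blanks and comments (last value wins)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def audit_env(env: dict) -> list:
    """Return a list of checklist violations found in the parsed settings."""
    problems = []
    keys = [k for k in env.get("MCP_API_KEYS", "").split(",") if k]
    if not keys:
        problems.append("MCP_API_KEYS is empty or missing")
    if any(len(k) < 32 for k in keys):
        problems.append("MCP_API_KEYS contains a key shorter than 32 characters")
    if "changeme" in keys:
        problems.append("default key 'changeme' is present in MCP_API_KEYS")
    if env.get("HANDLE_SINGLE_USE", "true").lower() != "true":
        problems.append("HANDLE_SINGLE_USE is not true")
    ttl = env.get("HANDLE_TTL_SECONDS")
    if ttl is not None and (not ttl.isdigit() or int(ttl) > 300):
        problems.append("HANDLE_TTL_SECONDS is not a number <= 300")
    if env.get("CYBERARK_VERIFY_SSL", "").lower() == "false":
        problems.append("CYBERARK_VERIFY_SSL is set to false")
    if env.get("SSH_KNOWN_HOSTS", "").lower() == "disable":
        problems.append("SSH_KNOWN_HOSTS is set to disable")
    if env.get("LOG_FORMAT", "json").lower() != "json":
        problems.append("LOG_FORMAT is not json")
    return problems
```

Run it as audit_env(parse_env(open("/opt/mcp-privileged/.env").read())) with sufficient privileges to read the file; an empty list means no violations were found among the items it covers.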

CyberArk

  • AppID has only Retrieve permission on Safes (no Add/Update/Delete)
  • IP allowlist is restricted to the service host IP only
  • A dedicated AppID is used for this service (not shared with other applications)

Docker

  • Container runs as non-root (USER mcpuser in Dockerfile — already done)
  • Secrets are passed via --env-file, not -e PASSWORD=... in docker run
  • Docker socket is not mounted into the container
  • Image is built from official Python base image (verified digest)

Operating system

  • OS is patched and on a supported LTS release
  • Service runs as a dedicated non-root user (mcpuser)
  • systemd unit has NoNewPrivileges=yes and ProtectSystem=strict
  • Log rotation is configured for stdout logs
  • auditd or similar is monitoring privileged operations

14. Backup & Recovery

The service is stateless: no persistent data is stored on disk.

  • Configuration: The only file that needs backing up is .env. Store it in your secrets management system (HashiCorp Vault, AWS Secrets Manager, etc.), not in a generic file backup.
  • Certificates: Back up PFX files and known_hosts files to your PKI or secrets vault.
  • Recovery: To restore after a host failure, provision a new VM, install the package, and restore .env + certificates. All handles in RAM are lost (no active handles = fail-safe state; users must call get_credential again).
  • RTO: < 5 minutes (container restart or new VM + .env restore).
  • RPO: 0 (no data to lose — the service holds no persistent state).

15. Upgrade Procedure

Minor upgrade (no config changes)

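# Docker: build the new tag from the updated source
# (use docker pull only if you publish images to a registry)
cd /path/to/MCP_CyberArk
docker build -t mcp-privileged:1.1 .
docker compose up -d mcp-privileged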

# Bare metal
cd /opt/mcp-privileged && source .venv/bin/activate
pip install --upgrade /path/to/new/mcp_privileged-1.1.tar.gz
sudo systemctl restart mcp-privileged

Active handles are lost on restart (they would expire within the TTL anyway). Notify users if the restart window will exceed 5 minutes.

Major upgrade (config changes)

  1. Read the release notes — check for new required env vars
  2. Test in a staging environment first
  3. Update .env with new required values
  4. Follow the minor upgrade steps above
  5. Monitor logs for errors in the first 10 minutes

Rollback

# Docker — roll back to previous image tag
docker compose down
docker run --name mcp-privileged mcp-privileged:1.0 ...

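# Bare metal: reinstall the previous release from its source archive (placeholder path)
cd /opt/mcp-privileged && source .venv/bin/activate
pip install --force-reinstall /path/to/old/mcp_privileged-1.0.tar.gz
sudo systemctl restart mcp-privileged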