
mcp-context-budget

Measure and enforce MCP tool-surface budgets before your coding agent starts.

Open source (MIT) · Local-first · Dependency-free core · CI-enforceable.

The problem

MCP servers quietly inflate an agent's tool surface with many tools and thousands of tokens of schemas. That bloats context, slows runs, and raises cost before the agent does any real work, and nothing gives you a static way to see or cap it.

mcp-context-budget is a local-first CLI that scans your MCP config, measures the token cost each server and tool adds, selects a lean task-relevant tool set, and enforces it in CI, failing the build when the surface exceeds budget. There is no runtime service and no proxy, and nothing leaves your machine.
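To make "measures the token cost each server and tool adds" concrete, here is a minimal sketch of a static estimate. It is not the tool's actual implementation: the four-characters-per-token heuristic, the function names, and the idea of serializing the whole tool entry are assumptions for illustration. The input is assumed to look like an entry from an MCP tools/list response (name, description, inputSchema).

```python
import json
import math

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tool_tokens(tool: dict) -> int:
    """Estimate how many tokens one tool definition adds to the context."""
    # Serialize compactly with sorted keys so the estimate is deterministic.
    text = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_server_tokens(tools: list) -> tuple:
    """Return (per-tool estimates keyed by tool name, total for the server)."""
    per_tool = {tool["name"]: estimate_tool_tokens(tool) for tool in tools}
    return per_tool, sum(per_tool.values())
```

Because the estimate depends only on the serialized definitions, it can run against a config or a recorded fixture without starting an agent.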

Quickstart

The package is not on PyPI; install from source or run the Docker image. The core CLI has no external runtime dependencies.

Install from source (requires Python 3.11+):

git clone https://github.com/OrionArchitekton/mcp-context-budget
cd mcp-context-budget
python3.11 -m venv .venv && . .venv/bin/activate
pip install -e '.[dev]'

Command surface

  • scan: Estimate schema and response-token cost from an MCP config or tools/list fixture; emit a report and a lockfile.
  • select: Pick a smaller task-relevant tool set with deterministic SQLite FTS5/BM25 ranking, under max-tools and max-schema-tokens.
  • semantic-select: Rank tools by embedding similarity (deterministic fixture mode, or optional local Ollama) before applying the budget caps.
  • check: Re-validate a lockfile against schema and response budgets — the CI gate that fails the build on a regression.
  • compress-responses: Deterministically compress recorded response fixtures under a response budget, with before/after proof.
  • config-apply: Turn a selected-tool lock into a safe local MCP config patch. It is a dry run by default; writing requires --write and makes a backup first.
  • config-audit: Read-only hygiene check that flags plaintext secrets in MCP config files without ever printing the values.
  • export: Export the budget result (e.g. SARIF) for code-scanning and CI surfacing.
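The select step above (BM25 ranking followed by the max-tools and max-schema-tokens caps) can be sketched with the standard library's sqlite3 module. This is an illustrative sketch, not the project's code: the function name, the tuple shape of the input, the OR-joined query, and the greedy "skip what does not fit" policy are all assumptions, and it requires an SQLite build with FTS5 enabled.

```python
import re
import sqlite3


def select_tools(tools, task, max_tools, max_schema_tokens):
    """Rank tools against a task with FTS5/BM25, then apply both caps.

    tools: list of (name, description, schema_tokens) tuples (hypothetical shape).
    """
    terms = re.findall(r"[A-Za-z0-9]+", task)
    if not terms:
        return []

    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE VIRTUAL TABLE tool_index USING fts5(name, description)")
        con.executemany(
            "INSERT INTO tool_index (name, description) VALUES (?, ?)",
            [(name, desc) for name, desc, _ in tools],
        )
        # Quote each term so task words are never parsed as FTS5 operators.
        query = " OR ".join(f'"{term}"' for term in terms)
        # bm25() is lower for better matches; the name breaks ties deterministically.
        ranked = [row[0] for row in con.execute(
            "SELECT name FROM tool_index WHERE tool_index MATCH ? "
            "ORDER BY bm25(tool_index), name",
            (query,),
        )]
    finally:
        con.close()

    cost = {name: tokens for name, _, tokens in tools}
    selected, used = [], 0
    for name in ranked:
        if len(selected) == max_tools:
            break
        # Greedy cap: skip a tool that would push the total over the token budget.
        if used + cost[name] <= max_schema_tokens:
            selected.append(name)
            used += cost[name]
    return selected
```

Tools that match none of the task terms are never returned, so the result can be smaller than max-tools even when budget remains.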

Why it is different

  • Local-first: No runtime service, no proxy, no hosted dashboard. The CLI runs on your machine and nothing leaves it. Semantic ranking can optionally call a local Ollama — only when you explicitly ask for it.
  • Dependency-free core: The core package ships with zero Python dependencies. It scans, measures, selects, and enforces using the standard library — easy to vendor, audit, and trust in a build pipeline.
  • CI-enforceable: A lockfile plus a check command turns "our context is bloated" into a build gate. The check exits non-zero when the schema or response budget regresses, so the surface stops growing silently.
  • Honest, never a false PASS: config-apply binds each lock to its config by fingerprint and reports PARTIAL (not a fake PASS) when a command-discovered server cannot be statically enforced. Secret audits redact values; reports never print literal secrets.
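The gate behaviour described in the last two points (fingerprint binding, a budget check, and PARTIAL instead of a false PASS) might look roughly like the sketch below. The lockfile field names (config_fingerprint, schema_tokens, unresolved_servers), the SHA-256 fingerprint over canonical JSON, and the gate function itself are hypothetical; the real lockfile format is not shown here.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a config in canonical form so key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def gate(lock: dict, config: dict, max_schema_tokens: int) -> str:
    """Return 'PASS', 'PARTIAL', or 'FAIL' for a lock checked against a config."""
    # A lock only speaks for the exact config it was generated from.
    if lock["config_fingerprint"] != fingerprint(config):
        return "FAIL"
    # The budget regressed: the measured surface is larger than allowed.
    if lock["schema_tokens"] > max_schema_tokens:
        return "FAIL"
    # Some servers could not be verified statically, so do not claim a full PASS.
    if lock.get("unresolved_servers"):
        return "PARTIAL"
    return "PASS"
```

A CI wrapper would map FAIL to a non-zero exit code; how PARTIAL should map is left open in this sketch.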

Links