Architecture

System design, modules, data flow, and design rationale.

System diagram

┌─────────────────────────────────────────────────────────────────┐
│ pramana CLI (this repo)                                         │
│                                                                 │
│ ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │
│ │  cli.py  │───▶│  runner  │───▶│ provider │───▶│  OpenAI  │    │
│ │          │    │          │    │ registry │    │ Anthropic│    │
│ │ commands │    │ run_eval │    │          │    │  Google  │    │
│ └──────────┘    └────┬─────┘    └──────────┘    └──────────┘    │
│                      │                                          │
│                 ┌────▼─────┐    ┌──────────┐                    │
│                 │assertions│    │ hash.py  │                    │
│                 │          │    │ SHA-256  │                    │
│                 └──────────┘    └──────────┘                    │
│                                                                 │
│ ┌──────────┐    ┌──────────┐                                    │
│ │ storage  │    │submitter │───▶ pramana.pages.dev/api/submit   │
│ │  local   │    │HTTP POST │                                    │
│ └──────────┘    └──────────┘                                    │
└─────────────────────────────────────────────────────────────────┘
                       │
                       ▼
               ┌──────────────┐
               │  pramana-api │
               │  (separate   │
               │    repo)     │
               └──────────────┘

Module map

| File | Purpose | Key exports |
|------|---------|-------------|
| cli.py | Click command group, entry point | cli, run, submit, models, login, logout, whoami, config, delete |
| runner.py | Eval execution engine | run_eval(suite_path, provider, temperature, seed) |
| protocol.py | Pydantic schemas for all data types | TestCase, TestResult, EvalResults, RunMetadata |
| models.py | Dynamic model registry with 24h cache | get_available_models(), detect_provider(), resolve_alias() |
| hash.py | Content-addressable hashing | hash_suite(), hash_result(), hash_output() |
| assertions.py | Assertion evaluation | evaluate_assertion() |
| submitter.py | HTTP POST to API | submit_results() |
| storage.py | Local result persistence | append_result(), load_results(), remove_run() |
| auth.py | OAuth flow, token storage | login(), logout(), whoami() |
| providers/base.py | Abstract provider interface | BaseProvider |
| providers/registry.py | Auto-discovery, @register() decorator | register(), resolve_provider() |
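The auto-discovery pattern in providers/registry.py can be sketched as a class decorator that records each provider in a module-level dict. This is an illustrative sketch, not the actual implementation; the registry dict and error handling are assumptions, while register() and resolve_provider() mirror the exports listed above.

```python
# Hypothetical sketch of the @register() pattern; names mirror
# providers/registry.py but the internals are illustrative.
_REGISTRY = {}

def register(name):
    """Class decorator that adds a provider class to the registry under `name`."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator

def resolve_provider(name):
    """Look up a registered provider class by name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name}")

@register("openai")
class OpenAIProvider:
    def complete(self, input_text, temperature, seed):
        ...  # would call the OpenAI API here
```

Because registration happens at class-definition time, importing a provider module is enough to make it discoverable; no central list needs editing.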

Data flow

TestCase (JSONL)
  │
  ▼
provider.complete(input_text, temperature, seed)
  │
  ▼
evaluate_assertion(assertion, output, ideal) → AssertionResult
  │
  ▼
hash_result(model_id, test_id, output) → SHA-256
  │
  ▼
TestResult { test_id, output, assertion_result, latency_ms, result_hash }
  │
  ▼
EvalResults { suite_version, suite_hash, run_metadata, results[], summary }
  │
  ▼
append_result(path, results) → local JSON file
  │
  ▼
submit_results(block, api_url) → HTTP POST to pramana-api
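The pipeline above (complete, assert, hash, collect) can be condensed into a single loop. This is a minimal sketch of the data flow, not the real runner: the dict-based test cases, the exact-match stand-in for evaluate_assertion(), and the hash field separator are assumptions.

```python
import hashlib
import time

def run_eval(test_cases, provider, model_id, temperature=0.0, seed=42):
    """Illustrative sketch of the data flow: complete -> assert -> hash -> collect."""
    results = []
    for case in test_cases:
        start = time.monotonic()
        output = provider.complete(case["input"], temperature, seed)
        latency_ms = int((time.monotonic() - start) * 1000)
        # Stand-in for evaluate_assertion(); real assertions are richer.
        passed = output.strip() == case["ideal"].strip()
        # Content-addressable hash over (model_id, test_id, output).
        digest = hashlib.sha256(
            f'{model_id}|{case["id"]}|{output}'.encode("utf-8")
        ).hexdigest()
        results.append({
            "test_id": case["id"],
            "output": output,
            "assertion_result": passed,
            "latency_ms": latency_ms,
            "result_hash": digest,
        })
    return results
```

In the real flow these per-test dicts become TestResult models and are wrapped in EvalResults before being written locally and submitted.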

Design rationale

Why no LiteLLM?

LiteLLM normalizes API responses across providers. Pramana needs raw responses to detect drift — normalization hides exactly the changes we're measuring.

Why no local models?

Pramana detects provider drift: silent updates to models behind stable identifiers. Local models (Ollama, vLLM) run user-controlled weights that never change silently, so there is no drift to detect.

Why content-addressable hashing?

SHA-256 of (model_id, prompt_id, output) enables deterministic deduplication. Same input + same output = same hash, regardless of when or where it ran. The server deduplicates on this hash.
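The hashing scheme can be shown in a few lines. This is a sketch under assumptions: the field separator and encoding are illustrative, only the (model_id, test_id, output) inputs and SHA-256 come from the design above.

```python
import hashlib

def hash_result(model_id: str, test_id: str, output: str) -> str:
    """Content-addressable hash: same (model, test, output) -> same digest.
    The unit-separator delimiter is an assumption, chosen so that field
    boundaries cannot be ambiguous."""
    payload = "\x1f".join((model_id, test_id, output)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

a = hash_result("gpt-4o", "t1", "Paris")
b = hash_result("gpt-4o", "t1", "Paris")
assert a == b                                      # deterministic
assert hash_result("gpt-4o", "t1", "Lyon") != a    # output change -> new hash
```

The determinism is what lets the server deduplicate: two clients running the same test against the same model produce identical hashes whenever the model's output is identical.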

Why 24h model cache?

Model releases are infrequent. Caching the registry for 24 hours balances freshness against repeated provider API calls. Use --refresh to bypass the cache.
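A TTL cache like this is commonly keyed off the cache file's modification time. The sketch below is illustrative: the function name, cache-file layout, and fetch callback are assumptions; only the 24-hour TTL and the refresh bypass come from the rationale above.

```python
import json
import time
from pathlib import Path

CACHE_TTL = 24 * 60 * 60  # 24 hours, matching the rationale above

def load_cached_models(cache_path: Path, fetch, refresh: bool = False):
    """Return the cached model list unless it is stale or refresh is forced."""
    if not refresh and cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < CACHE_TTL:
            return json.loads(cache_path.read_text())
    models = fetch()  # hit the provider APIs
    cache_path.write_text(json.dumps(models))
    return models
```

Passing refresh=True (the --refresh flag) skips the staleness check entirely, forcing a fresh fetch and rewriting the cache.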

Why custom providers?

Full control over API parameters (temperature, seed, message format) is required for scientific reproducibility. Generic wrappers may not pass all parameters correctly.
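The abstract interface that enforces this can be sketched with an ABC. BaseProvider and the (input_text, temperature, seed) signature come from the module map above; the EchoProvider subclass is a hypothetical example.

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Illustrative sketch of the abstract provider interface. Every provider
    must accept temperature and seed explicitly so runs are reproducible."""

    @abstractmethod
    def complete(self, input_text: str, temperature: float, seed: int) -> str:
        """Return the raw model output for a single test input."""

class EchoProvider(BaseProvider):
    # Toy subclass (an assumption, for demonstration only) showing that the
    # reproducibility parameters must flow through every implementation.
    def complete(self, input_text, temperature, seed):
        return f"{input_text} (temp={temperature}, seed={seed})"
```

Because BaseProvider is abstract, a provider that forgets to implement complete() fails at instantiation time rather than mid-run.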

Repo boundaries

| Repo | Scope |
|------|-------|
| pramana (this) | CLI, providers, eval execution, submission client |
| pramana-api | Submission endpoint, storage, aggregation, dashboard |
Do not add backend features, storage logic, or dashboard code to this repo.