# Architecture

System design, modules, data flow, and design rationale.

## System diagram
## Module map

| File | Purpose | Key exports |
|---|---|---|
| cli.py | Click command group, entry point | cli, run, submit, models, login, logout, whoami, config, delete |
| runner.py | Eval execution engine | run_eval(suite_path, provider, temperature, seed) |
| protocol.py | Pydantic schemas for all data types | TestCase, TestResult, EvalResults, RunMetadata |
| models.py | Dynamic model registry with 24h cache | get_available_models(), detect_provider(), resolve_alias() |
| hash.py | Content-addressable hashing | hash_suite(), hash_result(), hash_output() |
| assertions.py | Assertion evaluation | evaluate_assertion() |
| submitter.py | HTTP POST to API | submit_results() |
| storage.py | Local result persistence | append_result(), load_results(), remove_run() |
| auth.py | OAuth flow, token storage | login(), logout(), whoami() |
| providers/base.py | Abstract provider interface | BaseProvider |
| providers/registry.py | Auto-discovery, @register() decorator | register(), resolve_provider() |
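The auto-discovery registry in providers/registry.py can be sketched as a class decorator that records providers in a module-level map. This is an illustrative pattern only; the real register() and resolve_provider() signatures may differ.

```python
# Hypothetical sketch of the @register() decorator pattern; names and
# error handling in the real providers/registry.py may differ.
_PROVIDERS: dict[str, type] = {}

def register(name: str):
    """Class decorator that records a provider class under a lookup name."""
    def wrap(cls: type) -> type:
        _PROVIDERS[name] = cls
        return cls
    return wrap

def resolve_provider(name: str) -> type:
    """Return the registered provider class, or fail with the known names."""
    try:
        return _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(_PROVIDERS)}")

@register("example")
class ExampleProvider:
    pass

print(resolve_provider("example").__name__)  # → ExampleProvider
```

Importing a provider module is enough to make it resolvable, which is what lets the CLI discover providers without a hardcoded list.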
## Data flow

```
TestCase (JSONL)
        │
        ▼
provider.complete(input_text, temperature, seed)
        │
        ▼
evaluate_assertion(assertion, output, ideal) → AssertionResult
        │
        ▼
hash_result(model_id, test_id, output) → SHA-256
        │
        ▼
TestResult { test_id, output, assertion_result, latency_ms, result_hash }
        │
        ▼
EvalResults { suite_version, suite_hash, run_metadata, results[], summary }
        │
        ▼
append_result(path, results) → local JSON file
        │
        ▼
submit_results(block, api_url) → HTTP POST to pramana-api
```
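The stages above can be walked through with stand-in data. Everything here is illustrative: the field names come from the diagram, but the real schemas are Pydantic models in protocol.py, the provider call is made by runner.py, and the model id is a placeholder.

```python
import hashlib
import json

# Stand-in for a real provider.complete(input_text, temperature, seed) call.
def fake_complete(input_text: str, temperature: float, seed: int) -> str:
    return "4"

case = {"test_id": "t1", "input_text": "2+2?", "ideal": "4"}
output = fake_complete(case["input_text"], temperature=0.0, seed=42)

# Stand-in assertion check (the real logic lives in assertions.py).
assertion_result = {"passed": output == case["ideal"]}

# Content-addressable hash over (model_id, test_id, output);
# "model-x" is a hypothetical model identifier.
result_hash = hashlib.sha256(
    f"model-x|{case['test_id']}|{output}".encode()
).hexdigest()

test_result = {
    "test_id": case["test_id"],
    "output": output,
    "assertion_result": assertion_result,
    "latency_ms": 12,
    "result_hash": result_hash,
}

# Final stage, sketched in memory: what append_result() would persist locally.
runs = [test_result]
serialized = json.dumps(runs, indent=2)
print(assertion_result["passed"])  # → True
```

Each TestResult is self-describing, so the local JSON file and the server-side submission carry the same records.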
## Design rationale

### Why no LiteLLM?

LiteLLM normalizes API responses across providers. Pramana needs raw responses to detect drift — normalization hides exactly the changes we're measuring.
### Why no local models?

Pramana detects provider drift: silent updates to models behind stable identifiers. Local models (Ollama, vLLM) have user-controlled weights with no silent updates, so there is no drift to detect.
### Why content-addressable hashing?

SHA-256 of (model_id, prompt_id, output) enables deterministic deduplication. Same input + same output = same hash, regardless of when or where it ran. The server deduplicates on this hash.
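A minimal sketch of this property, assuming the triple is joined into a single byte string before hashing (the real serialization in hash.py may differ):

```python
import hashlib

def hash_result(model_id: str, prompt_id: str, output: str) -> str:
    """Sketch: SHA-256 over (model_id, prompt_id, output).
    A unit separator avoids ambiguity between adjacent fields;
    the real hash.py may serialize the triple differently."""
    payload = "\x1f".join([model_id, prompt_id, output]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

a = hash_result("model-x", "p1", "hello")
b = hash_result("model-x", "p1", "hello")
c = hash_result("model-x", "p1", "hello!")
print(a == b)  # → True: identical inputs hash identically, anywhere, anytime
print(a == c)  # → False: any change in output changes the hash
```

Because the hash depends only on content, two machines that ran the same test against the same model output produce the same key, and the server can drop the duplicate.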
### Why 24h model cache?

Model releases are infrequent. Caching the registry for 24 hours balances freshness against repeated API calls on every invocation. Use `--refresh` to bypass the cache.
### Why custom providers?

Full control over API parameters (temperature, seed, message format) is required for scientific reproducibility. Generic wrappers may not pass all parameters through correctly.
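The contract that enforces this can be sketched as an abstract base class like the BaseProvider in providers/base.py. The method name and parameters below follow the data-flow diagram; the real interface may declare more.

```python
from abc import ABC, abstractmethod

# Sketch of the abstract provider interface; the real BaseProvider in
# providers/base.py may expose additional methods or parameters.
class BaseProvider(ABC):
    @abstractmethod
    def complete(self, input_text: str, temperature: float, seed: int) -> str:
        """Return the raw model output, passing temperature and seed through
        to the underlying API untouched."""

class EchoProvider(BaseProvider):
    """Trivial concrete provider for illustration only."""
    def complete(self, input_text: str, temperature: float, seed: int) -> str:
        return input_text

print(EchoProvider().complete("hi", temperature=0.0, seed=1))  # → hi
```

Every provider must accept temperature and seed explicitly, so a run's reproducibility parameters can never be silently dropped by a wrapper.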
## Repo boundaries

| Repo | Scope |
|---|---|
| pramana (this) | CLI, providers, eval execution, submission client |
| pramana-api | Submission endpoint, storage, aggregation, dashboard |