# Architecture

System design, modules, data flow, and design rationale.

## System diagram
## Module map

| File | Purpose | Key exports |
|---|---|---|
| cli.py | Click command group, entry point | cli, run, submit, models, login, logout, whoami, config, delete |
| runner.py | Eval execution engine | run_eval(suite_path, provider, temperature, seed) |
| protocol.py | Pydantic schemas for all data types | TestCase, TestResult, EvalResults, RunMetadata |
| models.py | Dynamic model registry with 24h cache | get_available_models(), detect_provider(), resolve_alias() |
| hash.py | Content-addressable hashing | hash_suite(), hash_result(), hash_output() |
| assertions.py | Assertion evaluation | evaluate_assertion() |
| submitter.py | HTTP POST to API | submit_results() |
| storage.py | Local result persistence | append_result(), load_results(), remove_run() |
| auth.py | OAuth flow, token storage | login(), logout(), whoami() |
| providers/base.py | Abstract provider interface | BaseProvider |
| providers/registry.py | Auto-discovery, @register() decorator | register(), resolve_provider() |
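The auto-discovery registry in providers/registry.py can be sketched as a class decorator that records providers in a module-level map. This is an illustrative pattern only; the real register() and resolve_provider() signatures may differ.

```python
# Hypothetical sketch of the @register() decorator pattern; names and
# error handling in the real providers/registry.py may differ.
_PROVIDERS: dict[str, type] = {}

def register(name: str):
    """Class decorator that records a provider class under a lookup name."""
    def wrap(cls: type) -> type:
        _PROVIDERS[name] = cls
        return cls
    return wrap

def resolve_provider(name: str) -> type:
    """Return the registered provider class, or fail with the known names."""
    try:
        return _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(_PROVIDERS)}")

@register("example")
class ExampleProvider:
    pass

print(resolve_provider("example").__name__)  # → ExampleProvider
```

Importing a provider module is enough to make it resolvable, which is what lets the CLI discover providers without a hardcoded list.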
## Data flow

```
TestCase (JSONL)
        │
        ▼
provider.complete(input_text, temperature, seed)
        │
        ▼
evaluate_assertion(assertion, output, ideal) → AssertionResult
        │
        ▼
hash_result(model_id, test_id, output) → SHA-256
        │
        ▼
TestResult { test_id, output, assertion_result, latency_ms, result_hash }
        │
        ▼
EvalResults { suite_version, suite_hash, run_metadata, results[], summary }
        │
        ▼
append_result(path, results) → local JSON file
        │
        ▼
submit_results(block, api_url) → HTTP POST to pramana-api
```
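The stages above can be walked through with stand-in data. Everything here is illustrative: the field names come from the diagram, but the real schemas are Pydantic models in protocol.py, the provider call is made by runner.py, and the model id is a placeholder.

```python
import hashlib
import json

# Stand-in for a real provider.complete(input_text, temperature, seed) call.
def fake_complete(input_text: str, temperature: float, seed: int) -> str:
    return "4"

case = {"test_id": "t1", "input_text": "2+2?", "ideal": "4"}
output = fake_complete(case["input_text"], temperature=0.0, seed=42)

# Stand-in assertion check (the real logic lives in assertions.py).
assertion_result = {"passed": output == case["ideal"]}

# Content-addressable hash over (model_id, test_id, output);
# "model-x" is a hypothetical model identifier.
result_hash = hashlib.sha256(
    f"model-x|{case['test_id']}|{output}".encode()
).hexdigest()

test_result = {
    "test_id": case["test_id"],
    "output": output,
    "assertion_result": assertion_result,
    "latency_ms": 12,
    "result_hash": result_hash,
}

# Final stage, sketched in memory: what append_result() would persist locally.
runs = [test_result]
serialized = json.dumps(runs, indent=2)
print(assertion_result["passed"])  # → True
```

Each TestResult is self-describing, so the local JSON file and the server-side submission carry the same records.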
## Design rationale

### Why no LiteLLM?

LiteLLM normalizes API responses across providers. Pramana needs raw responses to detect drift — normalization hides exactly the changes we're measuring.
### Why no local models?

Pramana detects provider drift: silent updates to models behind stable identifiers. Local models (Ollama, vLLM) have user-controlled weights with no silent updates, so there is no drift to detect.
### Why content-addressable hashing?

SHA-256 of (model_id, prompt_id, output) enables deterministic deduplication. Same input + same output = same hash, regardless of when or where it ran. The server deduplicates on this hash.
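A minimal sketch of this property, assuming the triple is joined into a single byte string before hashing (the real serialization in hash.py may differ):

```python
import hashlib

def hash_result(model_id: str, prompt_id: str, output: str) -> str:
    """Sketch: SHA-256 over (model_id, prompt_id, output).
    A unit separator avoids ambiguity between adjacent fields;
    the real hash.py may serialize the triple differently."""
    payload = "\x1f".join([model_id, prompt_id, output]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

a = hash_result("model-x", "p1", "hello")
b = hash_result("model-x", "p1", "hello")
c = hash_result("model-x", "p1", "hello!")
print(a == b)  # → True: identical inputs hash identically, anywhere, anytime
print(a == c)  # → False: any change in output changes the hash
```

Because the hash depends only on content, two machines that ran the same test against the same model output produce the same key, and the server can drop the duplicate.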
### Why 24h model cache?

Model releases are infrequent. Caching the registry for 24 hours balances freshness against repeated API calls on every invocation. Use `--refresh` to bypass the cache.
### Why custom providers?

Full control over API parameters (temperature, seed, message format) is required for scientific reproducibility. Generic wrappers may not pass all parameters through correctly.
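The contract that enforces this can be sketched as an abstract base class like the BaseProvider in providers/base.py. The method name and parameters below follow the data-flow diagram; the real interface may declare more.

```python
from abc import ABC, abstractmethod

# Sketch of the abstract provider interface; the real BaseProvider in
# providers/base.py may expose additional methods or parameters.
class BaseProvider(ABC):
    @abstractmethod
    def complete(self, input_text: str, temperature: float, seed: int) -> str:
        """Return the raw model output, passing temperature and seed through
        to the underlying API untouched."""

class EchoProvider(BaseProvider):
    """Trivial concrete provider for illustration only."""
    def complete(self, input_text: str, temperature: float, seed: int) -> str:
        return input_text

print(EchoProvider().complete("hi", temperature=0.0, seed=1))  # → hi
```

Every provider must accept temperature and seed explicitly, so a run's reproducibility parameters can never be silently dropped by a wrapper.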
## Repo boundaries

| Repo | Scope |
|---|---|
| pramana (this) | CLI, providers, eval execution, submission client |
| pramana-api | Submission endpoint, storage, aggregation, dashboard |