Benchdash — Web Benchmark Dashboard
Benchdash is the zero-dependency web dashboard that unifies benchmarks, tests, and feature maturity for JaguarEngine. It is pure Python 3 standard library plus vanilla JS — no npm, no build step.
Chart.js is loaded from a CDN, but every chart degrades automatically to a data table when the CDN is unavailable (e.g., in offline CI environments).
Quick Start
# Full pipeline: build → test → benchmark → serve
python3 tools/benchdash/run_bench.py --serve
# Skip the build, measure against an existing build directory
python3 tools/benchdash/run_bench.py --skip-build --serve
# UI preview only — synthetic data, no build, test, or bench required
python3 tools/benchdash/run_bench.py --selftest --serve
Open the URL printed on the console (default: http://127.0.0.1:8765/index.html).
Must be served over HTTP
When index.html is opened directly via file://, the page cannot fetch /api/latest or /api/history. The dashboard detects this and displays a notice instructing you to use --serve.
CLI Reference
python3 tools/benchdash/run_bench.py [options]

Pipeline options:
  --build-dir DIR     CMake build directory (default: build)
  --skip-build        Skip cmake --build step
  --skip-tests        Skip ctest step
  --skip-bench        Skip benchmark execution
  --bench-filter RE   google-benchmark --benchmark_filter regex

Output / labelling:
  --label TEXT        Free-form label attached to this run (stored in history)

Serve options:
  --serve             Start dashboard server after pipeline
  --serve-only        Skip pipeline; serve existing data only
  --port N            HTTP server port (default: 8765)

Testing:
  --selftest          Generate synthetic data and exit
                      (25+ benchmarks, 2 failing test suites, all maturity tiers)
Pipeline
The pipeline runs eight stages in sequence. Each stage fails gracefully — a broken build does not prevent the test, benchmark, or feature matrix tabs from rendering.
| Stage | Action | On failure |
|---|---|---|
| 1. build | cmake --build <dir> -j8 | build_ok: false; pipeline continues |
| 2. tests | ctest --output-junit ... --timeout 120 -j4 (falls back to stdout parsing if JUnit unsupported) | Empty result; pipeline continues |
| 3. bench | Searches <dir>/bin/ for jaguar_benchmarks*, runs --benchmark_format=json | Empty result; pipeline continues |
| 4. meta | Collects git commit/branch/dirty flag, platform, CPU, engine version | Uses safe defaults |
| 5. features | Loads tools/benchdash/features.json as-is | Empty array |
| 6. write | Writes tools/benchdash/data/latest.json (schema_version 1) | — |
| 7. history | Appends one compact line to tools/benchdash/data/history.jsonl (benchmark results capped at 200 entries) | — |
| 8. serve | Serves static SPA + /api/latest + /api/history | Ctrl-C to stop |
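The "fail gracefully, then continue" behaviour of the stages can be illustrated with a small wrapper. This is only a sketch of the pattern, not the actual code in run_bench.py: the helper name run_stage and the choice of exception types are assumptions made for illustration.

```python
import subprocess


def run_stage(name, func, default):
    """Run one pipeline stage; on failure, report it and return `default`.

    Hypothetical helper: the real orchestrator may structure this differently.
    """
    try:
        return func()
    except (OSError, subprocess.SubprocessError, ValueError) as exc:
        # OSError covers a missing executable or file, SubprocessError a
        # non-zero exit or timeout, ValueError malformed JSON.
        print(f"[{name}] failed: {exc}; pipeline continues")
        return default


def build(build_dir="build"):
    # Mirrors stage 1; check=True raises CalledProcessError on a failed build.
    subprocess.run(["cmake", "--build", build_dir, "-j8"], check=True)
    return {"build_ok": True}
```

A caller would then write something like run_stage("build", build, {"build_ok": False}) and move on to the next stage regardless of the outcome.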
Output Files
| File | Description |
|---|---|
| tools/benchdash/data/latest.json | Full snapshot of the most recent run |
| tools/benchdash/data/history.jsonl | One compact line per run, in append order |
All benchmark times are normalised to milliseconds regardless of the time_unit reported by google-benchmark.
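As a rough illustration of that normalisation, the sketch below converts google-benchmark's real_time and cpu_time fields to milliseconds based on time_unit. The function names to_ms and normalise are hypothetical and are not taken from run_bench.py; the unit strings (ns, us, ms, s) and the ns default follow google-benchmark's JSON output.

```python
# Multipliers from each google-benchmark time unit to milliseconds.
_MS_PER_UNIT = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(value, time_unit="ns"):
    """Convert a duration in `time_unit` to milliseconds."""
    try:
        return value * _MS_PER_UNIT[time_unit]
    except KeyError:
        raise ValueError(f"unknown time_unit: {time_unit!r}") from None


def normalise(entry):
    """Map one raw google-benchmark result to ms-based fields."""
    unit = entry.get("time_unit", "ns")  # google-benchmark defaults to ns
    return {
        "name": entry["name"],
        "real_ms": to_ms(entry["real_time"], unit),
        "cpu_ms": to_ms(entry["cpu_time"], unit),
    }
```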
Gitignore recommendation
latest.json and history.jsonl are run artifacts. Add tools/benchdash/data/*.json and tools/benchdash/data/*.jsonl to .gitignore.
Dashboard Tabs
| Tab | Content |
|---|---|
| Overview | Build status chip, pass rate, fail/timeout counts, representative performance card, sparklines for pass rate and key benchmark over time |
| Performance | Horizontal bar chart of all benchmarks (log-scale toggle), BM_EngineStep N-scaling line chart (ms vs entity count), per-benchmark history trend |
| Tests | Summary bar and suite table with filter; failed suites expand to list individual failing test names |
| Feature Matrix | Table from features.json — subsystem, maturity chip (production / functional / partial / stub / facade), engine-wired flag, tested flag, notes |
| Run History | Table of all history.jsonl entries, newest first |
Additional UI behaviours:
- Dark / light theme toggle in the top-right corner; preference is stored in localStorage.
- Chart.js unavailable → all charts degrade automatically to data tables.
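The N-scaling chart in the Performance tab plots time against entity count for the BM_EngineStep family. The dashboard does this in JavaScript; the Python sketch below only illustrates how such points could be derived from benchmark names like BM_EngineStep/1000. The helper name scaling_points is hypothetical.

```python
import re

# Matches names such as "BM_EngineStep/1000" and captures the entity count.
_STEP = re.compile(r"^BM_EngineStep/(\d+)$")


def scaling_points(results):
    """Return (entity_count, real_ms) pairs sorted by entity count."""
    points = []
    for result in results:
        match = _STEP.match(result["name"])
        # Skip aggregate rows (mean, median, ...) so each N appears once.
        if match and result.get("run_type", "iteration") == "iteration":
            points.append((int(match.group(1)), result["real_ms"]))
    return sorted(points)
```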
API Endpoints (served mode)
All responses carry Cache-Control: no-store.
| Path | Description | Content-Type |
|---|---|---|
| GET /index.html | Dashboard SPA shell | text/html |
| GET /app.js | SPA logic | text/javascript |
| GET /style.css | Design tokens (dark/light) | text/css |
| GET /api/latest | data/latest.json — full run snapshot | application/json |
| GET /api/history | data/history.jsonl — newline-delimited runs | application/x-ndjson |
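The two API endpoints can also be consumed from a script. The following is a minimal sketch using only the standard library; it assumes the server is running on the default address, and the function names are illustrative rather than part of Benchdash.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8765"  # default address; adjust if --port is set


def parse_ndjson(text):
    """Parse newline-delimited JSON, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_latest(base_url=BASE_URL):
    """Return the full snapshot served at /api/latest."""
    with urlopen(f"{base_url}/api/latest", timeout=5) as resp:
        return json.load(resp)


def fetch_history(base_url=BASE_URL):
    """Return the list of runs served at /api/history, in append order."""
    with urlopen(f"{base_url}/api/history", timeout=5) as resp:
        return parse_ndjson(resp.read().decode("utf-8"))
```

Because /api/history is NDJSON rather than a single JSON document, it has to be parsed line by line; passing the whole body to json.loads would fail as soon as there is more than one run.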
Schema Reference
latest.json (schema_version: 1)
{
  "schema_version": 1,
  "timestamp": "2026-06-12T19:55:56",        // ISO 8601 local time
  "label": "nightly",                        // --label value, or ""
  "git": {
    "commit": "d8ce160",
    "branch": "main",
    "dirty": false
  },
  "machine": {
    "os": "Darwin 24.6.0",
    "cpu": "Apple M3 Max",
    "cores": 16
  },
  "build": {
    "version": "0.7.0",
    "build_ok": true,
    "build_seconds": 42.7
  },
  "benchmarks": {
    "context": { /* google-benchmark context: num_cpus, caches … */ },
    "results": [
      {
        "name": "BM_EngineStep/1000",
        "run_type": "iteration",             // "iteration" | "aggregate"
        "iterations": 2000,
        "real_ms": 0.636,                    // normalised to ms
        "cpu_ms": 0.630,
        "time_unit": "ms",
        "threads": 1,
        "repetitions": 1,
        "items_per_second": 1572327.0,       // present when reported
        "counters": { "entities": 1000.0 }   // user counters (if any)
      }
    ]
  },
  "tests": {
    "total": 2213,
    "passed": 2213,
    "failed": 0,
    "timeout": 0,
    "notrun": 0,
    "duration_s": 312.4,
    "suites": [
      {
        "name": "PhysicsIntegratorTest",
        "total": 48,
        "passed": 48,
        "failed": 0,
        "time_s": 1.23,
        "failed_tests": []
      }
    ]
  },
  "features": [
    {
      "subsystem": "core",
      "maturity": "production",              // production | functional | partial | stub | facade
      "wired_into_engine": true,
      "tested": true,
      "notes": ""
    }
  ]
}
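To show how the latest.json fields fit together, here is a small sketch that derives a few headline numbers from a loaded snapshot. It assumes only the field names documented in the schema; the summarize helper is hypothetical, and the real dashboard computes its overview in app.js.

```python
def summarize(snapshot):
    """Derive headline figures from a latest.json snapshot (schema_version 1)."""
    tests = snapshot["tests"]
    total = tests["total"]
    # Guard against an empty test stage (e.g. ctest failed to run).
    pass_rate = tests["passed"] / total if total else 0.0

    # Only "iteration" rows are raw measurements; aggregates are skipped.
    iteration = [
        r for r in snapshot["benchmarks"]["results"]
        if r.get("run_type") == "iteration"
    ]
    slowest = max(iteration, key=lambda r: r["real_ms"], default=None)

    return {
        "pass_rate": pass_rate,
        "slowest": slowest["name"] if slowest else None,
        "failing_suites": [s["name"] for s in tests["suites"] if s["failed"] > 0],
    }
```

Note that the comments in the schema listing are annotations for the reader; the file written by the pipeline has to be plain JSON to be loadable with json.load.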
history.jsonl (one line per run)
{
  "timestamp": "2026-06-12T19:55:56",
  "label": "nightly",
  "git_commit": "d8ce160",
  "branch": "main",
  "tests": {
    "passed": 2213, "failed": 0, "timeout": 0, "notrun": 0, "total": 2213
  },
  "bench": [
    { "name": "BM_EngineStep/1000", "real_ms": 0.636 }
  ]
  // bench contains iteration-type results only, capped at 200 entries
}
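The history format lends itself to simple trend extraction. The sketch below collects the time series for one benchmark across runs; it assumes each run occupies a single compact line in the real file (the multi-line layout above is for readability), and the function name trend is illustrative.

```python
import json


def trend(lines, bench_name):
    """Return (timestamp, real_ms) pairs for one benchmark, in append order.

    `lines` is any iterable of history.jsonl lines, such as an open file.
    Runs that did not record `bench_name` are skipped.
    """
    series = []
    for line in lines:
        if not line.strip():
            continue
        run = json.loads(line)
        for bench in run.get("bench", []):
            if bench["name"] == bench_name:
                series.append((run["timestamp"], bench["real_ms"]))
                break
    return series
```

With the file on disk this could be called as trend(open("tools/benchdash/data/history.jsonl"), "BM_EngineStep/1000"), ideally inside a with block so the file is closed afterwards.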
Self-Test Verification
# Syntax-check the pipeline script
python3 -m py_compile tools/benchdash/run_bench.py
# Generate synthetic data and verify the write path
python3 tools/benchdash/run_bench.py --selftest
# Verify the output is valid schema_version 1
python3 -c "import json; d=json.load(open('tools/benchdash/data/latest.json')); assert d['schema_version']==1"
The --selftest run generates at least 25 benchmarks (including BM_EngineStep/100 through BM_EngineStep/10000), a test suite with 2 failing tests, and entries for every maturity tier.
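A slightly stricter check than the schema_version assertion could confirm the benchmark count and maturity-tier coverage described above. This is a sketch only: check_selftest is a hypothetical helper, and the thresholds are taken from the description of --selftest rather than from the script itself.

```python
MATURITY_TIERS = {"production", "functional", "partial", "stub", "facade"}


def check_selftest(snapshot):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if snapshot.get("schema_version") != 1:
        problems.append("schema_version is not 1")

    count = len(snapshot.get("benchmarks", {}).get("results", []))
    if count < 25:
        problems.append(f"expected at least 25 benchmarks, found {count}")

    seen = {feature["maturity"] for feature in snapshot.get("features", [])}
    missing = MATURITY_TIERS - seen
    if missing:
        problems.append("missing maturity tiers: " + ", ".join(sorted(missing)))
    return problems
```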
CI Usage
Benchdash is designed to fit into CI pipelines without any external dependencies.
# GitHub Actions example
- name: Run benchmark pipeline
  run: python3 tools/benchdash/run_bench.py --build-dir build --label "${{ github.sha }}"
- name: Verify schema
  run: python3 -c "import json; d=json.load(open('tools/benchdash/data/latest.json')); assert d['schema_version']==1; print('schema ok')"
- name: Upload artifacts
  uses: actions/upload-artifact@v4
  with:
    name: benchdash-data
    path: tools/benchdash/data/
Use --serve-only in a follow-up job or post-merge pipeline to host the dashboard from a previously uploaded artifact.
Directory Layout
tools/benchdash/
├── run_bench.py      # Orchestrator CLI (stdlib only)
├── index.html        # Dashboard SPA shell
├── app.js            # SPA logic (vanilla JS)
├── style.css         # Design system (dark/light tokens)
├── features.json     # Feature/subsystem maturity matrix (source of truth)
├── README.md         # Bilingual quick-start
└── data/
    ├── .gitkeep
    ├── latest.json   # Written by run_bench.py (gitignore recommended)
    └── history.jsonl # Appended by run_bench.py (gitignore recommended)