Benchdash: Web Benchmark Dashboard

Benchdash is the zero-dependency web dashboard that unifies benchmarks, tests, and feature maturity for JaguarEngine. It uses only the Python 3 standard library and vanilla JavaScript: no npm, no build step.

Chart.js is loaded from a CDN, but every chart automatically degrades to a data table when the CDN is unavailable (e.g., in offline CI environments).

Quick Start

# Full pipeline: build → test → benchmark → serve
python3 tools/benchdash/run_bench.py --serve

# Skip the build, measure against an existing build directory
python3 tools/benchdash/run_bench.py --skip-build --serve

# UI preview only — synthetic data, no build, test, or bench required
python3 tools/benchdash/run_bench.py --selftest --serve

Open the URL printed to the console (default: http://127.0.0.1:8765/index.html).

Must be served over HTTP

When index.html is opened directly via file://, the browser cannot fetch /api/latest or /api/history. The dashboard detects this case and displays a notice telling you to use --serve.

CLI Reference

python3 tools/benchdash/run_bench.py [options]

Pipeline options:
  --build-dir DIR     CMake build directory (default: build)
  --skip-build        Skip cmake --build step
  --skip-tests        Skip ctest step
  --skip-bench        Skip benchmark execution
  --bench-filter RE   google-benchmark --benchmark_filter regex

Output / labelling:
  --label TEXT        Free-form label attached to this run (stored in history)

Serve options:
  --serve             Start dashboard server after pipeline
  --serve-only        Skip pipeline; serve existing data only
  --port N            HTTP server port (default: 8765)

Testing:
  --selftest          Generate synthetic data instead of running the pipeline
                      (25+ benchmarks, 2 failing test suites, all maturity tiers);
                      exits afterwards unless --serve is also given
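For orientation, the option set above maps directly onto Python's standard argparse module. The sketch below is a hypothetical reconstruction: build_parser is an illustrative name, and run_bench.py may be organised differently. It encodes only the flags and defaults documented above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented run_bench.py options."""
    p = argparse.ArgumentParser(prog="run_bench.py")
    # Pipeline options
    p.add_argument("--build-dir", default="build", metavar="DIR",
                   help="CMake build directory")
    p.add_argument("--skip-build", action="store_true", help="Skip cmake --build step")
    p.add_argument("--skip-tests", action="store_true", help="Skip ctest step")
    p.add_argument("--skip-bench", action="store_true", help="Skip benchmark execution")
    p.add_argument("--bench-filter", metavar="RE",
                   help="google-benchmark --benchmark_filter regex")
    # Output / labelling (an empty label matches the schema's "" default)
    p.add_argument("--label", default="", metavar="TEXT",
                   help="Free-form label attached to this run")
    # Serve options
    p.add_argument("--serve", action="store_true",
                   help="Start dashboard server after pipeline")
    p.add_argument("--serve-only", action="store_true",
                   help="Skip pipeline; serve existing data only")
    p.add_argument("--port", type=int, default=8765, metavar="N",
                   help="HTTP server port")
    # Testing
    p.add_argument("--selftest", action="store_true", help="Generate synthetic data")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(["--skip-build", "--serve", "--label", "nightly"])
    print(args.build_dir, args.skip_build, args.serve, args.port, args.label)
```

argparse turns --build-dir into args.build_dir and --serve-only into args.serve_only, so each flag becomes a plain attribute the pipeline can branch on.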

Pipeline

The pipeline runs up to eight stages in sequence (stage 8 runs only when serving is requested). Each stage fails gracefully: a broken build does not prevent the test, benchmark, or feature matrix tabs from rendering.

Stage	Action	On failure
1. build	`cmake --build <dir> -j8`	`build_ok: false`; pipeline continues
2. tests	`ctest --output-junit ... --timeout 120 -j4` (falls back to parsing stdout if JUnit output is unsupported)	Empty result; pipeline continues
3. bench	Searches `<dir>/bin/` for `jaguar_benchmarks*` executables and runs them with `--benchmark_format=json`	Empty result; pipeline continues
4. meta	Collects git commit/branch/dirty flag, platform, CPU, engine version	Uses safe defaults
5. features	Loads `tools/benchdash/features.json` as-is	Empty array
6. write	Writes `tools/benchdash/data/latest.json` (schema_version 1)	—
7. history	Appends one compact line to `tools/benchdash/data/history.jsonl` (benchmark results capped at 200 entries)	—
8. serve	Serves static SPA + `/api/latest` + `/api/history`	Ctrl-C to stop
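The "fails gracefully" behaviour can be expressed as a small wrapper that runs a stage and substitutes the value from the On failure column when the stage raises. This is a minimal sketch assuming each stage is a Python callable that signals failure by raising; run_stage and broken_build are hypothetical names, and the real script may instead inspect subprocess return codes.

```python
from typing import Any, Callable


def run_stage(name: str, action: Callable[[], Any], fallback: Any) -> Any:
    """Run one pipeline stage; if it raises, log the error and return the
    fallback so that later stages still run (hypothetical helper)."""
    try:
        return action()
    except Exception as exc:  # deliberately broad: one stage must not abort the pipeline
        print(f"[benchdash] stage {name!r} failed: {exc}")
        return fallback


def broken_build() -> dict:
    # Hypothetical stand-in for a failed `cmake --build` step.
    raise RuntimeError("cmake --build exited with status 2")


if __name__ == "__main__":
    build = run_stage("build", broken_build, {"build_ok": False})
    tests = run_stage("tests", lambda: {"total": 3, "passed": 3}, {})
    features = run_stage("features", lambda: [], [])
    # The build failed, yet tests and features are still produced.
    print(build, tests, features)
```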

Output Files

File	Description
`tools/benchdash/data/latest.json`	Full snapshot of the most recent run
`tools/benchdash/data/history.jsonl`	One compact line per run, in append order

All benchmark times are normalised to milliseconds regardless of the time_unit reported by google-benchmark.
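Normalisation amounts to one multiplication per value. The sketch below assumes the raw google-benchmark JSON fields real_time, cpu_time, and time_unit (google-benchmark reports time_unit as ns, us, ms, or s); to_ms and normalise are illustrative names, not functions taken from run_bench.py.

```python
# Factor converting one unit of google-benchmark's time_unit into milliseconds.
_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(value: float, time_unit: str) -> float:
    """Convert a value expressed in time_unit to milliseconds."""
    try:
        return value * _TO_MS[time_unit]
    except KeyError:
        raise ValueError(f"unknown time_unit: {time_unit!r}") from None


def normalise(result: dict) -> dict:
    """Map one raw google-benchmark result onto the real_ms / cpu_ms fields."""
    unit = result["time_unit"]
    return {
        "name": result["name"],
        "real_ms": to_ms(result["real_time"], unit),
        "cpu_ms": to_ms(result["cpu_time"], unit),
    }
```

For example, a result with real_time 636000 and time_unit "ns" becomes a real_ms of about 0.636, the value used for BM_EngineStep/1000 in the schema sample.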

Gitignore recommendation

latest.json and history.jsonl are run artifacts. Add the patterns tools/benchdash/data/*.json and tools/benchdash/data/*.jsonl to .gitignore.

Dashboard Tabs

Tab	Content
Overview	Build status chip, pass rate, fail/timeout counts, representative performance card, sparklines for pass rate and key benchmark over time
Performance	Horizontal bar chart of all benchmarks (log-scale toggle), `BM_EngineStep` N-scaling line chart (ms vs entity count), per-benchmark history trend
Tests	Summary bar and filterable suite table; failed suites expand to list the individual failing test names
Feature Matrix	Table from `features.json` — subsystem, maturity chip (production / functional / partial / stub / facade), engine-wired flag, tested flag, notes
Run History	Table of all `history.jsonl` entries, newest first

Additional UI behaviours:

Dark / light theme toggle in the top-right corner; preference is stored in localStorage.
If Chart.js is unavailable, all charts automatically degrade to data tables.

API Endpoints (served mode)

All responses carry Cache-Control: no-store.

Path	Description	Content-Type
`GET /index.html`	Dashboard SPA shell	`text/html`
`GET /app.js`	SPA logic	`text/javascript`
`GET /style.css`	Design tokens (dark/light)	`text/css`
`GET /api/latest`	`data/latest.json` — full run snapshot	`application/json`
`GET /api/history`	`data/history.jsonl` — newline-delimited runs	`application/x-ndjson`
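The served mode needs nothing beyond the standard http.server module. Below is a minimal sketch, assuming the static files sit in the dashboard root and the JSON files in its data/ subdirectory (as in the directory layout); DashHandler, API_ROUTES, and make_server are hypothetical names. Overriding end_headers is one way to guarantee that every response, including 404s and static files, carries Cache-Control: no-store.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

# Hypothetical routing table: URL path -> (file under data/, Content-Type).
API_ROUTES = {
    "/api/latest": ("latest.json", "application/json"),
    "/api/history": ("history.jsonl", "application/x-ndjson"),
}


class DashHandler(SimpleHTTPRequestHandler):
    """Serves static dashboard files plus the two data endpoints."""

    def end_headers(self) -> None:
        # Applied to every response: static files, API data, and errors.
        self.send_header("Cache-Control", "no-store")
        super().end_headers()

    def do_GET(self) -> None:
        route = API_ROUTES.get(self.path.split("?", 1)[0])
        if route is None:
            super().do_GET()  # index.html, app.js, style.css, ...
            return
        filename, content_type = route
        try:
            body = (Path(self.directory) / "data" / filename).read_bytes()
        except FileNotFoundError:
            self.send_error(404, f"data/{filename} not found")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(root: str, port: int = 8765) -> ThreadingHTTPServer:
    """Bind to localhost only; root is the directory holding index.html."""
    handler = functools.partial(DashHandler, directory=root)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Under these assumptions, make_server("tools/benchdash").serve_forever() would serve http://127.0.0.1:8765/index.html until interrupted with Ctrl-C. Passing port=0 lets the OS choose a free port, which is convenient in tests.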

Schema Reference

`latest.json` (schema_version: 1)

{
  "schema_version": 1,
  "timestamp": "2026-06-12T19:55:56",       // ISO 8601 local time
  "label": "nightly",                         // --label value, or ""
  "git": {
    "commit": "d8ce160",
    "branch": "main",
    "dirty": false
  },
  "machine": {
    "os": "Darwin 24.6.0",
    "cpu": "Apple M3 Max",
    "cores": 16
  },
  "build": {
    "version": "0.7.0",
    "build_ok": true,
    "build_seconds": 42.7
  },
  "benchmarks": {
    "context": { /* google-benchmark context: num_cpus, caches … */ },
    "results": [
      {
        "name": "BM_EngineStep/1000",
        "run_type": "iteration",           // "iteration" | "aggregate"
        "iterations": 2000,
        "real_ms": 0.636,                  // normalised to ms
        "cpu_ms": 0.630,
        "time_unit": "ms",
        "threads": 1,
        "repetitions": 1,
        "items_per_second": 1572327.0,     // present when reported
        "counters": { "entities": 1000.0 } // user counters (if any)
      }
    ]
  },
  "tests": {
    "total": 2213,
    "passed": 2213,
    "failed": 0,
    "timeout": 0,
    "notrun": 0,
    "duration_s": 312.4,
    "suites": [
      {
        "name": "PhysicsIntegratorTest",
        "total": 48,
        "passed": 48,
        "failed": 0,
        "time_s": 1.23,
        "failed_tests": []
      }
    ]
  },
  "features": [
    {
      "subsystem": "core",
      "maturity": "production",        // production | functional | partial | stub | facade
      "wired_into_engine": true,
      "tested": true,
      "notes": ""
    }
  ]
}
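The Performance tab's N-scaling chart needs only the BM_EngineStep/<N> entries from this snapshot. Here is a sketch of extracting them in Python, assuming the entity count is the numeric suffix of the benchmark name (as in BM_EngineStep/1000) and that only iteration-type results are plotted; engine_step_series is a hypothetical helper.

```python
from __future__ import annotations

import re

# Assumed naming pattern: "BM_EngineStep/<N>" where N is the entity count.
_STEP = re.compile(r"BM_EngineStep/(\d+)")


def engine_step_series(latest: dict) -> list[tuple[int, float]]:
    """Return (entity_count, real_ms) pairs for BM_EngineStep/<N>, sorted by N."""
    points = []
    for r in latest.get("benchmarks", {}).get("results", []):
        m = _STEP.fullmatch(r.get("name", ""))
        # Aggregates such as BM_EngineStep/1000_mean fail fullmatch and run_type.
        if m and r.get("run_type") == "iteration":
            points.append((int(m.group(1)), r["real_ms"]))
    return sorted(points)
```

Applied to the sample snapshot above, engine_step_series would return [(1000, 0.636)].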

`history.jsonl` (one line per run)

{
  "timestamp": "2026-06-12T19:55:56",
  "label": "nightly",
  "git_commit": "d8ce160",
  "branch": "main",
  "tests": {
    "passed": 2213, "failed": 0, "timeout": 0, "notrun": 0, "total": 2213
  },
  "bench": [
    { "name": "BM_EngineStep/1000", "real_ms": 0.636 }
  ]
  // bench contains iteration-type results only, capped at 200 entries
}
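A history line is a projection of the full snapshot. The sketch below derives one from a latest.json dict and appends it as a single line; history_entry, append_history, and read_history are hypothetical names, and keeping the first 200 iteration results is an assumption about how the cap is applied.

```python
from __future__ import annotations

import json

MAX_BENCH = 200  # cap on benchmark entries per history line, as documented


def history_entry(latest: dict) -> dict:
    """Project a latest.json snapshot onto the compact history schema."""
    git = latest.get("git", {})
    tests = latest.get("tests", {})
    results = latest.get("benchmarks", {}).get("results", [])
    bench = [
        {"name": r["name"], "real_ms": r["real_ms"]}
        for r in results
        if r.get("run_type") == "iteration"  # aggregates are dropped
    ]
    return {
        "timestamp": latest.get("timestamp", ""),
        "label": latest.get("label", ""),
        "git_commit": git.get("commit", ""),
        "branch": git.get("branch", ""),
        "tests": {k: tests.get(k, 0) for k in ("passed", "failed", "timeout", "notrun", "total")},
        "bench": bench[:MAX_BENCH],  # assumption: keep the first 200
    }


def append_history(path: str, latest: dict) -> None:
    """Append one run as a single JSON line (NDJSON)."""
    line = json.dumps(history_entry(latest), separators=(",", ":"))
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def read_history(path: str) -> list[dict]:
    """Read all runs back in append order, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

json.dumps without indent already keeps each run on one line; the separators argument only strips optional whitespace to keep the file compact.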

Self-Test Verification

# Syntax-check the pipeline script
python3 -m py_compile tools/benchdash/run_bench.py

# Generate synthetic data and verify the write path
python3 tools/benchdash/run_bench.py --selftest

# Verify the output is valid schema_version 1
python3 -c "import json; d=json.load(open('tools/benchdash/data/latest.json')); assert d['schema_version']==1"

The --selftest run generates at least 25 benchmarks (including BM_EngineStep/100 through BM_EngineStep/10000), a test suite with 2 failing tests, and entries for every maturity tier.
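The one-line schema check above can be extended to the other --selftest guarantees. This is a sketch assuming the field names from the latest.json schema; check_selftest is a hypothetical helper. It checks only the BM_EngineStep endpoints (/100 and /10000), not every size in between, and treats "2 failing" as at least two failed tests.

```python
from __future__ import annotations

# Maturity tiers listed in the schema reference.
TIERS = {"production", "functional", "partial", "stub", "facade"}


def check_selftest(d: dict) -> list[str]:
    """Return a list of problems; an empty list means the snapshot looks right."""
    problems: list[str] = []
    if d.get("schema_version") != 1:
        problems.append(f"schema_version is {d.get('schema_version')!r}, expected 1")
    names = [r.get("name", "") for r in d.get("benchmarks", {}).get("results", [])]
    if len(names) < 25:
        problems.append(f"only {len(names)} benchmarks, expected at least 25")
    for required in ("BM_EngineStep/100", "BM_EngineStep/10000"):
        if required not in names:
            problems.append(f"missing benchmark {required}")
    if d.get("tests", {}).get("failed", 0) < 2:
        problems.append("expected at least 2 failed tests")
    missing = TIERS - {f.get("maturity") for f in d.get("features", [])}
    if missing:
        problems.append(f"missing maturity tiers: {sorted(missing)}")
    return problems
```

After a --selftest run, check_selftest(json.load(open("tools/benchdash/data/latest.json"))) is expected to return an empty list.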

CI Usage

Benchdash is designed to fit into CI pipelines without any external dependencies.

# GitHub Actions example
- name: Run benchmark pipeline
  run: python3 tools/benchdash/run_bench.py --build-dir build --label "${{ github.sha }}"

- name: Verify schema
  run: python3 -c "import json; d=json.load(open('tools/benchdash/data/latest.json')); assert d['schema_version']==1; print('schema ok')"

- name: Upload artifacts
  uses: actions/upload-artifact@v4
  with:
    name: benchdash-data
    path: tools/benchdash/data/

Use --serve-only in a follow-up job or post-merge pipeline to host the dashboard from a previously uploaded artifact, after downloading it into tools/benchdash/data/.

Directory Layout

tools/benchdash/
├── run_bench.py      # Orchestrator CLI (stdlib only)
├── index.html        # Dashboard SPA shell
├── app.js            # SPA logic (vanilla JS)
├── style.css         # Design system (dark/light tokens)
├── features.json     # Feature/subsystem maturity matrix (source of truth)
├── README.md         # Bilingual quick-start
└── data/
    ├── .gitkeep
    ├── latest.json   # Written by run_bench.py (gitignore recommended)
    └── history.jsonl # Appended by run_bench.py (gitignore recommended)