Benchdash — Web Benchmark Dashboard

Benchdash is the zero-dependency web dashboard that unifies benchmarks, tests, and feature maturity for JaguarEngine. It is pure Python 3 standard library plus vanilla JS — no npm, no build step.

Chart.js is loaded from a CDN, but every chart degrades automatically to a data table when the CDN is unavailable (e.g., in offline CI environments).


Quick Start

# Full pipeline: build → test → benchmark → serve
python3 tools/benchdash/run_bench.py --serve

# Skip the build, measure against an existing build directory
python3 tools/benchdash/run_bench.py --skip-build --serve

# UI preview only — synthetic data, no build, test, or bench required
python3 tools/benchdash/run_bench.py --selftest --serve

Open the URL printed on the console (default: http://127.0.0.1:8765/index.html).

Must be served over HTTP

When index.html is opened directly via file://, the dashboard cannot fetch /api/latest or /api/history. It detects this and displays a notice instructing you to use --serve.


CLI Reference

python3 tools/benchdash/run_bench.py [options]

Pipeline options:
  --build-dir DIR     CMake build directory (default: build)
  --skip-build        Skip cmake --build step
  --skip-tests        Skip ctest step
  --skip-bench        Skip benchmark execution
  --bench-filter RE   google-benchmark --benchmark_filter regex

Output / labelling:
  --label TEXT        Free-form label attached to this run (stored in history)

Serve options:
  --serve             Start dashboard server after pipeline
  --serve-only        Skip pipeline; serve existing data only
  --port N            HTTP server port (default: 8765)

Testing:
  --selftest          Generate synthetic data and exit (combine with --serve to preview the UI)
                      (25+ benchmarks, 2 failing test suites, all maturity tiers)

Pipeline

The pipeline runs eight stages in sequence. Each stage fails gracefully — a broken build does not prevent the test, benchmark, or feature matrix tabs from rendering.

| Stage | Action | On failure |
|---|---|---|
| 1. build | cmake --build <dir> -j8 | build_ok: false; pipeline continues |
| 2. tests | ctest --output-junit ... --timeout 120 -j4 (falls back to stdout parsing if JUnit unsupported) | Empty result; pipeline continues |
| 3. bench | Searches <dir>/bin/ for jaguar_benchmarks*, runs --benchmark_format=json | Empty result; pipeline continues |
| 4. meta | Collects git commit/branch/dirty flag, platform, CPU, engine version | Uses safe defaults |
| 5. features | Loads tools/benchdash/features.json as-is | Empty array |
| 6. write | Writes tools/benchdash/data/latest.json (schema_version 1) | |
| 7. history | Appends one compact line to tools/benchdash/data/history.jsonl (benchmark results capped at 200 entries) | |
| 8. serve | Serves static SPA + /api/latest + /api/history | Ctrl-C to stop |
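
The "fails gracefully" behaviour described above could be sketched as a small wrapper that runs one stage and substitutes a fallback value when it raises. This is only an illustration: `run_stage` and its parameters are hypothetical names and are not taken from run_bench.py.

```python
import sys


def run_stage(name, action, fallback):
    """Run one pipeline stage; on any exception, log it and return the fallback.

    Hypothetical helper: `name` is a label for log output, `action` is a
    zero-argument callable, and `fallback` is the value to use on failure
    (for example an empty result or an empty array).
    """
    try:
        return action()
    except Exception as exc:  # deliberately broad so no stage aborts the run
        print(f"[{name}] failed: {exc}", file=sys.stderr)
        return fallback


def broken_bench():
    # Stand-in for a benchmark stage whose binary could not be found.
    raise FileNotFoundError("no benchmark binary found")


# The failing stage yields its fallback, and later stages still run.
bench = run_stage("bench", broken_bench, {"results": []})
meta = run_stage("meta", lambda: {"cores": 16}, {})
```

With this shape, later stages always receive a value of the expected type, which is what allows the dashboard tabs to render even after an earlier failure.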

Output Files

| File | Description |
|---|---|
| tools/benchdash/data/latest.json | Full snapshot of the most recent run |
| tools/benchdash/data/history.jsonl | One compact line per run, in append order |

All benchmark times are normalised to milliseconds regardless of the time_unit reported by google-benchmark.
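
As a rough illustration of that normalisation, the sketch below converts the `real_time` and `cpu_time` fields of a google-benchmark JSON entry to milliseconds, using the entry's `time_unit` ("ns", "us", "ms", or "s"). The helper names `to_ms` and `normalise` are assumptions made for this example, and the actual code in run_bench.py may be structured differently.

```python
# Conversion factors from google-benchmark time units to milliseconds.
_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(value, time_unit):
    """Convert a time expressed in `time_unit` to milliseconds."""
    if time_unit not in _TO_MS:
        raise ValueError(f"unknown time_unit: {time_unit!r}")
    return value * _TO_MS[time_unit]


def normalise(raw):
    """Map one raw google-benchmark entry to real_ms / cpu_ms (hypothetical helper)."""
    unit = raw.get("time_unit", "ns")  # google-benchmark reports ns by default
    return {
        "name": raw["name"],
        "real_ms": to_ms(raw["real_time"], unit),
        "cpu_ms": to_ms(raw["cpu_time"], unit),
    }


entry = normalise({
    "name": "BM_EngineStep/1000",
    "real_time": 636.0,
    "cpu_time": 630.0,
    "time_unit": "us",
})
```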

Gitignore recommendation

latest.json and history.jsonl are run artifacts. Add tools/benchdash/data/*.json and tools/benchdash/data/*.jsonl to .gitignore.


Dashboard Tabs

| Tab | Content |
|---|---|
| Overview | Build status chip, pass rate, fail/timeout counts, representative performance card, sparklines for pass rate and key benchmark over time |
| Performance | Horizontal bar chart of all benchmarks (log-scale toggle), BM_EngineStep N-scaling line chart (ms vs entity count), per-benchmark history trend |
| Tests | Summary bar + suite table with filter, failed suites expand to list individual failing test names |
| Feature Matrix | Table from features.json: subsystem, maturity chip (production / functional / partial / stub / facade), engine-wired flag, tested flag, notes |
| Run History | Table of all history.jsonl entries, newest first |
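
The N-scaling chart on the Performance tab plots milliseconds against entity count. One way to derive those points from the benchmark results is sketched below in Python for illustration (the dashboard itself does this in app.js). It assumes the entity count is the numeric suffix of the benchmark name, as in `BM_EngineStep/1000`; the function name `engine_step_scaling` is hypothetical.

```python
import re

# Matches names such as "BM_EngineStep/1000" and captures the entity count.
_STEP = re.compile(r"^BM_EngineStep/(\d+)$")


def engine_step_scaling(results):
    """Return [(entity_count, real_ms), ...] sorted by entity count."""
    points = []
    for r in results:
        match = _STEP.match(r["name"])
        # Keep plain iteration results only; aggregates would duplicate points.
        if match and r.get("run_type", "iteration") == "iteration":
            points.append((int(match.group(1)), r["real_ms"]))
    return sorted(points)


sample = [
    {"name": "BM_EngineStep/1000", "run_type": "iteration", "real_ms": 0.636},
    {"name": "BM_EngineStep/100", "run_type": "iteration", "real_ms": 0.07},
    {"name": "BM_Other/8", "run_type": "iteration", "real_ms": 1.5},
]
points = engine_step_scaling(sample)
```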

Additional UI behaviours:

  • Dark / light theme toggle in the top-right corner; preference is stored in localStorage.
  • Chart.js unavailable → all charts degrade automatically to data tables.

API Endpoints (served mode)

All responses carry Cache-Control: no-store.

| Path | Description | Content-Type |
|---|---|---|
| GET /index.html | Dashboard SPA shell | text/html |
| GET /app.js | SPA logic | text/javascript |
| GET /style.css | Design tokens (dark/light) | text/css |
| GET /api/latest | data/latest.json, the full run snapshot | application/json |
| GET /api/history | data/history.jsonl, newline-delimited runs | application/x-ndjson |
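
For scripted access to the two API endpoints, a minimal stdlib client might look like the following. It assumes the server is running on the default address from the Quick Start; the function names are hypothetical and are not part of Benchdash.

```python
import json
from urllib.request import urlopen

BASE = "http://127.0.0.1:8765"  # default address; change if --port is used


def parse_ndjson(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_latest(base=BASE):
    """GET /api/latest and decode the full run snapshot."""
    with urlopen(f"{base}/api/latest", timeout=5) as resp:
        return json.load(resp)


def fetch_history(base=BASE):
    """GET /api/history and decode one object per run."""
    with urlopen(f"{base}/api/history", timeout=5) as resp:
        return parse_ndjson(resp.read().decode("utf-8"))


# parse_ndjson works on any NDJSON text, not only on server responses.
runs = parse_ndjson('{"label": "a"}\n\n{"label": "b"}\n')
```

`fetch_latest()` and `fetch_history()` need a running `--serve` or `--serve-only` instance; `parse_ndjson` can also be applied directly to the contents of history.jsonl.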

Schema Reference

latest.json (schema_version: 1)

{
  "schema_version": 1,
  "timestamp": "2026-06-12T19:55:56",       // ISO 8601 local time
  "label": "nightly",                         // --label value, or ""
  "git": {
    "commit": "d8ce160",
    "branch": "main",
    "dirty": false
  },
  "machine": {
    "os": "Darwin 24.6.0",
    "cpu": "Apple M3 Max",
    "cores": 16
  },
  "build": {
    "version": "0.7.0",
    "build_ok": true,
    "build_seconds": 42.7
  },
  "benchmarks": {
    "context": { /* google-benchmark context: num_cpus, caches … */ },
    "results": [
      {
        "name": "BM_EngineStep/1000",
        "run_type": "iteration",           // "iteration" | "aggregate"
        "iterations": 2000,
        "real_ms": 0.636,                  // normalised to ms
        "cpu_ms": 0.630,
        "time_unit": "ms",
        "threads": 1,
        "repetitions": 1,
        "items_per_second": 1572327.0,     // present when reported
        "counters": { "entities": 1000.0 } // user counters (if any)
      }
    ]
  },
  "tests": {
    "total": 2213,
    "passed": 2213,
    "failed": 0,
    "timeout": 0,
    "notrun": 0,
    "duration_s": 312.4,
    "suites": [
      {
        "name": "PhysicsIntegratorTest",
        "total": 48,
        "passed": 48,
        "failed": 0,
        "time_s": 1.23,
        "failed_tests": []
      }
    ]
  },
  "features": [
    {
      "subsystem": "core",
      "maturity": "production",        // production | functional | partial | stub | facade
      "wired_into_engine": true,
      "tested": true,
      "notes": ""
    }
  ]
}
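
To show how the fields above fit together, here is a small sketch that derives a pass rate and the list of failing suites from a loaded snapshot. The `//` comments in the listing are treated here as annotations only, on the assumption that the real file is plain JSON that `json.load` can read. `summarise_tests` is a hypothetical helper.

```python
def summarise_tests(snapshot):
    """Return the pass rate and the names of suites with at least one failure."""
    tests = snapshot.get("tests", {})
    total = tests.get("total", 0)
    passed = tests.get("passed", 0)
    failing = [
        suite["name"]
        for suite in tests.get("suites", [])
        if suite.get("failed", 0) > 0
    ]
    return {
        "pass_rate": passed / total if total else 0.0,  # avoid dividing by zero
        "failing_suites": failing,
    }


snapshot = {
    "schema_version": 1,
    "tests": {
        "total": 50,
        "passed": 48,
        "failed": 2,
        "suites": [
            {"name": "PhysicsIntegratorTest", "total": 48, "passed": 48, "failed": 0},
            {"name": "ExampleSuite", "total": 2, "passed": 0, "failed": 2},
        ],
    },
}
summary = summarise_tests(snapshot)
```

In practice the snapshot would come from `json.load(open("tools/benchdash/data/latest.json"))`; the `ExampleSuite` entry is made-up sample data.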

history.jsonl (one line per run)

{
  "timestamp": "2026-06-12T19:55:56",
  "label": "nightly",
  "git_commit": "d8ce160",
  "branch": "main",
  "tests": {
    "passed": 2213, "failed": 0, "timeout": 0, "notrun": 0, "total": 2213
  },
  "bench": [
    { "name": "BM_EngineStep/1000", "real_ms": 0.636 }
  ]
  // bench contains iteration-type results only, capped at 200 entries
}
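
The relationship between the two schemas can be sketched as a reduction from a latest.json snapshot to one compact history line. This follows the field names shown above, but it is an illustration under those assumptions and not the code in run_bench.py; `compact_entry` and `to_jsonl_line` are hypothetical names.

```python
import json

HISTORY_BENCH_CAP = 200  # cap stated for history entries


def compact_entry(snapshot):
    """Reduce a full snapshot to the compact history shape."""
    results = snapshot.get("benchmarks", {}).get("results", [])
    # Keep iteration-type results only, then apply the cap.
    bench = [
        {"name": r["name"], "real_ms": r["real_ms"]}
        for r in results
        if r.get("run_type") == "iteration"
    ][:HISTORY_BENCH_CAP]
    tests = snapshot.get("tests", {})
    git = snapshot.get("git", {})
    return {
        "timestamp": snapshot.get("timestamp", ""),
        "label": snapshot.get("label", ""),
        "git_commit": git.get("commit", ""),
        "branch": git.get("branch", ""),
        "tests": {
            key: tests.get(key, 0)
            for key in ("passed", "failed", "timeout", "notrun", "total")
        },
        "bench": bench,
    }


def to_jsonl_line(entry):
    """Serialise one entry as a single newline-terminated line."""
    return json.dumps(entry, separators=(",", ":")) + "\n"


# Made-up sample: 250 iteration results plus one aggregate result.
sample = {
    "timestamp": "2026-06-12T19:55:56",
    "label": "nightly",
    "git": {"commit": "d8ce160", "branch": "main", "dirty": False},
    "tests": {"passed": 3, "failed": 0, "timeout": 0, "notrun": 0, "total": 3},
    "benchmarks": {"results": [
        {"name": f"BM_X/{i}", "run_type": "iteration", "real_ms": 1.0}
        for i in range(250)
    ] + [{"name": "BM_X_mean", "run_type": "aggregate", "real_ms": 1.0}]},
}
line = to_jsonl_line(compact_entry(sample))
```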

Self-Test Verification

# Syntax-check the pipeline script
python3 -m py_compile tools/benchdash/run_bench.py

# Generate synthetic data and verify the write path
python3 tools/benchdash/run_bench.py --selftest

# Verify the output is valid schema_version 1
python3 -c "import json; d=json.load(open('tools/benchdash/data/latest.json')); assert d['schema_version']==1"

The --selftest run generates at least 25 benchmarks (including BM_EngineStep/100 through BM_EngineStep/10000), a test suite with 2 failing tests, and entries for every maturity tier.
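
A slightly stricter check than the one-line schema assertion could confirm the benchmark count and maturity-tier coverage described above. The sketch below is a hypothetical helper, not something shipped with Benchdash, and it checks only the properties that are stated unambiguously in this document.

```python
MATURITY_TIERS = {"production", "functional", "partial", "stub", "facade"}


def check_selftest(snapshot):
    """Return a list of problems found in a --selftest snapshot (empty if none)."""
    problems = []
    if snapshot.get("schema_version") != 1:
        problems.append("schema_version is not 1")
    count = len(snapshot.get("benchmarks", {}).get("results", []))
    if count < 25:
        problems.append(f"expected at least 25 benchmarks, found {count}")
    seen = {feature.get("maturity") for feature in snapshot.get("features", [])}
    missing = MATURITY_TIERS - seen
    if missing:
        problems.append("missing maturity tiers: " + ", ".join(sorted(missing)))
    return problems


# Made-up snapshot that satisfies all three checks.
good = {
    "schema_version": 1,
    "benchmarks": {"results": [{"name": f"BM_{i}"} for i in range(25)]},
    "features": [{"maturity": tier} for tier in MATURITY_TIERS],
}
problems = check_selftest(good)
```

To run it against real output, load the file with `json.load(open("tools/benchdash/data/latest.json"))` and fail the job if the returned list is non-empty.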


CI Usage

Benchdash is designed to fit into CI pipelines without any external dependencies.

# GitHub Actions example
- name: Run benchmark pipeline
  run: python3 tools/benchdash/run_bench.py --build-dir build --label "${{ github.sha }}"

- name: Verify schema
  run: python3 -c "import json; d=json.load(open('tools/benchdash/data/latest.json')); assert d['schema_version']==1; print('schema ok')"

- name: Upload artifacts
  uses: actions/upload-artifact@v4
  with:
    name: benchdash-data
    path: tools/benchdash/data/

Use --serve-only in a follow-up job or post-merge pipeline to host the dashboard from a previously uploaded artifact.


Directory Layout

tools/benchdash/
├── run_bench.py      # Orchestrator CLI (stdlib only)
├── index.html        # Dashboard SPA shell
├── app.js            # SPA logic (vanilla JS)
├── style.css         # Design system (dark/light tokens)
├── features.json     # Feature/subsystem maturity matrix (source of truth)
├── README.md         # Bilingual quick-start
└── data/
    ├── .gitkeep
    ├── latest.json   # Written by run_bench.py (gitignore recommended)
    └── history.jsonl # Appended by run_bench.py (gitignore recommended)