v0.7.0 Release Notes — Stabilization & Performance Overhaul¶
Release date: 2026-06-12
Test result: 2213 / 2213 passing (100 %)
Build status: Release build (tests + benchmarks + DIS + HLA), BUILD_EXIT=0
Architect review: APPROVED
Executive Summary¶
Version 0.7.0 is an audit-driven overhaul that corrected foundational correctness and reliability problems before expanding the feature surface.
When the v0.7.0 audit began, three different version strings coexisted in the repository (0.5.0, 0.6.0, and 0.7.0 appeared in different parts of CMake, headers, and the README). The benchmark suite consisted of two placeholder tests. Approximately 30 test files existed in the tree but were never registered with CTest and had never been compiled — their API drift had accumulated silently. The GPU memory pool had two self-deadlocks that caused 21 test hangs. The thread pool had six distinct race-class bugs. Physics integrators shared a single stateful object across all entities, causing cross-entity state corruption that was completely invisible in single-entity tests.
The work proceeded in three ordered phases:
| Phase | Goal | Outcome |
|---|---|---|
| 0 — Infrastructure | Unify versions, register orphaned tests, fix tooling | Build reproducible, CI green |
| 1 — Stabilization | Eliminate data races, deadlocks, UAF, and numerical errors | TSan/ASan/UBSan clean |
| 2 — Performance | Algorithmic improvements, per-tick allocation elimination | Measured 0.50 ms per step @100 entities |
The end state: 2213 tests compiled, registered, and passing; zero sanitizer violations in the CI matrix.
Phase 0 — Infrastructure¶
Version unification¶
All CMakeLists, jaguar/version.h, CMakePresets.json, and the web front-end now report a single version: 0.7.0. The three-way drift (0.5.0 / 0.6.0 / 0.7.0) is resolved.
Orphaned test registration¶
Four validation test files existed in the source tree but were not listed in any CMakeLists.txt. They had never been compiled against the current API. v0.7.0 registers and fixes them:
- Suspension test — API had drifted; spring/damper constants updated.
- Federation orchestrator test — deadlock trigger path exercised.
- HLA adapter test — interface mismatch fixed.
- Duplicate sea-domain symbols — disabled conflicting definition.
Sanitizer / coverage build options¶
JAGUAR_ENABLE_ASAN, JAGUAR_ENABLE_TSAN, JAGUAR_ENABLE_UBSAN, and JAGUAR_ENABLE_COVERAGE are now first-class CMake options. The CI matrix runs TSan and ASan on every merge.
CI hardening¶
- Added gcc-11 matrix entry.
- Added DIS/HLA compile-only job (catches API drift without requiring a live network).
- Real --coverage flag wired (previously passed a stray literal string).
- Added macOS Debug configuration.
- Removed broken nightly fuzz and ABI jobs that had been failing silently.
- Pinned codeql-action to v3.
- sol2 pinned to v3.3.0 (Lua binding stability).
- pugixml find_dependency added (fixes downstream CMake consumers).
- Transport library installed alongside engine library.
Other¶
- build_test/ and claudedocs/ removed from git index; .gitignore hardened.
- CPack macOS asset guards added.
- Telemetry module wired into the build system (not yet into the engine loop; see Known Limitations).
- Real benchmark suite added: BM_EngineStep (N-entity scaling), integrator comparison, coordinate transforms, environment queries, DIS codec, and thread pool throughput, replacing the two placeholder cases.
- README features matrix split into Production and Experimental / Roadmap sections for honest capability reporting.
Phase 1 — Stability Fixes¶
Per-entity integrator state (cross-entity corruption fix)¶
Before v0.7.0: All integrators (Verlet, ABM4, DormandPrince, Symplectic, Boris) held their history and scratch state in a single shared object. In a simulation with multiple entities, each entity overwrote the others' multi-step history on every tick. The ABM4 corrector also applied no meaningful velocity correction; it was running a first-order position predictor only.
After v0.7.0: The integrate method of the IStatePropagator interface gains a fourth argument:
virtual void integrate(
EntityState& state,
const EntityForces& forces,
Real dt,
IIntegratorEntityState* entity_state) = 0; // per-entity context
Each entity carries its own IIntegratorEntityState opaque handle. The engine passes this handle on every call, so integrator history is per-entity. ABM4 is upgraded to a consistent Adams-Bashforth 4th-order position predictor.
The legacy 3-argument integrate path is preserved for callers that manage state externally or use single-entity scenarios.
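The per-entity handle can be illustrated with a small sketch. The types below are simplified one-dimensional stand-ins, and Ab2History and Ab2Integrator are hypothetical names; a two-step Adams-Bashforth scheme is used instead of ABM4 only to keep the example short. The point is that the integrator object holds no history of its own, so interleaving entities cannot corrupt their results.

```cpp
#include <cassert>

using Real = double;

// Simplified stand-ins for the engine types (assumptions, 1-D for brevity).
struct EntityState  { Real position = 0; Real velocity = 0; };
struct EntityForces { Real acceleration = 0; };

// Opaque per-entity context, mirroring the fourth integrate() argument.
struct IIntegratorEntityState { virtual ~IIntegratorEntityState() = default; };

// Hypothetical history record for a two-step Adams-Bashforth scheme.
struct Ab2History : IIntegratorEntityState {
    bool has_previous = false;
    Real previous_velocity = 0;
    Real previous_acceleration = 0;
};

// The integrator holds no mutable state; all history lives in the handle.
struct Ab2Integrator {
    void integrate(EntityState& state, const EntityForces& forces, Real dt,
                   IIntegratorEntityState* entity_state) const {
        auto* history = static_cast<Ab2History*>(entity_state);
        const Real v = state.velocity;
        const Real a = forces.acceleration;
        if (!history->has_previous) {
            // Bootstrap with explicit Euler until one history sample exists.
            state.position += dt * v;
            state.velocity += dt * a;
        } else {
            // AB2: y_{n+1} = y_n + dt * (3/2 f_n - 1/2 f_{n-1})
            state.position += dt * (1.5 * v - 0.5 * history->previous_velocity);
            state.velocity += dt * (1.5 * a - 0.5 * history->previous_acceleration);
        }
        history->has_previous = true;
        history->previous_velocity = v;
        history->previous_acceleration = a;
    }
};
```

Because each entity owns its own Ab2History, stepping two entities alternately gives each of them the same trajectory it would have had if integrated alone.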
NaN / Inf containment¶
PhysicsSystem now checks every output state after integration. If any component is NaN or Inf:
- The entity is rolled back to its last known-good state.
- A warn-once message is emitted identifying the entity and tick.
- Simulation continues — no crash, no silent corruption.
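The containment steps above can be sketched as follows. State and NanGuard are hypothetical stand-ins, not the PhysicsSystem types; the sketch only shows the finite check, the rollback to the last accepted state, and a warn-once set keyed by entity.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <unordered_set>

// Simplified stand-in for the engine's entity state (assumption).
struct State { double position[3]; double velocity[3]; };

// True only if every component is finite (neither NaN nor +/-Inf).
inline bool is_finite_state(const State& s) {
    for (double v : s.position) if (!std::isfinite(v)) return false;
    for (double v : s.velocity) if (!std::isfinite(v)) return false;
    return true;
}

// Hypothetical guard run after each integration step.
class NanGuard {
public:
    // Returns true if `current` was accepted, false if it was rolled back.
    bool check(unsigned entity_id, unsigned long tick,
               State& current, State& last_good) {
        if (is_finite_state(current)) {
            last_good = current;          // remember the newest good state
            return true;
        }
        current = last_good;              // roll back; simulation continues
        if (warned_.insert(entity_id).second) {
            // insert() reports true only the first time, giving warn-once.
            std::fprintf(stderr,
                         "entity %u: non-finite state at tick %lu, rolled back\n",
                         entity_id, tick);
        }
        return false;
    }

private:
    std::unordered_set<unsigned> warned_;
};
```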
Event dispatcher use-after-free fix¶
The previous dispatcher iterated directly over the live handler list while dispatching. A handler that called unsubscribe on itself or another handler would invalidate the iterator, producing undefined behaviour.
Fix: The dispatcher takes a snapshot of the handler list under a lock before iterating. The snapshot is a value copy; mutations during dispatch are queued and applied after the snapshot iteration completes.
Documented semantics: A handler that is unsubscribed mid-dispatch may still receive the current event (it was already in the snapshot). It will not receive subsequent events.
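A minimal sketch of snapshot-based dispatch, assuming a hypothetical Dispatcher class with integer events and handler IDs. Unlike the engine, which queues mutations until the iteration completes, this simplified version applies them to the live list immediately; that is safe here only because iteration runs over the copy.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical minimal dispatcher; names and the int event type are assumptions.
class Dispatcher {
public:
    using Handler = std::function<void(int)>;

    int subscribe(Handler h) {
        std::lock_guard<std::mutex> lock(mutex_);
        handlers_.emplace_back(next_id_, std::move(h));
        return next_id_++;
    }

    void unsubscribe(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto it = handlers_.begin(); it != handlers_.end(); ++it) {
            if (it->first == id) { handlers_.erase(it); return; }
        }
    }

    void dispatch(int event) {
        std::vector<std::pair<int, Handler>> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = handlers_;         // value copy taken under the lock
        }
        // The lock is released before handlers run, so a handler may call
        // subscribe() or unsubscribe() without invalidating this iteration.
        for (auto& entry : snapshot) entry.second(event);
    }

private:
    std::mutex mutex_;
    std::vector<std::pair<int, Handler>> handlers_;
    int next_id_ = 0;
};
```

A handler removed during a dispatch is still present in the snapshot, so it receives the current event and none after it, which matches the documented semantics.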
Federation orchestrator self-deadlock¶
FederationOrchestrator::get_federation_locked called internal helpers that also attempted to acquire the same federation mutex, producing a self-deadlock under certain membership query paths.
Fix: Internal helpers now call an unlocked internal variant; only the public path acquires the lock, and it does so exactly once.
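The locked/unlocked split is a common pattern for non-recursive mutexes. The sketch below uses a hypothetical FederationRegistry class, not the real orchestrator: public methods take the mutex once, and any helper they call assumes the lock is already held.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical registry illustrating the pattern; not the engine's class.
class FederationRegistry {
public:
    // Public entry point: acquires the mutex exactly once.
    std::size_t add_member(const std::string& federation,
                           const std::string& member) {
        std::lock_guard<std::mutex> lock(mutex_);
        members_[federation].push_back(member);
        // Calling the public member_count() here would try to lock mutex_
        // again on the same thread and deadlock; use the unlocked helper.
        return member_count_unlocked(federation);
    }

    // Public entry point: acquires the mutex exactly once.
    std::size_t member_count(const std::string& federation) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return member_count_unlocked(federation);
    }

private:
    // Precondition: the caller already holds mutex_.
    std::size_t member_count_unlocked(const std::string& federation) const {
        auto it = members_.find(federation);
        return it == members_.end() ? 0 : it->second.size();
    }

    mutable std::mutex mutex_;
    std::map<std::string, std::vector<std::string>> members_;
};
```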
DisSocket race and multicast deadlock¶
- Move constructor race: The DisSocket move constructor transferred socket file descriptors without memory-ordering guarantees. Fixed with sequentially-consistent atomic stores on the descriptor field.
- close() multicast deadlock: close() called leave_multicast_group while holding the socket mutex; leave_multicast_group also tried to acquire the same mutex. Fixed by separating the group-leave logic into an unlocked internal helper called before lock acquisition.
DIS networking: real UDP transport¶
DisSocket now implements real UDP send/receive via POSIX sockets. Loopback mode (in-process simulation without a network) is preserved and selected automatically when no network interface is configured. Received PDU length is now authoritative — the PDU header's declared length field is validated but never trusted to determine how many bytes to hand to the decoder (prevents buffer over-read on malformed packets).
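One way to picture the length check is sketched below. The helper name is hypothetical, and the sketch assumes the standard 12-byte DIS PDU header with a 16-bit big-endian length field at byte offset 8. The declared length is only compared against the number of bytes the socket actually delivered; the caller still hands the decoder a view bounded by the received count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kDisHeaderSize = 12;  // DIS PDU header size in bytes
constexpr std::size_t kLengthOffset  = 8;   // 16-bit big-endian length field

// Hypothetical helper: true if the header's declared length is plausible for
// a datagram of `received` bytes. It never reads past `received`.
inline bool declared_length_is_consistent(const std::uint8_t* datagram,
                                          std::size_t received) {
    if (datagram == nullptr || received < kDisHeaderSize) return false;
    const std::size_t declared =
        (static_cast<std::size_t>(datagram[kLengthOffset]) << 8) |
        static_cast<std::size_t>(datagram[kLengthOffset + 1]);
    // Reject headers that claim less than a header or more than was received.
    return declared >= kDisHeaderSize && declared <= received;
}
```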
Thread pool rewrite (six bug classes eliminated)¶
The previous thread pool had the following documented races:
| Bug class | Description |
|---|---|
| UAF on completion state | std::promise / result storage outlived by outstanding futures |
| Lost wakeup | Condition variable notified outside mutex, allowing a thread to miss it |
| 100 µs busy-poll | wait_all spun on a 100 µs sleep instead of waiting on a condition |
| Exception non-propagation | Exceptions in tasks were swallowed; callers could not detect failure |
| Unsafe shutdown | Worker threads could access deallocated queue memory during destruction |
| Non-deterministic stealing | Round-robin order was not enforced; reproducibility relied on timing |
All six classes are eliminated in the v0.7.0 rewrite:
- Completion state wrapped in shared_ptr: the promise lives as long as any future holds a reference.
- enqueue increments a pending-task counter; wait_all waits on a condition variable that signals when the counter reaches zero (no busy-poll).
- Condition variable is always notified under mutex.
- Exceptions are captured via std::promise::set_exception and re-thrown on future::get().
- Shutdown signals workers to drain before joining; no memory is freed until all workers exit.
- Stealing uses deterministic round-robin index so fixed-seed simulations are reproducible.
- wait_future / help_one allows a calling thread to participate in draining the queue while waiting for a specific future, preventing deadlock when a parallel_for is nested inside another task.
- Nested parallel_for from worker threads is handled inline rather than re-enqueued, preventing deadlock under a fully-loaded pool.
- TSan-clean on the full CI matrix.
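Several of the points above (shared completion state, a pending counter with a condition-variable wait_all, notification under the mutex, exception propagation, drain-before-join shutdown) fit in a small sketch. MiniPool is a hypothetical single-queue pool written for illustration; it omits work stealing, help_one, and nested parallel_for, and is not the engine's ThreadPool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <thread>
#include <vector>

class MiniPool {
public:
    explicit MiniPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    ~MiniPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
            work_cv_.notify_all();            // notified while holding the mutex
        }
        for (auto& t : threads_) t.join();    // workers drain the queue first
    }

    std::future<int> enqueue(std::function<int()> fn) {
        // The promise is shared with the task, so it outlives this call.
        auto promise = std::make_shared<std::promise<int>>();
        std::future<int> result = promise->get_future();
        std::lock_guard<std::mutex> lock(mutex_);
        ++pending_;
        tasks_.push([promise, fn = std::move(fn)] {
            try { promise->set_value(fn()); }
            catch (...) { promise->set_exception(std::current_exception()); }
        });
        work_cv_.notify_one();
        return result;
    }

    // Blocks on a condition variable until every enqueued task has finished.
    void wait_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;   // stopping and fully drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                           // runs outside the lock
            std::lock_guard<std::mutex> lock(mutex_);
            if (--pending_ == 0) idle_cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable work_cv_;
    std::condition_variable idle_cv_;
    std::queue<std::function<void()>> tasks_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

A task that throws does not terminate the worker; the exception is stored in the shared state and surfaces when the caller invokes get() on the returned future.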
GPU memory pool deadlocks¶
Two self-deadlocks in the GPU memory pool:
- Fragmentation path: The defragmenter called get_stats internally, which re-acquired the pool mutex from the same thread.
- get_stats path: get_stats was a public lock-acquiring method called from within a lock-holding context.
Both were fixed by splitting the affected methods into internal (unlocked) and external (locked) variants. Global buffer IDs are now assigned atomically. upload and download now route through the pool rather than allocating staging buffers ad hoc.
Physics kernel fixes¶
- AABB broad-phase stamps the entity_id into each bounding volume entry (previously the field was uninitialised).
- Terrain contact normal uses Y-up convention (was computing a gravity-aligned normal regardless of terrain slope).
- Symplectic integrator no longer double-applies gravity (it was accumulated in both the force stage and the integration kernel).
- Hybrid ForceCPU routing flag is honoured (previously ignored, causing all entities to use GPU path even when CPU was requested).
XR fixes¶
- Controller state was not populated during xrBeginSession initialisation. As a result, the first several frames after session start returned zero-valued poses. Fixed by populating the controller state map at session init.
- Haptic renderer statistics (total_renders, avg_latency_ms) were computed incorrectly due to an off-by-one in the ring buffer read pointer. Fixed.
AdaptiveIntegrator clamp bug¶
Previously, a 1-second integration request was integrated for only 0.1 seconds. The interval clamp condition compared the requested dt against max_step using > instead of >=, causing the last sub-step to be clamped to max_step rather than the true remainder.
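For illustration, the intended sub-stepping behaviour can be sketched with a hypothetical helper (not the AdaptiveIntegrator code): every sub-step is bounded by max_step, the final sub-step is the true remainder, and the full requested interval is covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical helper: advances by `dt` in sub-steps no longer than
// `max_step`, invoking `step(h)` for each sub-step of length h.
inline void integrate_substepped(double dt, double max_step,
                                 const std::function<void(double)>& step) {
    if (max_step <= 0.0) return;          // guard against a non-terminating loop
    double remaining = dt;
    while (remaining > 0.0) {
        // Clamp to max_step, but never beyond what is actually left.
        const double h = std::min(max_step, remaining);
        step(h);
        remaining -= h;
    }
}
```

With dt = 1.0 and max_step = 0.1 the sub-steps sum to the full second, up to floating-point rounding in the last, very short, remainder step.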
SGP4 corrections (validated to 6 significant figures)¶
Five numerical errors in the SGP4 implementation were corrected and validated against the Vallado reference implementation and python-sgp4:
| Correction | Impact |
|---|---|
| Velocity XKE factor | Velocity magnitudes off by ~0.01 % at LEO |
| J3/J2 long-period terms | Argument of perigee drift error in elliptical orbits |
| Short-period periodics | Position error growing with propagation time |
| Secular rates | RAAN / argument of perigee secular drift |
| All five combined | Positions agree to 6 significant figures at 90-minute propagation |
SDP4 not implemented
The is_deep_space() path returns true for satellites with mean motion < 6.4 rev/day (period ≥ 225 minutes), but the SDP4 algorithm is not implemented. Calls to propagate() on deep-space TLEs return false. See Known Limitations.
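The threshold follows from the orbital period in minutes, 1440 / n for a mean motion of n rev/day, so 225 minutes corresponds to 6.4 rev/day. A sketch of the classification as a free function (the signature is an assumption; these notes only name is_deep_space(), and the sketch uses the period form of the test):

```cpp
#include <cassert>

constexpr double kMinutesPerDay = 1440.0;
constexpr double kDeepSpacePeriodMinutes = 225.0;

// Hypothetical free-function form of the deep-space test: true when the
// orbital period derived from the TLE mean motion is at least 225 minutes.
inline bool is_deep_space(double mean_motion_rev_per_day) {
    const double period_minutes = kMinutesPerDay / mean_motion_rev_per_day;
    return period_minutes >= kDeepSpacePeriodMinutes;
}
```

A geostationary object at roughly 1 rev/day is classified as deep-space, while a low-Earth-orbit object at roughly 15.5 rev/day is not.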
Von Karman turbulence forming filter¶
The state-space forming filter for Von Karman turbulence was accumulating variance without bound because the filter output was not properly normalised against the driving noise. The restructured filter correctly tracks the target turbulence intensity.
EnvironmentService::query made virtual¶
EnvironmentService::query is now a virtual method, enabling mock subclasses in tests and downstream code that needs deterministic environment responses. Previously the method was non-virtual, blocking mockability.
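A sketch of what the virtual method enables. The real query signature is not shown in these notes, so Vec3, Environment, the query parameters, FixedEnvironment, and dynamic_pressure are all hypothetical stand-ins.

```cpp
#include <cassert>

// Simplified stand-ins (assumptions).
struct Vec3 { double x, y, z; };
struct Environment { double air_density; double wind_speed; };

class EnvironmentService {
public:
    virtual ~EnvironmentService() = default;
    // Virtual, so tests can substitute a deterministic implementation.
    virtual Environment query(const Vec3& position, double time) const {
        (void)position;
        (void)time;
        return Environment{1.225, 0.0};   // placeholder sea-level values
    }
};

// Test double that always returns a fixed environment.
class FixedEnvironment : public EnvironmentService {
public:
    explicit FixedEnvironment(Environment e) : fixed_(e) {}
    Environment query(const Vec3&, double) const override { return fixed_; }

private:
    Environment fixed_;
};

// Code under test depends only on the base-class interface.
inline double dynamic_pressure(const EnvironmentService& env, const Vec3& pos,
                               double time, double airspeed) {
    const Environment e = env.query(pos, time);
    return 0.5 * e.air_density * airspeed * airspeed;
}
```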
DIS PDU validation sentinel¶
DisSocket::validate() now accepts the all-zeros PDU as a valid sentinel (used in tests to represent "no PDU received"). Previously the validator rejected it as malformed.
Ballistic endpoint NED gravity convention¶
The ballistic terminal endpoint computation was using an ENU gravity vector. Fixed to NED convention, matching the coordinate frame used by the rest of the force-accumulation stage.
Phase 2 — Performance¶
Dense EntityId-sorted active list¶
The previous physics tick traversed three separate unordered_map containers. Every step involved hash lookups and non-deterministic iteration order. v0.7.0 replaces these with a single dense std::vector sorted by EntityId. The sorted order also makes simulation results reproducible under a fixed seed.
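A minimal sketch of a dense, ID-sorted active list, assuming EntityId is an unsigned integer and using a hypothetical ActiveList class. Insertion keeps the vector sorted via binary search, so iteration order depends only on the set of IDs and never on insertion order or hashing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using EntityId = std::uint32_t;   // assumption: the real type may differ

// Hypothetical dense active list kept sorted by EntityId.
class ActiveList {
public:
    void add(EntityId id) {
        auto it = std::lower_bound(ids_.begin(), ids_.end(), id);
        if (it == ids_.end() || *it != id) ids_.insert(it, id);  // no duplicates
    }

    void remove(EntityId id) {
        auto it = std::lower_bound(ids_.begin(), ids_.end(), id);
        if (it != ids_.end() && *it == id) ids_.erase(it);
    }

    // Contiguous storage: the tick loop walks this in ascending ID order.
    const std::vector<EntityId>& ids() const { return ids_; }

private:
    std::vector<EntityId> ids_;
};
```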
Per-tick allocation and lock elimination¶
| Allocation removed | Replacement |
|---|---|
| Per-tick force generator list per entity | Cached per-entity generator list; rebuilt only on entity config change |
| Global generator list rebuilt each tick | Persistent cached list; invalidated only on registration change |
| Scratch buffers allocated in force stage | Persistent scratch buffers allocated at system init |
A step-window assertion verifies the cache is not used beyond its valid window. No heap allocations in the hot path.
Geodetic computation: 1x per entity per tick¶
Bowring's iterative geodetic conversion is capped at 2 iterations (error < 2 × 10⁻⁹ m vs. the full convergence reference). LLA is computed once per entity per tick and reused across all consumers in that tick. An environment-query overload accepts pre-computed LLA directly, avoiding an unnecessary ECEF round-trip. Land entities reuse the terrain query result from the force stage in the constraint stage — no duplicate queries.
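For reference, a sketch of Bowring's method with a fixed iteration count, using WGS-84 constants. The function names and the Lla struct are hypothetical, the altitude formula used here is not suitable near the poles, and the sketch makes no claim to reproduce the engine's error bound.

```cpp
#include <cassert>
#include <cmath>

// WGS-84 ellipsoid constants.
constexpr double kA   = 6378137.0;               // semi-major axis [m]
constexpr double kF   = 1.0 / 298.257223563;     // flattening
constexpr double kB   = kA * (1.0 - kF);         // semi-minor axis [m]
constexpr double kE2  = kF * (2.0 - kF);         // first eccentricity squared
constexpr double kEp2 = kE2 / (1.0 - kE2);       // second eccentricity squared

struct Lla { double lat; double lon; double alt; };  // radians, radians, metres

// Bowring's method with a fixed number of iterations (2 by default).
inline Lla ecef_to_lla(double x, double y, double z, int iterations = 2) {
    const double p = std::hypot(x, y);
    double beta = std::atan2(kA * z, kB * p);    // initial parametric latitude
    double lat = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const double s = std::sin(beta);
        const double c = std::cos(beta);
        lat = std::atan2(z + kEp2 * kB * s * s * s, p - kE2 * kA * c * c * c);
        // Refine the parametric latitude: tan(beta) = (1 - f) * tan(lat).
        beta = std::atan2((1.0 - kF) * std::sin(lat), std::cos(lat));
    }
    const double sin_lat = std::sin(lat);
    const double n = kA / std::sqrt(1.0 - kE2 * sin_lat * sin_lat);
    // Altitude from the prime-vertical radius; ill-conditioned near the poles.
    return Lla{lat, std::atan2(y, x), p / std::cos(lat) - n};
}

// Forward conversion, used here only to check the round trip.
inline void lla_to_ecef(const Lla& g, double& x, double& y, double& z) {
    const double sin_lat = std::sin(g.lat);
    const double cos_lat = std::cos(g.lat);
    const double n = kA / std::sqrt(1.0 - kE2 * sin_lat * sin_lat);
    x = (n + g.alt) * cos_lat * std::cos(g.lon);
    y = (n + g.alt) * cos_lat * std::sin(g.lon);
    z = (n * (1.0 - kE2) + g.alt) * sin_lat;
}
```

Computing the result once per entity per tick and passing the Lla value to every consumer avoids repeating the trigonometric work in each subsystem.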
Inverse-inertia cache¶
The inverse inertia tensor is no longer invalidated when entity state is loaded from a snapshot. It is only invalidated when the inertia tensor itself changes.
Lock-free terrain fast path¶
TerrainManager now has a lock-free fast path when no terrain is loaded. The common no-terrain case (training scenarios without GDAL data) does not acquire any mutex.
Parallel force stage¶
Force accumulation is parallelised via ThreadPool::parallel_for when entity count exceeds 64. Below this threshold, serial execution is used (avoids thread overhead for small populations). The parallel and serial paths produce bit-identical results. TSan reports no data races.
Measured performance (Apple M3 Max, Release build)¶
| Scenario | Time per step |
|---|---|
| BM_EngineStep / 100 entities | 0.495 ms |
| BM_EngineStep / 1 000 entities | 0.636 ms |
| BM_EngineStep / 10 000 entities | 4.889 ms |
Infrastructure — Test Suite¶
2213 / 2213 green¶
All 2213 registered tests pass. This includes:
- The ~30 latent tests that were never previously compiled, now fixed and registered.
- Full release-mode build with
JAGUAR_BUILD_TESTS=ON,JAGUAR_ENABLE_DIS=ON,JAGUAR_ENABLE_HLA=ON.
Frozen-force interface tolerance recalibration¶
Several integration tests check that force generators produce deterministic output when called with a frozen (constant) force input over multiple ticks. With the new parallel force stage, all forces in a tick are accumulated before integration runs. For ABM4 specifically, the corrector velocity step in a frozen-force scenario now uses the predictor velocity rather than a lagged cached value.
The practical effect: tests that expected exact floating-point equality between serial and parallel ABM4 paths under frozen forces required tolerance adjustment. The adjustment is mathematically justified — floating-point addition is not associative, and parallel reduction changes summation order. The physical prediction quality is unchanged.
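The non-associativity argument can be checked directly. The sketch below assumes IEEE 754 double arithmetic without fast-math optimisations; the two functions sum the same three values in different orders.

```cpp
#include <cassert>

// Same operands, different grouping.
inline double sum_left_first(double a, double b, double c)  { return (a + b) + c; }
inline double sum_right_first(double a, double b, double c) { return a + (b + c); }
```

With a = 1e16, b = -1e16, c = 1.0, the left-first grouping cancels the large terms exactly and yields 1.0, whereas the right-first grouping rounds -1e16 + 1.0 back to -1e16 (adjacent doubles are 2 apart at that magnitude) and yields 0.0. A parallel reduction that regroups terms can therefore differ from a serial sum in the last bits, which is why such comparisons need a tolerance rather than exact equality.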
Known Limitations¶
The following are intentional stubs or incomplete implementations:
| Area | Status | Notes |
|---|---|---|
| SDP4 deep-space propagation | Not implemented | is_deep_space() returns true; propagate() returns false for deep-space TLEs. SGP4 near-Earth path is production-grade. |
| ABM4 velocity corrector under frozen-force interface | First-order under this interface | The Adams-Moulton corrector for velocity requires a force evaluation at the predicted state, which is not possible through the frozen-force interface. Position is fourth-order; velocity is first-order in this regime. |
| ML inference | Stub | Mock pointer only; no ONNX Runtime linked. Feature matrix: maturity: stub. |
| OpenXR | Interface stub | Mock runtime for tests; not wired to a real HMD. Feature matrix: maturity: stub. |
| HLA vendor RTI | In-memory RTI only | No vendor RTI library (Pitch, MAK, Portico) linked. Feature matrix: maturity: stub. |
| Kubernetes autoscaler | Stub | No real K8s client. Feature matrix: maturity: stub. |
| GPU backends (CUDA / Metal / OpenCL) | Build-gated, not in CI | Off by default. CPU backend is functional and tested. None of the backends are wired into the engine loop. Feature matrix: maturity: partial. |
| Telemetry | Built, not wired into engine loop | Module compiles and links; no data is emitted from production engine paths. Feature matrix: maturity: functional, wired_into_engine: false. |