v0.7.0 Release Notes: Stabilization & Performance Overhaul

Release date: 2026-06-12
Test result: 2213 / 2213 passing (100 %)
Build status: Release build (tests + benchmarks + DIS + HLA), BUILD_EXIT=0
Architect review: APPROVED

Executive Summary

Version 0.7.0 is an audit-driven overhaul that corrected foundational correctness and reliability problems before expanding the feature surface.

When the v0.7.0 audit began, three different version strings coexisted in the repository (0.5.0, 0.6.0, and 0.7.0 appeared in different parts of CMake, headers, and the README). The benchmark suite consisted of two placeholder tests. Approximately 30 test files existed in the tree but were never registered with CTest and had never been compiled — their API drift had accumulated silently. The GPU memory pool had two self-deadlocks that caused 21 test hangs. The thread pool had six distinct race-class bugs. Physics integrators shared a single stateful object across all entities, causing cross-entity state corruption that was completely invisible in single-entity tests.

The work proceeded in three ordered phases:

| Phase | Goal | Outcome |
| --- | --- | --- |
| 0: Infrastructure | Unify versions, register orphaned tests, fix tooling | Build reproducible, CI green |
| 1: Stabilization | Eliminate data races, deadlocks, UAF, and numerical errors | TSan/ASan/UBSan clean |
| 2: Performance | Algorithmic improvements, per-tick allocation elimination | Measured 0.50 ms per step @100 entities |

The end state: 2213 tests compiled, registered, and passing; zero sanitizer violations in the CI matrix.

Phase 0: Infrastructure

Version unification

All CMakeLists, jaguar/version.h, CMakePresets.json, and the web front-end now report a single version: 0.7.0. The three-way drift (0.5.0 / 0.6.0 / 0.7.0) is resolved.

Orphaned test registration

Four validation test files existed in the source tree but were not listed in any CMakeLists.txt. They had never been compiled against the current API. v0.7.0 registers and fixes them:

Suspension test — API had drifted; spring/damper constants updated.
Federation orchestrator test — deadlock trigger path exercised.
HLA adapter test — interface mismatch fixed.
Duplicate sea-domain symbols — disabled conflicting definition.

Sanitizer / coverage build options

JAGUAR_ENABLE_ASAN, JAGUAR_ENABLE_TSAN, JAGUAR_ENABLE_UBSAN, and JAGUAR_ENABLE_COVERAGE are now first-class CMake options. The CI matrix runs TSan and ASan on every merge.

CI hardening

Added gcc-11 matrix entry.
Added DIS/HLA compile-only job (catches API drift without requiring a live network).
Real --coverage flag wired (previously passed a stray literal string).
Added macOS Debug configuration.
Removed broken nightly fuzz and ABI jobs that had been failing silently.
Pinned codeql-action to v3.
sol2 pinned to v3.3.0 (Lua binding stability).
pugixml find_dependency added (fixes downstream CMake consumers).
Transport library installed alongside engine library.

Other

build_test/ and claudedocs/ removed from git index; .gitignore hardened.
CPack macOS asset guards added.
Telemetry module wired into the build system (not yet into the engine loop — see Known Limitations).
Real benchmark suite added: BM_EngineStep (N-entity scaling), integrator comparison, coordinate transforms, environment queries, DIS codec, and thread pool throughput — replacing the two placeholder cases.
README features matrix split into Production and Experimental / Roadmap sections for honest capability reporting.

Phase 1: Stability Fixes

Per-entity integrator state (cross-entity corruption fix)

Before v0.7.0: All integrators (Verlet, ABM4, DormandPrince, Symplectic, Boris) held their history and scratch state in a single shared object. A simulation with multiple entities would overwrite each other's multi-step history on every tick. The ABM4 corrector had no meaningful velocity correction at all — it was running a first-order position predictor only.

After v0.7.0: The IStatePropagator interface gains a fourth argument:

```cpp
virtual void integrate(
    EntityState& state,
    const EntityForces& forces,
    Real dt,
    IIntegratorEntityState* entity_state) = 0;   // per-entity context
```

Each entity carries its own IIntegratorEntityState opaque handle. The engine passes this handle on every call, so integrator history is per-entity. ABM4 is upgraded to a consistent fourth-order Adams-Bashforth position predictor.

The legacy 3-argument integrate path is preserved for callers that manage state externally or use single-entity scenarios.
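
The effect of caller-owned history can be illustrated with a small sketch. It is not the engine's code: `Ab2History`, `ab2_step`, and `run_entity` are hypothetical names, and a two-step Adams-Bashforth update on a 1-D position stands in for ABM4 to keep the example short.

```cpp
// Hypothetical per-entity context: the previous derivative needed by a
// two-step Adams-Bashforth (AB2) update. Not the engine's real type.
struct Ab2History {
    double prev_velocity = 0.0;
    bool has_prev = false;
};

// Advances a 1-D position by dt using AB2, bootstrapping with forward Euler
// on the first call. All multi-step history lives in the caller's context,
// so nothing is shared between entities.
inline double ab2_step(double position, double velocity, double dt,
                       Ab2History& history) {
    const double next = history.has_prev
        ? position + dt * (1.5 * velocity - 0.5 * history.prev_velocity)
        : position + dt * velocity;
    history.prev_velocity = velocity;
    history.has_prev = true;
    return next;
}

// Runs `steps` updates for one entity whose velocity is v(n) = slope * n.
inline double run_entity(double slope, int steps, double dt, Ab2History& h) {
    double x = 0.0;
    for (int n = 0; n < steps; ++n) {
        x = ab2_step(x, slope * n, dt, h);
    }
    return x;
}
```

With one `Ab2History` per entity, interleaving updates for several entities gives each of them the same trajectory it would have had alone. A single shared history object would mix their previous velocities, which is the kind of corruption described above.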

NaN / Inf containment

PhysicsSystem now checks every output state after integration. If any component is NaN or Inf:

The entity is rolled back to its last known-good state.
A warn-once message is emitted identifying the entity and tick.
Simulation continues — no crash, no silent corruption.
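
The containment steps above can be sketched roughly as follows. `State`, `GuardedEntity`, and `commit_or_rollback` are hypothetical names chosen for illustration; the real PhysicsSystem types and logging are not shown in these notes.

```cpp
#include <array>
#include <cmath>
#include <cstdio>

// Hypothetical minimal state: position and velocity components.
struct State {
    std::array<double, 3> position{};
    std::array<double, 3> velocity{};
};

// True only if every component is a finite number (neither NaN nor Inf).
inline bool is_finite(const State& s) {
    for (double v : s.position) if (!std::isfinite(v)) return false;
    for (double v : s.velocity) if (!std::isfinite(v)) return false;
    return true;
}

struct GuardedEntity {
    State current;
    State last_good;
    bool warned = false;  // ensures the warning is printed at most once
};

// Accepts `candidate` if it is finite; otherwise restores the last
// known-good state and warns once. Returns true when accepted.
inline bool commit_or_rollback(GuardedEntity& e, const State& candidate,
                               unsigned id, unsigned long tick) {
    if (is_finite(candidate)) {
        e.current = candidate;
        e.last_good = candidate;
        return true;
    }
    e.current = e.last_good;
    if (!e.warned) {
        std::fprintf(stderr,
                     "entity %u: non-finite state at tick %lu, rolled back\n",
                     id, tick);
        e.warned = true;
    }
    return false;
}
```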

Event dispatcher use-after-free fix

The previous dispatcher iterated directly over the live handler list while dispatching. A handler that called unsubscribe on itself or another handler would invalidate the iterator, producing undefined behaviour.

Fix: The dispatcher takes a snapshot of the handler list under a lock before iterating. The snapshot is a value copy; mutations during dispatch are queued and applied after the snapshot iteration completes.

Documented semantics: A handler that is unsubscribed mid-dispatch may still receive the current event (it was already in the snapshot). It will not receive subsequent events.
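
A minimal sketch of snapshot-based dispatch follows. The `Dispatcher` class and its integer event type are assumptions made for illustration. As a simplification, this sketch applies unsubscribe calls to the live list immediately rather than queuing them as the engine does; either way the loop only touches the snapshot.

```cpp
#include <algorithm>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class Dispatcher {
public:
    using Handler = std::function<void(int)>;

    // Registers a handler and returns an id usable with unsubscribe().
    int subscribe(Handler h) {
        std::lock_guard<std::mutex> lock(mutex_);
        handlers_.push_back({next_id_, std::move(h)});
        return next_id_++;
    }

    void unsubscribe(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        handlers_.erase(
            std::remove_if(handlers_.begin(), handlers_.end(),
                           [id](const Entry& e) { return e.id == id; }),
            handlers_.end());
    }

    void dispatch(int event) {
        std::vector<Entry> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = handlers_;  // value copy taken under the lock
        }
        // The lock is released, so handlers may call subscribe() or
        // unsubscribe() without invalidating this loop's iterators.
        for (const Entry& e : snapshot) e.fn(event);
    }

private:
    struct Entry { int id; Handler fn; };
    std::mutex mutex_;
    std::vector<Entry> handlers_;
    int next_id_ = 0;
};
```

A handler removed during a dispatch still runs for that event because it is already in the snapshot, and it is absent from every later snapshot, which matches the documented semantics.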

Federation orchestrator self-deadlock

FederationOrchestrator::get_federation_locked called internal helpers that also attempted to acquire the same federation mutex, producing a self-deadlock under certain membership query paths.

Fix: Internal helpers now call an unlocked internal variant that assumes the caller already holds the federation mutex; the public path acquires the lock exactly once.
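
The locked/unlocked split is a general pattern; the same idea recurs in the DisSocket and GPU memory pool fixes. The sketch below uses a hypothetical `MembershipTable` class rather than the real FederationOrchestrator, whose members are not shown here.

```cpp
#include <map>
#include <mutex>
#include <string>

class MembershipTable {
public:
    void join(const std::string& federation) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++counts_[federation];
    }

    // Public query: acquires the mutex exactly once, then delegates.
    int member_count(const std::string& federation) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return member_count_unlocked(federation);
    }

    bool is_empty(const std::string& federation) const {
        std::lock_guard<std::mutex> lock(mutex_);
        // Calling member_count() here would try to lock a non-recursive
        // mutex twice on the same thread, which is the self-deadlock.
        return member_count_unlocked(federation) == 0;
    }

private:
    // Precondition: the caller already holds mutex_.
    int member_count_unlocked(const std::string& federation) const {
        auto it = counts_.find(federation);
        return it == counts_.end() ? 0 : it->second;
    }

    mutable std::mutex mutex_;
    std::map<std::string, int> counts_;
};
```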

DisSocket race and multicast deadlock

Move constructor race: The DisSocket move constructor transferred socket file descriptors without memory-ordering guarantees. Fixed with sequentially-consistent atomic stores on the descriptor field.
close() multicast deadlock: close() called leave_multicast_group while holding the socket mutex; leave_multicast_group also tried to acquire the same mutex. Fixed by separating the group-leave logic into an unlocked internal helper called before lock acquisition.

DIS networking: real UDP transport

DisSocket now implements real UDP send/receive via POSIX sockets. Loopback mode (in-process simulation without a network) is preserved and selected automatically when no network interface is configured. The received datagram length is now authoritative: the length field declared in the PDU header is validated against it but never decides how many bytes are handed to the decoder, which prevents buffer over-reads on malformed packets.
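
The length check might look something like the sketch below. `pdu_length_ok` is a hypothetical helper, not the DisSocket API. The 12-byte header with a 16-bit big-endian length at byte offset 8 follows the standard DIS PDU header layout. Accepting datagrams that are longer than the declared length is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPduHeaderSize = 12;   // DIS PDU header size in bytes
constexpr std::size_t kPduLengthOffset = 8;  // 16-bit big-endian length field

// Validates the declared PDU length against the number of bytes actually
// received. The caller then hands the decoder a buffer bounded by
// `received`, never by the declared length alone.
inline bool pdu_length_ok(const std::uint8_t* data, std::size_t received) {
    if (data == nullptr || received < kPduHeaderSize) return false;
    const std::size_t declared =
        (static_cast<std::size_t>(data[kPduLengthOffset]) << 8) |
        static_cast<std::size_t>(data[kPduLengthOffset + 1]);
    // A PDU cannot be shorter than its header or longer than the datagram.
    return declared >= kPduHeaderSize && declared <= received;
}
```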

Thread pool rewrite (six bug classes eliminated)

The previous thread pool had the following documented bug classes:

| Bug class | Description |
| --- | --- |
| UAF on completion state | `std::promise` / result storage outlived by outstanding futures |
| Lost wakeup | Condition variable notified outside mutex, allowing a thread to miss it |
| 100 µs busy-poll | `wait_all` spun on a 100 µs sleep instead of waiting on a condition |
| Exception non-propagation | Exceptions in tasks were swallowed; callers could not detect failure |
| Unsafe shutdown | Worker threads could access deallocated queue memory during destruction |
| Non-deterministic stealing | Round-robin order was not enforced; reproducibility relied on timing |

All six classes are eliminated in the v0.7.0 rewrite:

Completion state wrapped in shared_ptr — the promise lives as long as any future holds a reference.
enqueue increments a pending-task counter; wait_all waits on a condition variable that signals when the counter reaches zero (no busy-poll).
Condition variable is always notified under mutex.
Exceptions are captured via std::promise::set_exception and re-thrown on future::get().
Shutdown signals workers to drain before joining; no memory is freed until all workers exit.
Stealing uses deterministic round-robin index so fixed-seed simulations are reproducible.
wait_future / help_one allows a calling thread to participate in draining the queue while waiting for a specific future, preventing deadlock when a parallel_for is nested inside another task.
Nested parallel_for from worker threads is handled inline rather than re-enqueued, preventing deadlock under a fully-loaded pool.
TSan-clean on the full CI matrix.
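
Several of these points (shared completion state, the pending-task counter, notification under the mutex, exception propagation, and drain-before-join shutdown) can be seen together in a compact sketch. `MiniPool` is a hypothetical single-queue pool written for illustration. It has no work stealing, `help_one`, or nested `parallel_for` handling, and it uses `std::packaged_task` to capture exceptions rather than calling `set_exception` directly.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class MiniPool {
public:
    explicit MiniPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    ~MiniPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
            work_cv_.notify_all();  // notified while holding the mutex
        }
        // Workers drain the queue before exiting; members are destroyed
        // only after every worker has been joined.
        for (std::thread& t : threads_) t.join();
    }

    std::future<void> enqueue(std::function<void()> fn) {
        // shared_ptr keeps the task and its result state alive until it runs.
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
        std::future<void> result = task->get_future();
        std::lock_guard<std::mutex> lock(mutex_);
        ++pending_;
        queue_.push([task] { (*task)(); });  // exceptions land in the future
        work_cv_.notify_one();
        return result;
    }

    // Blocks on a condition variable until every enqueued task has finished.
    void wait_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_cv_.wait(lock,
                              [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and fully drained
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();
            std::lock_guard<std::mutex> lock(mutex_);
            if (--pending_ == 0) idle_cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable work_cv_;
    std::condition_variable idle_cv_;
    std::queue<std::function<void()>> queue_;
    std::vector<std::thread> threads_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
};
```

Because the counter is changed and the condition variables are notified only while the mutex is held, a waiter cannot check the predicate, miss the notification, and then sleep forever.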

GPU memory pool deadlocks

Two self-deadlocks in the GPU memory pool:

Fragmentation path: The defragmenter called get_stats internally, which re-acquired the pool mutex from the same thread.
get_stats path: get_stats was a public lock-acquiring method called from within a lock-holding context.

Both were fixed by splitting the affected methods into internal (unlocked) and external (locked) variants. Global buffer IDs are now assigned atomically. upload and download now route through the pool rather than allocating staging buffers ad hoc.

Physics kernel fixes

AABB broad-phase stamps the entity_id into each bounding volume entry (previously the field was uninitialised).
Terrain contact normal uses Y-up convention (was computing a gravity-aligned normal regardless of terrain slope).
Symplectic integrator no longer double-applies gravity (it was accumulated in both the force stage and the integration kernel).
Hybrid ForceCPU routing flag is honoured (previously ignored, causing all entities to use GPU path even when CPU was requested).

XR fixes

Controller state was not populated during xrBeginSession initialisation. As a result, the first several frames after session start returned zero-valued poses. Fixed by populating the controller state map at session init.
Haptic renderer statistics (total_renders, avg_latency_ms) were computed incorrectly due to an off-by-one in the ring buffer read pointer. Fixed.

AdaptiveIntegrator clamp bug

Before the fix, a 1-second integration request advanced the state by only 0.1 seconds. The interval clamp compared the requested dt against max_step using > instead of >=, so the last sub-step was clamped to max_step rather than to the true remainder.
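
The intended behaviour, covering the whole requested interval with sub-steps no longer than max_step and a final sub-step equal to the remainder, can be sketched as below. `plan_substeps` is a hypothetical helper and not the AdaptiveIntegrator implementation, which also adapts the step size to error estimates.

```cpp
#include <algorithm>
#include <vector>

// Splits `total_dt` into sub-steps no longer than `max_step`. Every step
// except possibly the last equals `max_step`; the last one is whatever
// remains, so the steps always cover the full requested interval.
inline std::vector<double> plan_substeps(double total_dt, double max_step) {
    std::vector<double> steps;
    if (!(total_dt > 0.0) || !(max_step > 0.0)) return steps;
    double remaining = total_dt;
    while (remaining > 0.0) {
        const double h = std::min(remaining, max_step);
        steps.push_back(h);
        remaining -= h;  // reaches exactly 0.0 once h == remaining
    }
    return steps;
}
```

For example, `plan_substeps(1.0, 0.25)` yields four steps of 0.25. Production code would likely also merge a negligible floating-point remainder into the previous step instead of taking a tiny extra step.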

SGP4 corrections (validated to 6 significant figures)

Five numerical errors in the SGP4 implementation were corrected and validated against the Vallado reference implementation and python-sgp4:

| Correction | Impact |
| --- | --- |
| Velocity XKE factor | Velocity magnitudes off by ~0.01 % at LEO |
| J3/J2 long-period terms | Argument of perigee drift error in elliptical orbits |
| Short-period periodics | Position error growing with propagation time |
| Secular rates | RAAN / argument of perigee secular drift |
| All five combined | Positions agree to 6 significant figures at 90-minute propagation |

SDP4 not implemented

The is_deep_space() path returns true for satellites with a mean motion of 6.4 rev/day or less (period ≥ 225 minutes), but the SDP4 algorithm is not implemented. Calls to propagate() on deep-space TLEs return false. See Known Limitations.
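
As a worked illustration of the threshold, the sketch below classifies an orbit from its mean motion using the 225-minute period boundary quoted above. The free functions and their signatures are assumptions for this example; the engine's is_deep_space() is only known by name.

```cpp
// Orbital period in minutes for a mean motion given in revolutions per day
// (one day is 1440 minutes).
inline double period_minutes(double mean_motion_rev_per_day) {
    return 1440.0 / mean_motion_rev_per_day;
}

// Deep-space classification: period of 225 minutes or longer. Orbits in
// this class would need SDP4, which these notes state is not implemented.
inline bool is_deep_space(double mean_motion_rev_per_day) {
    return period_minutes(mean_motion_rev_per_day) >= 225.0;
}
```

A low-Earth orbit at about 15.5 rev/day has a period near 93 minutes and stays on the SGP4 path, while a semi-synchronous orbit at 2 rev/day (720 minutes) is classified as deep space.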

Von Karman turbulence forming filter

The state-space forming filter for Von Karman turbulence was accumulating variance without bound because the filter output was not properly normalised against the driving noise. The restructured filter correctly tracks the target turbulence intensity.

EnvironmentService::query made virtual

EnvironmentService::query is now a virtual method, enabling mock subclasses in tests and downstream code that needs deterministic environment responses. Previously the method was non-virtual, blocking mockability.
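
A test double built on the virtual method might look like the following sketch. The `query` signature, the `EnvironmentSample` struct, the exponential density placeholder, and `drag_scale` are all assumptions made for illustration; only the class name and the fact that `query` is virtual come from these notes.

```cpp
#include <cmath>

// Hypothetical query result; the real structure is not shown in these notes.
struct EnvironmentSample {
    double density;      // kg/m^3
    double temperature;  // K
};

class EnvironmentService {
public:
    virtual ~EnvironmentService() = default;

    // Virtual so tests can substitute deterministic responses. The body is
    // a crude exponential-atmosphere placeholder, not the engine's model.
    virtual EnvironmentSample query(double altitude_m) const {
        return {1.225 * std::exp(-altitude_m / 8500.0), 288.15};
    }
};

// Mock that always returns the same sample, regardless of altitude.
class FixedEnvironment : public EnvironmentService {
public:
    explicit FixedEnvironment(EnvironmentSample sample) : sample_(sample) {}
    EnvironmentSample query(double) const override { return sample_; }

private:
    EnvironmentSample sample_;
};

// Example consumer that depends only on the base-class interface.
inline double drag_scale(const EnvironmentService& env, double altitude_m) {
    return 0.5 * env.query(altitude_m).density;
}
```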

DIS PDU validation sentinel

DisSocket::validate() now accepts the all-zeros PDU as a valid sentinel (used in tests to represent "no PDU received"). Previously the validator rejected it as malformed.

Ballistic endpoint NED gravity convention

The ballistic terminal endpoint computation was using an ENU gravity vector. Fixed to NED convention, matching the coordinate frame used by the rest of the force-accumulation stage.

Phase 2: Performance

Dense EntityId-sorted active list

The previous physics tick traversed three separate unordered_map containers. Every step involved hash lookups and non-deterministic iteration order. v0.7.0 replaces these with a single dense std::vector sorted by EntityId. The sorted order also makes simulation results reproducible under a fixed seed.
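
A dense sorted container of this kind can be sketched in a few lines. `ActiveList` is a hypothetical class, and treating `EntityId` as a 64-bit unsigned integer is an assumption; the real list presumably stores per-entity data alongside each id.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using EntityId = std::uint64_t;  // assumed integer-like id

class ActiveList {
public:
    // Inserts `id` at its sorted position; returns false if already present.
    bool insert(EntityId id) {
        auto it = std::lower_bound(ids_.begin(), ids_.end(), id);
        if (it != ids_.end() && *it == id) return false;
        ids_.insert(it, id);
        return true;
    }

    // Removes `id`; returns false if it was not in the list.
    bool erase(EntityId id) {
        auto it = std::lower_bound(ids_.begin(), ids_.end(), id);
        if (it == ids_.end() || *it != id) return false;
        ids_.erase(it);
        return true;
    }

    bool contains(EntityId id) const {
        return std::binary_search(ids_.begin(), ids_.end(), id);
    }

    // Contiguous, ascending view: iteration order never depends on hashing.
    const std::vector<EntityId>& ids() const { return ids_; }

private:
    std::vector<EntityId> ids_;
};
```

Insertion and removal cost O(n) because elements shift, but the per-tick traversal is a linear walk over contiguous memory in a fixed order, which is the access pattern the physics step actually repeats.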

Per-tick allocation and lock elimination

| Allocation removed | Replacement |
| --- | --- |
| Per-tick force generator list per entity | Cached per-entity generator list; rebuilt only on entity config change |
| Global generator list rebuilt each tick | Persistent cached list; invalidated only on registration change |
| Scratch buffers allocated in force stage | Persistent scratch buffers allocated at system init |

A step-window assertion verifies that the cache is not used beyond its valid window. The hot path performs no heap allocations.

Geodetic computation: 1x per entity per tick

Bowring's iterative geodetic conversion is capped at 2 iterations (error < 2 × 10⁻⁹ m vs. the full convergence reference). LLA is computed once per entity per tick and reused across all consumers in that tick. An environment-query overload accepts pre-computed LLA directly, avoiding an unnecessary ECEF round-trip. Land entities reuse the terrain query result from the force stage in the constraint stage — no duplicate queries.
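
For readers unfamiliar with Bowring's method, the sketch below shows a fixed two-iteration ECEF-to-geodetic conversion. It assumes WGS-84 constants and uses hypothetical `Lla` / `Ecef` structs and function names; it is not the engine's implementation and makes no claim about matching its quoted error bound.

```cpp
#include <cmath>

struct Lla  { double lat_rad; double lon_rad; double alt_m; };
struct Ecef { double x; double y; double z; };

constexpr double kA   = 6378137.0;            // WGS-84 semi-major axis [m]
constexpr double kF   = 1.0 / 298.257223563;  // WGS-84 flattening
constexpr double kB   = kA * (1.0 - kF);      // semi-minor axis [m]
constexpr double kE2  = kF * (2.0 - kF);      // first eccentricity squared
constexpr double kEp2 = kE2 / (1.0 - kE2);    // second eccentricity squared

// Closed-form geodetic to ECEF conversion, used here to check round trips.
inline Ecef lla_to_ecef(const Lla& g) {
    const double s = std::sin(g.lat_rad);
    const double c = std::cos(g.lat_rad);
    const double n = kA / std::sqrt(1.0 - kE2 * s * s);
    return {(n + g.alt_m) * c * std::cos(g.lon_rad),
            (n + g.alt_m) * c * std::sin(g.lon_rad),
            (n * (1.0 - kE2) + g.alt_m) * s};
}

// Bowring's iteration with a fixed iteration count (at least 1).
inline Lla ecef_to_lla_bowring(const Ecef& r, int iterations = 2) {
    const double p = std::hypot(r.x, r.y);
    double beta = std::atan2(kA * r.z, kB * p);  // initial parametric latitude
    double lat = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const double sb = std::sin(beta);
        const double cb = std::cos(beta);
        lat = std::atan2(r.z + kEp2 * kB * sb * sb * sb,
                         p - kE2 * kA * cb * cb * cb);
        beta = std::atan2(kB * std::sin(lat), kA * std::cos(lat));
    }
    const double s = std::sin(lat);
    // Height formula that stays well-conditioned at the poles.
    const double alt = p * std::cos(lat) + r.z * s
                     - kA * std::sqrt(1.0 - kE2 * s * s);
    return {lat, std::atan2(r.y, r.x), alt};
}
```

Computing this once per entity per tick and passing the resulting `Lla` to every consumer is what removes the repeated conversions described above.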

Inverse-inertia cache

The inverse inertia tensor is no longer invalidated when entity state is loaded from a snapshot. It is only invalidated when the inertia tensor itself changes.

Lock-free terrain fast path

TerrainManager now has a lock-free fast path when no terrain is loaded. The common no-terrain case (training scenarios without GDAL data) does not acquire any mutex.
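
One common way to build such a fast path is an atomic "loaded" flag checked before any mutex is touched. The sketch below assumes that approach; `TerrainIndex`, its flat height array, and the 0.0 default are hypothetical and do not describe TerrainManager's real data structures.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

class TerrainIndex {
public:
    // Replaces the height data and publishes whether any terrain exists.
    void load(std::vector<double> heights) {
        std::lock_guard<std::mutex> lock(mutex_);
        heights_ = std::move(heights);
        loaded_.store(!heights_.empty(), std::memory_order_release);
    }

    // Returns the height for `cell`, or 0.0 when no terrain is loaded or
    // the cell is out of range.
    double height_at(std::size_t cell) const {
        // Fast path: a single atomic load, no mutex acquired.
        if (!loaded_.load(std::memory_order_acquire)) return 0.0;
        std::lock_guard<std::mutex> lock(mutex_);
        return cell < heights_.size() ? heights_[cell] : 0.0;
    }

private:
    mutable std::mutex mutex_;
    std::vector<double> heights_;
    std::atomic<bool> loaded_{false};
};
```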

Parallel force stage

Force accumulation is parallelised via ThreadPool::parallel_for when entity count exceeds 64. Below this threshold, serial execution is used (avoids thread overhead for small populations). The parallel and serial paths produce bit-identical results. TSan reports no data races.

Measured performance (Apple M3 Max, Release build)

| Scenario | Time per step |
| --- | --- |
| BM_EngineStep / 100 entities | 0.495 ms |
| BM_EngineStep / 1 000 entities | 0.636 ms |
| BM_EngineStep / 10 000 entities | 4.889 ms |

Infrastructure: Test Suite

2213 / 2213 green

All 2213 registered tests pass. This includes:

The ~30 latent tests that were never previously compiled, now fixed and registered.
Full release-mode build with JAGUAR_BUILD_TESTS=ON, JAGUAR_ENABLE_DIS=ON, JAGUAR_ENABLE_HLA=ON.

Frozen-force interface tolerance recalibration

Several integration tests check that force generators produce deterministic output when called with a frozen (constant) force input over multiple ticks. With the new parallel force stage, all forces in a tick are accumulated before integration runs. For ABM4 specifically, the corrector velocity step in a frozen-force scenario now uses the predictor velocity rather than a lagged cached value.

The practical effect: tests that expected exact floating-point equality between serial and parallel ABM4 paths under frozen forces required tolerance adjustment. The adjustment is mathematically justified — floating-point addition is not associative, and parallel reduction changes summation order. The physical prediction quality is unchanged.
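
The non-associativity argument can be reproduced with a tiny, deliberately extreme example. The two hypothetical functions below add the same values, once left to right and once as per-chunk partial sums combined afterwards, which is how a chunked reduction groups its additions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Adds the values strictly left to right.
inline double sum_serial(const std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) total += x;
    return total;
}

// Sums fixed-size chunks independently, then adds the partial sums in
// chunk order, mimicking the grouping of a chunked parallel reduction.
inline double sum_chunked(const std::vector<double>& v, std::size_t chunk) {
    if (chunk == 0) return sum_serial(v);
    double total = 0.0;
    for (std::size_t begin = 0; begin < v.size(); begin += chunk) {
        const std::size_t end = std::min(v.size(), begin + chunk);
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i) partial += v[i];
        total += partial;
    }
    return total;
}
```

For the input `{1.0, 1e16, -1e16, 1.0}` with IEEE 754 doubles, `sum_serial` returns 1.0 and `sum_chunked` with a chunk size of 2 returns 0.0, while the exact sum is 2.0. Real force values are far better conditioned, so the differences there appear only in the last few bits, which is why a small tolerance is sufficient.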

Known Limitations

The following are intentional stubs or incomplete implementations:

| Area | Status | Notes |
| --- | --- | --- |
| SDP4 deep-space propagation | Not implemented | `is_deep_space()` returns `true`; `propagate()` returns `false` for deep-space TLEs. SGP4 near-Earth path is production-grade. |
| ABM4 velocity corrector under frozen-force interface | First-order under this interface | The Adams-Moulton corrector for velocity requires a force evaluation at the predicted state, which is not possible through the frozen-force interface. Position is fourth-order; velocity is first-order in this regime. |
| ML inference | Stub | Mock pointer only; no ONNX Runtime linked. Feature matrix: `maturity: stub`. |
| OpenXR | Interface stub | Mock runtime for tests; not wired to a real HMD. Feature matrix: `maturity: stub`. |
| HLA vendor RTI | In-memory RTI only | No vendor RTI library (Pitch, MAK, Portico) linked. Feature matrix: `maturity: stub`. |
| Kubernetes autoscaler | Stub | No real K8s client. Feature matrix: `maturity: stub`. |
| GPU backends (CUDA / Metal / OpenCL) | Build-gated, not in CI | Off by default. CPU backend is functional and tested. None of the backends are wired into the engine loop. Feature matrix: `maturity: partial`. |
| Telemetry | Built, not wired into engine loop | Module compiles and links; no data is emitted from production engine paths. Feature matrix: `maturity: functional, wired_into_engine: false`. |