# Benchmarks ```{note} Results on this page were generated with **tessera {{version}}**. The benchmark JSON log records the exact version, platform, and Python version for each run so results can be compared across releases. ``` Build-time benchmarks for constructing CDT simplicial complexes across dimensions 1D through 4D and a range of target sizes. All timings are wall-clock averages over 5 repetitions on a single core, measured with `time.perf_counter()`. Run the benchmark script yourself to get numbers for your hardware: ```bash python examples/benchmarks/build_benchmark.py --save docs/source/assets/benchmarks/ ``` The script writes a structured JSON log alongside the plots (`benchmark_results.json`) for programmatic comparison across versions. --- ## Build Time vs. Complex Size Build time scales roughly linearly with the number of target simplices in every dimension. Higher-dimensional complexes are more expensive per simplex because each $d$-simplex carries $\binom{d+1}{2}$ edges and $d + 1$ vertices that must be linked into the triangulation. ```{image} assets/benchmarks/build_time.png :alt: Log-log plot of build time vs. target simplices for 1D-4D :width: 80% :align: center ``` --- ## Build Throughput Throughput (simplices per second) is highest in 1D--2D (~200,000--500,000 simpl/s) and decreases with dimension as the per-simplex bookkeeping grows. 4D sustains ~150,000 simpl/s at 100k simplices, meaning a 100k-simplex triangulation builds in under 1 second. ```{image} assets/benchmarks/build_throughput.png :alt: Bar chart of build throughput by dimension and target size :width: 80% :align: center ``` --- ## Full Dashboard The four-panel view combines build time, throughput, actual-vs-requested size, and complex density (vertices and edges per simplex) across all dimensions. ```{note} 1D CDT produces far fewer simplices than requested because the triangulation is topologically a circle and the builder converges once the minimal cell structure is complete. The 1D data points are included for completeness but are not directly comparable to 2D--4D. ``` ```{image} assets/benchmarks/build_benchmarks.png :alt: Four-panel benchmark dashboard :width: 100% :align: center ``` --- ## Results Table ```{list-table} Build times (5-run average, v0.1.0) :header-rows: 1 :widths: 8 15 15 15 15 15 * - Dim - Target - Actual - Vertices - Edges - Time (s) * - 2D - 500 - 498 - 252 - 750 - 0.001 * - 2D - 10,000 - 9,996 - 5,001 - 14,997 - 0.038 * - 2D - 100,000 - 99,996 - 50,001 - 149,997 - 0.509 * - 3D - 500 - 492 - 168 - 662 - 0.002 * - 3D - 10,000 - 9,996 - 3,336 - 13,334 - 0.043 * - 3D - 100,000 - 99,996 - 33,336 - 133,334 - 0.580 * - 4D - 500 - 500 - 130 - 635 - 0.003 * - 4D - 10,000 - 10,000 - 2,505 - 12,510 - 0.058 * - 4D - 100,000 - 100,000 - 25,005 - 125,010 - 0.684 ``` --- ## Optimization History The chart below compares the original unoptimized build times against the current v{{version}} code. The cumulative optimizations include: - **Fingerprint kMax 64 → 8** — eliminated ~448 bytes of dead array per Fingerprint, reclaiming ~214 MB on a 100k lattice and dramatically improving cache utilization. - **Per-simplex edges and cofaces: `unordered_set` → `vector`** — replaced hash tables with flat vectors for collections of 3-10 (edges) and 0-2 (cofaces) elements, eliminating per-object hash table overhead. - **Inlined trivial accessors** — moved `isTimelike()`, `size()`, `getTi()`, `getTf()`, `getOrientation()` from Simplex.cpp to the header so the compiler can inline them across translation units. - **Return by const reference** — `getVertices()`, `getEdges()`, `getCofaces()`, `getSimplices()`, `getSource()`, `getTarget()` no longer copy containers or bump refcounts on every call. - **O(1) simplex unregistration** — replaced linear scan with an index map. - **Direct hash in `createSimplex()`** — eliminated temporary `Fingerprint` allocation by computing the XOR hash inline and using heterogeneous lookup. - **CDT move locals** — replaced `unordered_set` with `vector` for small per-move vertex/simplex collections. - **Vertex `inEdges`/`outEdges`/`simplices`: `unordered_set` → `vector`** — same flat-vector optimization applied to per-vertex containers (5-20 elements). - **`shared_ptr` → raw pointers** — replaced `shared_ptr`, `shared_ptr`, `shared_ptr` with raw pointers throughout. Ownership is now explicit via `unique_ptr` in the owning containers (Spacetime, EdgeList, VertexList). Eliminates atomic refcount overhead on every pointer copy in the sweep inner loop. - **Eliminated dead return values** — `removeInEdge`/`removeOutEdge` were allocating a hash table of "owner" simplices on every call and returning it to nobody. - **Merged redundant hash tables** — consolidated `simplexVecIndex` and `simplexPoolIndex_` into a single `simplexIndex_`, halving hash lookups in `registerSimplex`/`unregisterSimplex`. - **Open-addressing flat hash map** — replaced `std::unordered_map` (chained hashing) with a custom `FlatHashMap` using identity hash and linear probing for the two hot-path simplex index tables. ~3x better cache locality on lookup-heavy workloads. - **Eliminated `liveIndex_` maps** — stored the live-vector index directly on `Edge` and `Vertex` objects, removing two `unordered_map`s (~5k entries each) and their associated hash lookups on every add/remove. - **Inlined `computeDeltaAction` and `accept`** — moved the Metropolis acceptance test and incremental action computation to the header, eliminating function call overhead on every move attempt (~100k+/sec). - **Stack-allocated move temporaries** — replaced heap-allocated `std::vector` with a fixed-capacity `StackVec` for vertex/simplex lists in flip, iflip, and shift moves, eliminating `malloc`/`free` on every rejected move attempt. Cumulative build-time improvement is **50-67%** at large lattice sizes, with sweep performance improving by up to **1.7x** (7.7 → 4.6 µs/move at N4=19k). ```{image} assets/benchmarks/benchmark_comparison.png :alt: Before vs. after benchmark comparison :width: 100% :align: center ``` To regenerate this comparison after further changes: ```bash # Save a baseline python examples/benchmarks/build_benchmark.py --save /tmp/baseline/ # ... make changes, rebuild ... # Save and compare python examples/benchmarks/build_benchmark.py --save /tmp/new/ python examples/benchmarks/compare_benchmarks.py \ --before /tmp/baseline/benchmark_results.json \ --after /tmp/new/benchmark_results.json \ --save docs/source/assets/benchmarks/ ``` --- ## Running the Benchmark ```bash # Default: 5 sizes x 4 dimensions x 5 repeats (~1 minute) python examples/benchmarks/build_benchmark.py --save docs/source/assets/benchmarks/ # Quick check (fewer repeats) python examples/benchmarks/build_benchmark.py --repeats 2 --save /tmp/bench/ ``` The `--save` directory receives: | File | Contents | |:-----|:---------| | `benchmark_results.json` | Structured log with metadata, per-trial timings, and summary statistics | | `build_benchmarks.png` | Four-panel dashboard | | `build_time.png` | Build time vs. size (standalone) | | `build_throughput.png` | Throughput bar chart (standalone) |