Determinism

What determinism means in batter

The core promise: the same strategy spec plus the same EvidenceDataset always produces the same result_hash, byte-identical across Ubuntu, macOS, and Windows, regardless of which machine runs it or when.

This is not a soft guarantee. Pancake receipts are cited documents: they carry an engine_version and a result_hash so that any reader can independently reproduce the computation by installing the cited release (pip install batter==<version>), running it against the cited dataset, and comparing hashes. If the hash can drift between environments, the receipt cannot be cited. Determinism is therefore a hard correctness requirement, not a nice-to-have.

Determinism in batter rests on three pillars:

  1. Canonical JSON serialization: the strategy spec is serialized with keys sorted lexicographically at every nesting depth before hashing. Without this canonical ordering, the same spec written with its keys in a different order would produce a different spec_hash.
  2. PCG64 seeded RNG: all stochastic operations (bootstrap, permutation test) use NumPy's PCG64 generator with a fixed seed derived from the spec_hash. PCG64 integer draws are byte-stable across operating systems and CPU architectures on Python 3.12+.
  3. Python 3.12+ float stack: the engine requires Python ≥ 3.12 because the float accumulation behavior of sum() changed in 3.12 in a way that batter cannot patch around on Python 3.11. See the investigation below.
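
The first two pillars can be sketched with the standard library alone. This is a minimal illustration, not batter's actual implementation: the function names canonical_json, spec_hash, and seed_from_spec_hash are hypothetical, and so are the choice of SHA-256 for spec_hash, the compact separators, and the rule of taking the first 64 bits of the digest as the seed.

```python
import hashlib
import json


def canonical_json(spec: dict) -> str:
    # sort_keys=True sorts keys at every nesting depth; compact separators
    # keep whitespace differences out of the serialized form.
    return json.dumps(spec, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def spec_hash(spec: dict) -> str:
    # Assumption: SHA-256 over the UTF-8 bytes of the canonical form.
    return hashlib.sha256(canonical_json(spec).encode("utf-8")).hexdigest()


def seed_from_spec_hash(digest_hex: str) -> int:
    # Assumption: the first 16 hex characters (64 bits) become the RNG seed.
    return int(digest_hex[:16], 16)


# Two spellings of the same (made-up) spec, differing only in key order.
spec_a = {"window": 20, "signal": {"type": "sma", "fast": 5}}
spec_b = {"signal": {"fast": 5, "type": "sma"}, "window": 20}

print(canonical_json(spec_a))
print(spec_hash(spec_a) == spec_hash(spec_b))  # True
```

An integer seed of this kind could then be passed to numpy.random.PCG64 and wrapped in numpy.random.Generator, so that every stochastic step is a pure function of the spec.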

The Engine page documents the full batter API and the 12 verified formulas. This page focuses solely on the Python 3.11 exclusion and its root cause.

The 3.11 investigation

During batter 0.4.0 testing, the CI matrix produced different result_hash values on Python 3.11 vs 3.12, with identical NumPy 2.4.6 and identical PCG64 seeds. All four reference fixtures — toy, jakarta_temperature, rapture_family, btc_pred_hedge — produced mismatched hashes.

Root cause: CPython gh-100946

The divergence traces to a single upstream change: CPython issue gh-100946, merged in November 2022 for Python 3.12. That change replaced the C-level implementation of sum() for homogeneous float lists with a compensated accumulation algorithm (Neumaier summation, a variant of Kahan compensated summation described in Higham 2002 §4.3). Python 3.11 and earlier use plain sequential IEEE 754 addition with no compensation.

The difference is visible in one line:

sum([0.01] * 20)
  Python 3.11: 0.20000000000000004  # 0x1.999999999999bp-3
  Python 3.12: 0.2                  # 0x1.999999999999ap-3

The two results differ by one ULP, visible in the last mantissa digit. The 3.12 result is the correctly rounded value; the 3.11 result carries the rounding error accumulated by sequential addition.
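
To make the algorithmic difference concrete, here is a pure-Python sketch of the two accumulation strategies: plain sequential addition and Neumaier's compensated variant. The function names are hypothetical, and the code illustrates the textbook algorithm only; it is not CPython's C implementation and not something batter uses as a replacement for sum().

```python
def naive_sum(values):
    # Plain left-to-right IEEE 754 addition with no compensation.
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    # Neumaier's variant of Kahan summation: the rounding error of each
    # addition is collected in a separate compensation term.
    total = 0.0
    comp = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x   # low-order bits of x were lost
        else:
            comp += (x - t) + total   # low-order bits of total were lost
        total = t
    return total + comp


tricky = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(tricky))      # 0.0
print(neumaier_sum(tricky))   # 2.0

# The input from the one-line example above, printed in hex for comparison.
print(naive_sum([0.01] * 20).hex())
print(neumaier_sum([0.01] * 20).hex())
```

Printing with float.hex() rather than repr() makes a one-ULP difference easy to spot, because only the last hex digit of the mantissa changes.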

Propagation through bootstrap CI

That 1-ULP difference in the mean or variance of a return series propagates into the bootstrap confidence interval computation. The toy fixture Sharpe CI illustrates the effect:

toy fixture — sharpe_ci low
  Python 3.11: -4.328152052015572   # -0x1.15007176e16b4p+2
  Python 3.12: -4.3281520520155725  # -0x1.15007176e16b5p+2
  1 ULP difference

Because result_hash is a SHA-256 digest of the full result envelope — including all CI bounds — a 1-ULP shift anywhere in the output changes the hash entirely:

toy fixture — result_hash
  Python 3.11: 47e9266c...
  Python 3.12: dcc56c4d...
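
The sensitivity of the digest can be reproduced in a few lines. The sketch below takes the two sharpe_ci low values quoted above, checks that they are adjacent doubles, and hashes a toy envelope containing each. The envelope layout and the envelope_hash helper are assumptions made for illustration; the real result envelope has more fields, so these digests will not match the ones listed above.

```python
import hashlib
import json
import math

low_311 = float.fromhex("-0x1.15007176e16b4p+2")
low_312 = float.fromhex("-0x1.15007176e16b5p+2")

# Adjacent doubles: the 3.12 value is exactly one step toward -inf.
print(math.nextafter(low_311, -math.inf) == low_312)  # True


def envelope_hash(envelope: dict) -> str:
    # Hypothetical envelope hashing: canonical JSON, then SHA-256.
    payload = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


hash_311 = envelope_hash({"sharpe_ci": {"low": low_311}})
hash_312 = envelope_hash({"sharpe_ci": {"low": low_312}})

print(hash_311 == hash_312)  # False
```

Because json.dumps writes floats with their shortest round-tripping representation, two distinct doubles always serialize to different text, and the SHA-256 digests share no visible relationship.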

Why it cannot be patched

The investigation bisected the problem systematically. Each candidate root cause was eliminated:

  • NumPy version: both runs used NumPy 2.4.6, so a NumPy difference was ruled out.
  • math.fma: zero uses in the engine codebase.
  • PCG64 state: byte-identical integer sequences across versions confirmed the RNG was not the source.

math.fsum was tested as a potential portable replacement, but it produces a third distinct answer (mantissa suffix e16b6), different from both 3.11 (e16b4) and 3.12 (e16b5). No sum() implementation that was tested produces the Python 3.12 result on Python 3.11.

The root cause is the C-level float accumulation inside CPython's built-in sum(), which differs between the two versions. It cannot be fixed from Python userspace. The only way to get byte-identical output is to run on Python 3.12 or later.

What we ship

Scope statement

Python 3.11 is permanently out of scope for batter 0.4.x. There is no code change that makes Python 3.11 produce byte-identical result_hash values to Python 3.12+. The difference is inherent to the Python version's C-level float accumulation. The official requirement:

batter 0.4.x requires Python >= 3.12.
Python 3.11 produces different result_hash values due to a sum()
precision change in CPython 3.12 (gh-100946); byte-identical
determinism is only guaranteed on Python 3.12+.
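
A requirement like this can also be enforced at runtime so that an unsupported interpreter fails loudly instead of silently producing a different hash. The guard below is a hypothetical sketch: the function name, the error message, and the idea of calling it at import time are assumptions, not a description of how batter enforces the floor.

```python
import sys

MIN_PYTHON = (3, 12)


def check_python_version(version_info=None) -> None:
    # Accept an explicit version tuple so the guard can be unit tested.
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    if major_minor < MIN_PYTHON:
        raise RuntimeError(
            "Python %d.%d is not supported: byte-identical result_hash "
            "values are only guaranteed on Python 3.12+."
            % major_minor
        )


# Example: a 3.11 interpreter is rejected, 3.12 and later pass.
for candidate in [(3, 11, 9), (3, 12, 0), (3, 13, 1)]:
    try:
        check_python_version(candidate)
        print(candidate, "ok")
    except RuntimeError as exc:
        print(candidate, "rejected:", exc)
```

Packaging metadata can express the same floor at install time, for example through the requires-python field in pyproject.toml.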

CI matrix

The batter package CI runs on Ubuntu, macOS, and Windows against Python 3.12 and Python 3.13. All six cells of the matrix (3 OS × 2 Python versions) produce byte-identical result_hash values across all four reference fixtures. Python 3.11 is explicitly excluded from the matrix.
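
The cross-cell comparison amounts to checking that each fixture yields exactly one distinct hash across the matrix. The sketch below shows one way such a check could look; the find_mismatches helper, the dictionary layout, and the placeholder hash strings are all hypothetical and do not come from the batter CI.

```python
from itertools import product

OSES = ("ubuntu", "macos", "windows")
PYTHONS = ("3.12", "3.13")
FIXTURES = ("toy", "jakarta_temperature", "rapture_family", "btc_pred_hedge")


def find_mismatches(hashes: dict) -> list:
    # hashes maps (os, python, fixture) -> result_hash hex string.
    mismatched = []
    for fixture in FIXTURES:
        seen = {hashes[(os_name, py, fixture)] for os_name, py in product(OSES, PYTHONS)}
        if len(seen) != 1:
            mismatched.append(fixture)
    return mismatched


# Placeholder data: every cell reports the same made-up hash per fixture.
collected = {
    (os_name, py, fixture): "placeholder-" + fixture
    for os_name, py, fixture in product(OSES, PYTHONS, FIXTURES)
}
print(find_mismatches(collected))  # []

# Simulate one cell drifting on one fixture.
collected[("windows", "3.13", "toy")] = "placeholder-drifted"
print(find_mismatches(collected))  # ['toy']
```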

The Pancake production CI (.github/workflows/ci.yml) handles the Next.js frontend; batter's own Python CI matrix lives in the usepancake/batter repository.

Publish policy

batter releases are published to PyPI via OIDC-signed trusted publishing through the batter package release pipeline. Signed provenance is verifiable via the PyPI provenance attestation for each release. The Apache-2.0 license and source are available at usepancake/batter.

References

  • CPython gh-100946 — Python 3.12 float accumulation change (merged Nov 2022).
  • Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms, 2nd ed. SIAM. §4.3 (compensated summation).
  • Neumaier, A. (1974). “Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Gleitkommazahlen.” ZAMM, 54(1), 39–51.

See also