How does Pancake verify backtests?

Pancake runs a deterministic execution engine, batter, against your evidence dataset and emits a receipt containing a SHA-256 result hash, an explicit verification boundary, and a list of unmodeled risks. The engine recomputes all math itself from first principles; it does not trust numbers the agent asserts.

Verification in Pancake is structured as a 3-tuple: what the engine verified, what it accepted as agent-supplied evidence, and what it did not model. This breakdown appears verbatim in every receipt so a reader can judge the result against the right standard.
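A minimal sketch of the three-layer boundary as a Python data structure follows, showing how the three layers stay separate when rendered into a receipt. The class and field names (VerificationBoundary, verified, agent_supplied, unmodeled, render) are hypothetical and are not taken from batter's receipt schema; the unmodeled entry is a placeholder.

```python
from dataclasses import dataclass


# Hypothetical shape of a receipt's verification boundary; the names are
# illustrative and do not reflect batter's actual receipt schema.
@dataclass(frozen=True)
class VerificationBoundary:
    verified: tuple[str, ...]        # checked or recomputed by the engine
    agent_supplied: tuple[str, ...]  # accepted as declared, not re-derived
    unmodeled: tuple[str, ...]       # risks the engine does not model

    def render(self) -> str:
        """Render the three layers as plain text, one section per layer."""
        sections = [
            ("Verified", self.verified),
            ("Agent-supplied evidence", self.agent_supplied),
            ("Not modeled", self.unmodeled),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


boundary = VerificationBoundary(
    verified=("schema shape", "lookahead prevention", "P&L ledger"),
    agent_supplied=("feature column values", "entry price source", "liquidity source"),
    unmodeled=("<placeholder unmodeled risk>",),
)
print(boundary.render())
```

Freezing the dataclass keeps the boundary immutable once it is built, which suits a receipt that should not change after it is issued.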

The verified layer covers two categories. The first is structural invariants: schema shape, lookahead prevention (decision_time < resolution_time), monotonicity, value ranges, and required column presence. Any failure aborts the run, and no partial receipt is issued.

The second is runner math. The engine recomputes the full P&L ledger, applies the declared fees and slippage symmetrically, and produces Sharpe, Sortino, CAGR, Wilson CI, Brier score, bootstrap CI, and permutation-test p-values using publicly documented formulas with citable sources.
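The two categories can be sketched in a few lines of Python: a fail-fast invariant check (required columns, lookahead prevention, value ranges) followed by two of the listed metrics, the Wilson score interval and the Brier score, written from their standard textbook definitions. This is an illustration under assumptions, not batter's code: the column names (predicted_prob, outcome) and the helper names are hypothetical, and only the lookahead rule decision_time < resolution_time comes from the text above.

```python
import math

# Hypothetical column names chosen for illustration; batter's real schema
# may differ.
REQUIRED_COLUMNS = {"decision_time", "resolution_time", "predicted_prob", "outcome"}


class InvariantError(Exception):
    """A structural invariant failed; the run aborts and no receipt is issued."""


def check_invariants(rows):
    """Check required columns, lookahead prevention, and value ranges row by row."""
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise InvariantError(f"row {i}: missing columns {sorted(missing)}")
        # Lookahead prevention: the decision must be made before resolution.
        if not row["decision_time"] < row["resolution_time"]:
            raise InvariantError(f"row {i}: decision_time must precede resolution_time")
        if not 0.0 <= row["predicted_prob"] <= 1.0:
            raise InvariantError(f"row {i}: predicted_prob outside [0, 1]")
        if row["outcome"] not in (0, 1):
            raise InvariantError(f"row {i}: outcome must be 0 or 1")


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


rows = [
    {"decision_time": 1, "resolution_time": 5, "predicted_prob": 0.8, "outcome": 1},
    {"decision_time": 2, "resolution_time": 6, "predicted_prob": 0.3, "outcome": 0},
    {"decision_time": 3, "resolution_time": 7, "predicted_prob": 0.6, "outcome": 0},
]
check_invariants(rows)  # raises InvariantError on the first violation
wins = sum(r["outcome"] for r in rows)
print(wilson_interval(wins, len(rows)))
print(brier_score([r["predicted_prob"] for r in rows], [r["outcome"] for r in rows]))
```

Raising on the first violation mirrors the all-or-nothing rule described above: metrics are only computed once every invariant has passed.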

The agent-supplied evidence layer names what the agent provided that the engine cannot independently re-derive: feature column values, entry price source, and liquidity source. These are accepted as declared and surfaced verbatim in the receipt.

The result_hash is a SHA-256 digest of the canonical execution output. Any reader with the strategy spec and dataset can re-run batter at the same engine_version and verify the hash matches. The batter engine is open-source at github.com/usepancake/batter.
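The re-run-and-compare check can be sketched with Python's standard hashlib and json modules. The canonical form used here (sorted keys, compact separators, UTF-8, NaN rejected) is an assumption for illustration; batter defines its own canonicalization, and the output dict holds placeholder values rather than real engine output.

```python
import hashlib
import json


def canonical_bytes(output: dict) -> bytes:
    """Serialize output deterministically: sorted keys, compact separators, UTF-8.

    Assumed canonical form for illustration only; batter defines its own.
    allow_nan=False rejects NaN/Infinity, which have no standard JSON encoding.
    """
    text = json.dumps(output, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)
    return text.encode("utf-8")


def result_hash(output: dict) -> str:
    """Hex-encoded SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(output)).hexdigest()


def verify(output: dict, published_hash: str) -> bool:
    """Recompute the digest from a re-run and compare it to the published one."""
    return result_hash(output) == published_hash.strip().lower()


# Placeholder values; a real re-run would produce this dict from batter.
original = {"engine_version": "x.y.z", "trades": 42, "sharpe": 1.25}
rerun = {"sharpe": 1.25, "trades": 42, "engine_version": "x.y.z"}  # same data, different key order

published = result_hash(original)
print(verify(rerun, published))  # prints True
```

Sorting keys makes the digest depend only on content, not on dictionary insertion order, so two independent re-runs that produce the same numbers yield the same hash. Changing any value, including engine_version, changes the digest.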
