What metrics does Pancake report on a backtest receipt?

Every Pancake receipt reports: total return, CAGR, Sharpe ratio, Sortino ratio, maximum drawdown, win rate, Wilson CI on win rate, Brier crowd score, bootstrap CI on Sharpe, and permutation test p-value. Metrics that require at least 10 trades are replaced with an explicit insufficient_data label when the trade count falls below that threshold.

All 10 metrics are derived from the batter execution engine. Each has a citable academic source and a deterministic formula, and the Glossary documents all 10.

Total return (sum of realized P&L) and win rate (fraction of winning trades) are always reported, regardless of trade count. The remaining eight metrics require N ≥ 10 trades. Below that threshold, each is labeled "insufficient_data (N=X < 10)", where X is the actual trade count, rather than left blank or set to zero, so the agent reading the receipt is told explicitly that the sample is too small.
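The gating rule can be sketched as follows. This is a minimal illustration, not Pancake's code: the function `gate_metrics`, the `ALWAYS_REPORTED` set, and the metric key names are hypothetical, and the label string simply follows the format quoted above.

```python
from __future__ import annotations

MIN_TRADES = 10  # threshold stated in the text
# Hypothetical key names for the two metrics that are always reported.
ALWAYS_REPORTED = {"total_return", "win_rate"}


def gate_metrics(metrics: dict[str, float], n_trades: int) -> dict[str, float | str]:
    """Keep always-reported metrics; label the rest when n_trades < MIN_TRADES."""
    label = f"insufficient_data (N={n_trades} < {MIN_TRADES})"
    gated: dict[str, float | str] = {}
    for name, value in metrics.items():
        if name in ALWAYS_REPORTED or n_trades >= MIN_TRADES:
            gated[name] = value
        else:
            # Explicit label instead of a blank or 0.0 that could be misread as a result.
            gated[name] = label
    return gated


print(gate_metrics({"total_return": 120.0, "win_rate": 0.6, "sharpe": 1.4}, n_trades=7))
# {'total_return': 120.0, 'win_rate': 0.6, 'sharpe': 'insufficient_data (N=7 < 10)'}
```

Returning a string label in place of a number forces any downstream consumer to handle the small-sample case explicitly instead of silently averaging a placeholder zero.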

CAGR (Bacon 2008) annualizes the return over the backtest window. The Sharpe ratio (Sharpe 1994) measures excess return per unit of total volatility; it is annualized by a factor of √252 and uses the Bessel-corrected (N − 1) sample standard deviation. The Sortino ratio (Sortino & Price 1994) replaces total volatility with downside deviation, so only returns below the target count as risk. Maximum drawdown is the largest peak-to-trough decline in the equity curve.
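A minimal NumPy sketch of these four formulas follows, written under stated assumptions rather than as Pancake's implementation: inputs are daily return series or an equity curve, 252 trading days per year is used both for the √252 factor and for the CAGR window, the risk-free rate and the Sortino target default to zero, and downside deviation is taken as the root mean square of below-target returns over all periods. All function names are hypothetical.

```python
import math

import numpy as np

TRADING_DAYS = 252  # assumed annualization base, matching the √252 convention


def cagr(start_equity: float, end_equity: float, n_days: int) -> float:
    """Compound annual growth rate over n_days trading days (assumes 252 per year)."""
    years = n_days / TRADING_DAYS
    return (end_equity / start_equity) ** (1.0 / years) - 1.0


def sharpe(daily_returns, rf_daily: float = 0.0) -> float:
    """Annualized Sharpe: mean excess return over Bessel-corrected std (ddof=1)."""
    excess = np.asarray(daily_returns, dtype=float) - rf_daily
    return float(excess.mean() / excess.std(ddof=1) * math.sqrt(TRADING_DAYS))


def sortino(daily_returns, target: float = 0.0) -> float:
    """Annualized Sortino: mean excess over target divided by downside deviation."""
    excess = np.asarray(daily_returns, dtype=float) - target
    downside = np.minimum(excess, 0.0)  # returns above target contribute zero risk
    downside_dev = math.sqrt(np.mean(downside ** 2))
    return float(excess.mean() / downside_dev * math.sqrt(TRADING_DAYS))


def max_drawdown(equity) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    curve = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(curve)
    return float(np.max((peaks - curve) / peaks))
```

With these definitions, an equity curve of 100, 110, 99, 120 has a maximum drawdown of 0.1 (the fall from 110 to 99). The Sortino sketch does not guard against a series with no below-target returns, where downside deviation is zero and the ratio is undefined.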

The Wilson score interval (Wilson 1927) gives the 95% confidence interval on win rate and is more accurate than the normal approximation at small N. The Brier crowd score (Brier 1950) is the mean squared error between probability forecasts and binary outcomes. The bootstrap CI on Sharpe (Efron 1979) uses 10,000 resamples drawn with a PCG64 generator seeded from the result_hash, so the interval is reproducible for a given result. The permutation test (Good 2005) runs 1,000 shuffles to produce a p-value under the null hypothesis that the observed Sharpe ratio arose by chance.
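The four inferential statistics can be sketched in NumPy as below. This is an illustration under assumptions, not Pancake's implementation: all function names are hypothetical, `result_hash` is assumed to be a hex digest string, the bootstrap uses the percentile method, and the permutation test is assumed to shuffle position signals against the underlying market returns, with a zero risk-free rate throughout.

```python
from __future__ import annotations

import math

import numpy as np

TRADING_DAYS = 252
Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(wins: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as win rate."""
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def brier(forecasts, outcomes) -> float:
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((f - o) ** 2))


def sharpe(r: np.ndarray) -> float:
    """Annualized Sharpe with zero risk-free rate and Bessel-corrected std."""
    return float(r.mean() / r.std(ddof=1) * math.sqrt(TRADING_DAYS))


def rng_from_hash(result_hash: str) -> np.random.Generator:
    """Deterministic PCG64 generator seeded from a hex digest (assumed format)."""
    return np.random.Generator(np.random.PCG64(int(result_hash, 16)))


def bootstrap_sharpe_ci(returns, result_hash: str, n_boot: int = 10_000, alpha: float = 0.05):
    """Percentile bootstrap CI on Sharpe, resampling returns with replacement."""
    r = np.asarray(returns, dtype=float)
    rng = rng_from_hash(result_hash)
    idx = rng.integers(0, len(r), size=(n_boot, len(r)))  # one resample per row
    samples = r[idx]
    stats = samples.mean(axis=1) / samples.std(axis=1, ddof=1) * math.sqrt(TRADING_DAYS)
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)


def permutation_p_value(positions, market_returns, result_hash: str, n_perm: int = 1_000) -> float:
    """One-sided p-value: share of shuffled-position Sharpes >= the observed Sharpe."""
    pos = np.asarray(positions, dtype=float)
    mkt = np.asarray(market_returns, dtype=float)
    observed = sharpe(pos * mkt)
    rng = rng_from_hash(result_hash)
    hits = 0
    for _ in range(n_perm):
        # Shuffling positions breaks any link between signal timing and returns.
        if sharpe(rng.permutation(pos) * mkt) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For example, 7 wins in 10 trades gives a Wilson interval of roughly (0.397, 0.892). Shuffling the strategy's own return series would leave its mean and standard deviation, and therefore its Sharpe ratio, unchanged, which is why this sketch permutes positions relative to market returns instead. The +1 in the numerator and denominator keeps the p-value strictly positive, a common convention for Monte Carlo permutation tests.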
