Method. For each historical PGA event from 2019 to 2025, run the recommender as it would have run that week (Kalman skill state computed only from rounds played before the tournament, so there is no look-ahead), generate matchup bets at the captured pre-tournament prices, and compare each bet's price to the captured Pinnacle close. Positive average CLV across this sample means that, on average, the model took prices the market subsequently moved toward.
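As a rough illustration of the price-versus-close comparison, the sketch below computes CLV for a two-way matchup. The report does not state its exact CLV formula, so this assumes one common convention: remove the margin from the closing line proportionally, then measure the expected return of the price taken against that fair probability. The function names and the sample prices are hypothetical, and all odds are decimal.

```python
from statistics import mean


def no_vig_prob(side_odds: float, other_odds: float) -> float:
    """Fair probability of our side after removing the margin proportionally.

    Assumption: proportional (multiplicative) de-vigging of a two-way market.
    """
    p_side = 1.0 / side_odds
    p_other = 1.0 / other_odds
    return p_side / (p_side + p_other)


def clv(bet_odds: float, close_side_odds: float, close_other_odds: float) -> float:
    """Expected return per unit staked if the de-vigged close were the true probability."""
    return bet_odds * no_vig_prob(close_side_odds, close_other_odds) - 1.0


# Hypothetical bets: (price taken, closing price on our side, closing price on the other side)
bets = [
    (2.10, 2.00, 2.00),  # took a better price than the close
    (1.80, 1.91, 1.91),  # took a worse price than the close
    (1.95, 1.87, 1.95),  # market moved toward our side
]

clvs = [clv(*bet) for bet in bets]
average_clv = mean(clvs)
print([round(c, 4) for c in clvs], round(average_clv, 4))
```

A bet has positive CLV under this convention when the price taken is longer than the fair closing price; averaging over all bets in the sample gives the headline figure.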
- Skill-only. Course-fit overlay is unavailable historically (DG decomposition endpoint serves only the current event). The course-fit experiment we did run separately produced lower CLV than skill-only, so this likely isn't a material understatement.
- Single-book. Backtest uses the captured pre-tournament price for each matchup; live recs shop across 8–13 books. Live edge can exceed backtest because of best-line selection.
- Recommender has evolved. Current backtest uses the most recent recommender code against historical odds + skill states. Earlier model versions would have produced different recs; this is "would today's recommender have done well historically?", not "what did we historically do?".
- Closing line as ground truth. CLV measures each bet's price against the Pinnacle close. Realized P&L was not simulated: that would require settling each bet against actual round-by-round outcomes, which is doable but outside the scope of this report.
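The single-book caveat above can be made concrete with a small sketch of best-line selection. It assumes decimal odds and a simple "take the highest quote" rule; the book names, prices, and the `best_price` helper are hypothetical and are not taken from the recommender.

```python
from __future__ import annotations


def best_price(prices_by_book: dict[str, float]) -> tuple[str, float]:
    """Return the book offering the highest decimal price, and that price."""
    book = max(prices_by_book, key=prices_by_book.__getitem__)
    return book, prices_by_book[book]


# Hypothetical quotes for the same side of one matchup.
quotes = {"book_a": 1.91, "book_b": 1.95, "book_c": 1.87}

# The backtest sees only one captured price; live recs can take the best of all quotes.
backtest_book = "book_a"
book, price = best_price(quotes)
uplift = price / quotes[backtest_book] - 1.0
print(book, price, round(uplift, 4))
```

Because the best available price is never shorter than any single book's price, the price taken live is at least as good as the one the backtest assumes, which is why live edge can exceed the backtest figure.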