Methodology — White Llama

Methodology · how this model works and how it's measured

What this is

A decision-support system for PGA Tour betting. It produces probability estimates for matchup and outright markets, compares them to bookmaker prices, and flags bets where the estimated edge clears a configurable threshold. It is not an automated bettor — every recommendation is reviewed manually before any wager is placed.

The model

Three components, run in sequence each week:

Skill model. A Kalman filter over each player's per-round Strokes Gained values, one filter per SG category (Off-the-Tee, Approach, Around-the-Green, Putting). State is a scalar — the player's current true skill in that category. Process noise (Q) and observation noise (R) are tuned per-category against held-out data, not assumed. New players start at the tour mean with high prior variance.
Simulator. Vectorized Monte Carlo over 100,000 tournament iterations. Each iteration draws skill estimates from the posterior, simulates four rounds of SG values per player, applies the cut, and ranks by total. Aggregating across iterations gives win, top-5, top-10, and top-20 probabilities for every player.
Edge detection. For each market and price, compare the model's probability to the bookmaker's no-vig probability (computed per market, not globally). Bets above the EV threshold are sized via fractional Kelly, capped at a fixed percentage of bankroll regardless of what Kelly recommends.
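The skill model's per-category update can be illustrated with a scalar Kalman filter. This is a minimal sketch, not the production code. It assumes a random-walk state model, where true skill drifts by process noise Q between rounds, and it treats each round's SG value as a noisy observation of that skill with observation noise R. The names `SkillState`, `new_player`, `kalman_step`, and `filter_rounds`, along with every numeric default, are hypothetical placeholders and not the tuned values.

```python
from dataclasses import dataclass


@dataclass
class SkillState:
    """Posterior belief about one player's true skill in one SG category."""
    mean: float
    var: float


def new_player(tour_mean: float = 0.0, prior_var: float = 1.0) -> SkillState:
    # New players start at the tour mean with a deliberately wide prior (values hypothetical).
    return SkillState(mean=tour_mean, var=prior_var)


def kalman_step(state: SkillState, sg_round: float, q: float, r: float) -> SkillState:
    """One predict/update cycle of a scalar random-walk Kalman filter.

    q: process noise Q, how far true skill may drift between rounds.
    r: observation noise R, the scatter of one round's SG around true skill.
    """
    # Predict: under a random walk the mean carries over and uncertainty grows by q.
    prior_mean = state.mean
    prior_var = state.var + q
    # Update: the gain weights the new round by how uncertain the prior is relative to r.
    gain = prior_var / (prior_var + r)
    post_mean = prior_mean + gain * (sg_round - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return SkillState(mean=post_mean, var=post_var)


def filter_rounds(rounds, q, r, tour_mean=0.0, prior_var=1.0) -> SkillState:
    # Run one category's filter over a player's rounds in chronological order.
    state = new_player(tour_mean, prior_var)
    for sg in rounds:
        state = kalman_step(state, sg, q, r)
    return state
```

When q = 0, the filter reduces to a precision-weighted average of the prior and every round seen so far. A positive q keeps the variance from collapsing, which lets recent rounds keep their weight. Tuning Q and R per category against held-out data sets exactly this trade-off.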
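The simulator step can be sketched as a batched NumPy Monte Carlo. NumPy is an assumption, since the page does not name a library. The sketch takes each player's posterior mean and variance of total SG per round as input. It assumes those come from summing the four category posteriors, that round-to-round noise is normal with a hypothetical `round_sd`, and that the cut keeps the top N and ties after round 2, with N also hypothetical. `simulate_tournament` and its parameters are illustrative names.

```python
import numpy as np


def simulate_tournament(skill_mean, skill_var, round_sd=2.7, n_iter=100_000,
                        cut_size=65, top_ks=(1, 5, 10, 20), batch=10_000, seed=0):
    """Estimate finish probabilities with a batched, vectorized Monte Carlo.

    skill_mean, skill_var: per-player posterior mean and variance of total SG per
        round (assumed here: category means summed, category variances summed).
    round_sd: hypothetical round-to-round noise around true skill, in strokes.
    cut_size: hypothetical top-N-and-ties cut applied after round 2.
    Returns {k: array of P(finish in top k)} for each k in top_ks (k=1 is a win).
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(skill_mean, dtype=float)
    sd = np.sqrt(np.asarray(skill_var, dtype=float))
    n_players = mean.size
    k_cut = min(cut_size, n_players)
    counts = {k: np.zeros(n_players) for k in top_ks}

    done = 0
    while done < n_iter:
        b = min(batch, n_iter - done)  # batching bounds memory at b x n_players
        # One posterior draw of true skill per player per iteration.
        skill = mean + sd * rng.standard_normal((b, n_players))

        # Rounds 1-2: SG per round = true skill + round noise (higher is better).
        two_round = np.zeros((b, n_players))
        for _ in range(2):
            two_round += skill + round_sd * rng.standard_normal((b, n_players))

        # Cut: keep everyone at or above the k_cut-th best 36-hole total.
        cut_line = -np.sort(-two_round, axis=1)[:, k_cut - 1:k_cut]
        made_cut = two_round >= cut_line

        # Rounds 3-4 are simulated for everyone, then missed-cut players are masked out.
        total = two_round.copy()
        for _ in range(2):
            total += skill + round_sd * rng.standard_normal((b, n_players))
        total = np.where(made_cut, total, -np.inf)

        # Position 0 is the winner; continuous draws make exact ties vanishingly rare.
        position = np.argsort(np.argsort(-total, axis=1), axis=1)
        for k in top_ks:
            counts[k] += (position < k).sum(axis=0)
        done += b

    return {k: c / n_iter for k, c in counts.items()}
```

Each batch of 10,000 iterations across a 156-player field uses roughly 12.5 MB per array, so the full 100,000 iterations stay cheap in memory. Real scores are integer strokes, and ties at the top are settled by playoff. This sketch sidesteps both by ranking continuous SG totals.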

Pinnacle is treated as the sharpest reference rather than as a bettable market. A model that disagrees materially with Pinnacle on a liquid line is more likely wrong than right, so such bets are filtered out.
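The edge-detection step and the Pinnacle filter can be sketched in a few small functions. This is a minimal sketch under stated assumptions. De-vigging uses the proportional method applied within one market at a time. That is only one of several methods; Shin's method and the power method are common alternatives. EV is computed against the decimal price actually offered. The EV threshold, Kelly multiplier, bankroll cap, and Pinnacle gap (`max_gap`) are hypothetical placeholders, as are the function names.

```python
def no_vig_probs(decimal_odds, total=1.0):
    """Proportional de-vig within ONE market's prices.

    Implied probabilities (1 / decimal odds) are rescaled to sum to `total`:
    1.0 for a winner or head-to-head market, k for a top-k market.
    """
    implied = [1.0 / d for d in decimal_odds]
    overround = sum(implied) / total
    return [p / overround for p in implied]


def expected_value(p_model, decimal_odds):
    # Expected profit per unit staked at these odds.
    return p_model * decimal_odds - 1.0


def stake_fraction(p_model, decimal_odds, ev_threshold=0.03, kelly_mult=0.25,
                   max_fraction=0.02, p_pinnacle=None, max_gap=0.05):
    """Bankroll fraction to stake, or 0.0 if the bet is filtered out.

    p_pinnacle: Pinnacle's no-vig probability, passed only for liquid lines.
    All numeric defaults are hypothetical placeholders.
    """
    ev = expected_value(p_model, decimal_odds)
    if ev <= ev_threshold:  # only bets strictly above the EV threshold pass
        return 0.0
    # Sharp-reference filter: a large disagreement with Pinnacle is treated as model error.
    if p_pinnacle is not None and abs(p_model - p_pinnacle) > max_gap:
        return 0.0
    full_kelly = ev / (decimal_odds - 1.0)  # Kelly fraction for decimal odds
    # Fractional Kelly, then a hard cap regardless of what Kelly recommends.
    return min(kelly_mult * full_kelly, max_fraction)
```

A head-to-head priced at 1.91 / 1.91 de-vigs to 0.5 per side. A model probability of 0.55 at 2.10 gives a quarter-Kelly stake of about 3.5% of bankroll, so the hypothetical 2% cap applies.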

What we measure

Closing Line Value (CLV), the primary metric. For every recommended bet, CLV compares the price taken with the closing price at Pinnacle. Positive CLV over a sample of 50+ bets is the cleanest evidence of real edge, because the closing line is the market's most informed estimate. The live track record is on the main dashboard; its 95% CIs reflect the current sample size, so they stay wide while the sample is small.
Calibration. A model that says 60% should hit 60% of the time, not 50% and not 70%. The calibration page bins predicted probabilities and plots empirical hit rates against them, with Wilson 95% CIs per bin and a Brier / ECE summary. Phase gate: empirical hit rates within ±3 percentage points of the predicted probability in every bin.
Realized P&L on settled paper portfolios — a validity check on CLV, not a substitute for it. P&L is noisier than CLV and takes longer to converge.
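The calibration summary can be sketched in plain Python. This is a sketch, not the calibration page's code. It assumes ten equal-width bins and compares each bin's hit rate to that bin's mean predicted probability for the ±3-point gate. `wilson_ci` and `calibration_report` are hypothetical names.

```python
import math


def wilson_ci(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - half, centre + half)


def calibration_report(probs, outcomes, n_bins=10, tolerance=0.03):
    """Per-bin hit rates with Wilson CIs, plus Brier score, ECE, and the phase gate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins; p = 1.0 falls into the top bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))

    n_total = len(probs)
    rows, ece = [], 0.0
    for contents in bins:
        if not contents:
            continue
        n = len(contents)
        mean_pred = sum(p for p, _ in contents) / n
        hits = sum(y for _, y in contents)
        hit_rate = hits / n
        # ECE: bin-size-weighted gap between predicted and observed frequency.
        ece += n / n_total * abs(hit_rate - mean_pred)
        rows.append({"n": n, "mean_pred": mean_pred, "hit_rate": hit_rate,
                     "wilson_95": wilson_ci(hits, n)})

    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n_total
    passes_gate = all(abs(r["hit_rate"] - r["mean_pred"]) <= tolerance for r in rows)
    return {"bins": rows, "brier": brier, "ece": ece, "passes_gate": passes_gate}
```

Wilson intervals are used instead of the normal approximation because they behave sensibly for small bins and for hit rates near 0 or 1. For example, 6 hits in 10 gives roughly (0.31, 0.83).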

We deliberately do not optimize for accuracy, F1, or AUC. Those are classifier metrics and say nothing about whether the probabilities themselves are right; this is a probabilistic forecaster.

What's not in v1

Player props (birdies, made cuts, three-balls, etc.)
In-play / live-betting markets
Season-long futures
DFS optimization
Bet placement automation — placement is and will remain manual

Honest limits

Sample size. CLV evidence is built one tournament at a time. The early sample favors the model, but confidence intervals will be wide until the track record passes ~50 settled bets per market type.
Course-fit. The system supports a course-fit overlay (per-course XGBoost on player tournament residuals), but in our most recent test it produced lower realized CLV than the skill-only model. The overlay is currently being re-evaluated; live recommendations should be interpreted as primarily skill-driven.
Historical backfill. DataGolf's course-decomposition endpoint only serves the current event, so any backtest must be skill-only by construction. We treat this as acceptable given the result above, but it does cap how far back the formal track record can be reconstructed.
Look-ahead. All training, backtesting, and simulation use only data that would have been available at the decision point. Player skill states are timestamped; queries enforce as_of_date ≤ decision_date. Closing lines are validation targets, never inputs.
Model versioning. Every saved recommendation and simulation result is tagged with the model version that produced it. Behavior changes bump the version — old results are retained for comparison rather than silently re-interpreted.
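The look-ahead rule (as_of_date ≤ decision_date) can be illustrated with a point-in-time query. This is a sketch that uses an in-memory SQLite database for illustration only; the production store is not described on this page. The table `skill_states`, its other columns, the category codes, and `skill_as_of` are hypothetical. Dates are stored as ISO-8601 strings, so string comparison matches date order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE skill_states (
    player_id  TEXT NOT NULL,
    category   TEXT NOT NULL,   -- hypothetical codes, e.g. 'OTT', 'APP', 'ARG', 'PUTT'
    as_of_date TEXT NOT NULL,   -- ISO-8601 date, so string order = date order
    mean       REAL NOT NULL,
    var        REAL NOT NULL
)
"""

# Latest skill state per player and category that was known on the decision date.
POINT_IN_TIME = """
SELECT s.player_id, s.category, s.as_of_date, s.mean, s.var
FROM skill_states AS s
WHERE s.as_of_date = (
    SELECT MAX(t.as_of_date)
    FROM skill_states AS t
    WHERE t.player_id = s.player_id
      AND t.category = s.category
      AND t.as_of_date <= :decision_date
)
ORDER BY s.player_id, s.category
"""


def skill_as_of(conn, decision_date):
    # Rows dated after decision_date are never visible to the caller.
    return conn.execute(POINT_IN_TIME, {"decision_date": decision_date}).fetchall()
```

Take states dated 2024-03-01, 2024-03-08, and 2024-03-15. A decision dated 2024-03-10 sees only the 2024-03-08 row. A decision dated before the first row sees nothing, so it falls back to the new-player prior.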

What you can verify here

Track-record CLV with bootstrap confidence intervals
Reliability diagram with per-bin empirical hit rates
Per-portfolio history — every saved snapshot, the bets it contained, and (once settled) realized outcomes
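Track-record CLV with a bootstrap confidence interval can be sketched with the standard library. CLV is defined here as the decimal price taken divided by Pinnacle's closing decimal price, minus one. That definition and the percentile bootstrap are both assumptions; comparing against a de-vigged closing price is a common variant. `clv` and `bootstrap_mean_ci` are hypothetical names.

```python
import random
import statistics


def clv(taken_odds, closing_odds):
    """CLV of one bet as relative price improvement over the close (decimal odds)."""
    return taken_odds / closing_odds - 1.0


def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Sample mean and percentile-bootstrap (1 - alpha) CI for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample bets with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), (lo, hi)
```

Taking 2.20 on a line that closes at 2.00 gives a CLV of +10%. Over a track record, a CI that lies entirely above zero is the signal the CLV section describes.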

Model version: 0.1.0. This page describes methodology only — nothing here is investment, betting, or tax advice.