Deutsche Bank’s quant team catalogued the deadly sins of backtesting. Minerva exists so they can’t happen to you.
An optimizer just found a strategy with an in-sample Sharpe of 3.4.
Watch what usually happens next.
The gap is overfitting. Most backtesters never mention it. Follow the spine — six causes, one honest machine.
Your other backtests are lying to you.
Six lies down the spine. The seal at the bottom.
Most platforms test on today’s index — every bankruptcy quietly erased. That’s survivorship bias, and it can flip a conclusion outright. Minerva trades the market as it was: point-in-time membership, dead companies included.
80% of the “edge” was time travel — look-ahead bias. In Minerva, signals can only fill at the next bar’s open, and a model can’t even see the bar it’s standing on.
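The fill rule above can be sketched in a few lines. This is a minimal illustration, not Minerva's engine: `Bar`, `simulate_next_open_fills`, and `close_rose` are hypothetical names, and the toy signal is an assumption chosen only to show which bars the model is allowed to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    open: float
    close: float


def simulate_next_open_fills(bars, signal_fn):
    """Return (fill_index, fill_price) for every buy signal.

    At bar i the model sees only bars strictly before i (never bar i itself),
    and a signal raised at bar i fills at the open of bar i + 1.
    """
    fills = []
    for i in range(len(bars) - 1):  # the last bar has no next open to fill at
        history = bars[:i]          # excludes the bar the model is standing on
        if signal_fn(history):
            fills.append((i + 1, bars[i + 1].open))
    return fills


def close_rose(history):
    """Hypothetical toy signal: the last fully visible close beat the one before."""
    return len(history) >= 2 and history[-1].close > history[-2].close


bars = [Bar(10, 11), Bar(11, 12), Bar(12, 11.5), Bar(11.5, 13), Bar(13, 14)]
fills = simulate_next_open_fills(bars, close_rose)
```

Here the only signal fires while standing on bar 2 (using bars 0 and 1), so the single fill is at bar 3's open, `(3, 11.5)`. A backtester that filled at bar 2's own close would be trading on information it could not have had.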
The same discipline covers corporate actions: splits and dividends come from a verified, versioned reference. If a symbol’s record is missing, the run fails loudly rather than simulating on adjusted prices that secretly contain the future. Adjusted and raw series live under separate identities, so they can never mix.
The same strategy swings from Sharpe −1.8 to +2.2 depending on which two years you show — both with great stories. Minerva ignores the story and grades consistency: every period testifies, including the worst one.
The best of 4,000 coin-flippers looks like a genius — ≈4σ of pure luck. So Minerva treats optimization as a statistics problem, not a leaderboard.
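The coin-flipper claim is easy to check by simulation. The sketch below is an illustration under stated assumptions: each flipper makes 252 fair flips (a hypothetical choice, roughly one per trading day in a year), and `best_coin_flipper_z` is a made-up helper name.

```python
import random
from math import sqrt


def best_coin_flipper_z(n_flippers, n_flips, seed=0):
    """Return the z-score of the luckiest of n_flippers zero-skill traders."""
    rng = random.Random(seed)
    best_heads = 0
    for _ in range(n_flippers):
        # Each bit of a random n_flips-bit integer is one fair coin flip.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        best_heads = max(best_heads, heads)
    # For a fair coin, heads has mean n/2 and standard deviation sqrt(n)/2.
    return (best_heads - n_flips / 2) / (sqrt(n_flips) / 2)


best_z = best_coin_flipper_z(n_flippers=4000, n_flips=252)
```

With 4,000 flippers the winner typically lands several standard deviations above average, even though nobody in the field has any skill. The exact figure varies with the seed.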
In-sample champions are what tournaments produce, even among coin-flippers. Minerva’s answer is Combinatorial Purged Cross-Validation: every candidate is graded only on data it never saw — every train/test combination of history, with quarantine gaps at each cut — then the score is deflated by the size of the search. One survives the gates, or none does.
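As a rough sketch of the splitting step only: history is cut into contiguous groups, every combination of groups takes a turn as the test set, and training points near each cut are quarantined. The function name `cpcv_splits` and the fixed symmetric `purge` width are assumptions for illustration. Real purging usually depends on how far each label looks ahead, and the source does not say how Minerva sizes its gaps.

```python
from itertools import combinations


def cpcv_splits(n_obs, n_groups, n_test_groups, purge):
    """Yield (train_idx, test_idx) for every combination of test groups.

    Training indices within `purge` observations of any test group are dropped,
    so no training point sits right next to the data it will be graded on.
    """
    bounds = [(g * n_obs // n_groups, (g + 1) * n_obs // n_groups)
              for g in range(n_groups)]
    for combo in combinations(range(n_groups), n_test_groups):
        test = [i for g in combo for i in range(*bounds[g])]
        blocked = set()
        for g in combo:
            start, end = bounds[g]
            # Block the test group itself plus a quarantine gap on both sides.
            blocked.update(range(max(0, start - purge), min(n_obs, end + purge)))
        train = [i for i in range(n_obs) if i not in blocked]
        yield train, test


splits = list(cpcv_splits(n_obs=120, n_groups=6, n_test_groups=2, purge=5))
```

Six groups with two held out at a time gives 15 train/test combinations. Each candidate would then be fitted on `train` and scored only on `test`, before any deflation for the size of the search.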
Per year, at 30 bps per trade. Minerva bills every fill like a broker — spread, market impact, commissions — by default.
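The arithmetic behind a per-trade cost is simple enough to sketch. Only the 30 bps figure comes from the text above. The trade count and gross return below are hypothetical inputs, and `net_annual_return` is an illustrative helper, not a Minerva function.

```python
def annual_cost_drag(trades_per_year, cost_bps_per_trade):
    """Fraction of capital lost to trading costs per year (1 bp = 0.01%)."""
    return trades_per_year * cost_bps_per_trade / 10_000


def net_annual_return(gross_return, trades_per_year, cost_bps_per_trade):
    """Gross annual return minus the yearly cost drag."""
    return gross_return - annual_cost_drag(trades_per_year, cost_bps_per_trade)


# Hypothetical strategy: 100 trades a year, 40% gross, 30 bps per trade.
drag = annual_cost_drag(trades_per_year=100, cost_bps_per_trade=30)
net = net_annual_return(0.40, trades_per_year=100, cost_bps_per_trade=30)
```

At 100 trades a year the drag is about 30 percentage points, which turns a 40% gross return into roughly 10% net. High-turnover strategies are the ones a zero-cost backtest flatters most.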
One misprinted EPS — on a $20 stock — once spiked an entire index. Minerva rejects impossible data at the gate and never peeks forward to normalize.
Same public strategy, same settings, five years of AMD: agreement to within $15.53 on $250,000, with every entry date identical. Our default is then deliberately stricter: models never see the current bar. Slightly humbler numbers. Numbers you can trust.
| measure | QC | Minerva | Δ |
|---|---|---|---|
| trades | 33 | 33 | exact |
| entries | — | 33/33 | exact |
| final equity | $250,427.60 | $250,412.07 | −$15.53 |
| fees | $395.89 | $395.37 | −$0.52 |
Because honesty is computationally brutal: thousands of full simulations per verdict. Minerva, built by physics and math PhDs from Berkeley and the Sorbonne, compiles the entire simulation to machine code and fans it across hundreds of cloud cores. What lived inside quant funds is now a button.
Every optimization ends in a verdict you can read — which gates passed, which didn’t, and by how much. Here’s what each gate actually asks. The seal sits atop a full P&L report; here’s how to read it.
Your Sharpe, after subtracting what the luckiest of your N attempts would have scored by pure chance. If the edge doesn’t clearly beat luck, it isn’t an edge.
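One common way to put a number on "what the luckiest of N attempts would have scored" is the expected-maximum approximation used in the deflated Sharpe ratio literature (Bailey and López de Prado). The sketch below assumes that formulation. Whether Minerva uses exactly this is not stated in the source, and `sharpe_std` (the spread of Sharpe ratios across trials) is a hypothetical input.

```python
from math import e
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sharpe_std):
    """Approximate best Sharpe expected among n_trials zero-skill strategies."""
    z = NormalDist().inv_cdf
    return sharpe_std * ((1 - EULER_GAMMA) * z(1 - 1 / n_trials)
                         + EULER_GAMMA * z(1 - 1 / (n_trials * e)))


def luck_adjusted_sharpe(observed, n_trials, sharpe_std):
    """Observed Sharpe minus what pure luck would be expected to deliver."""
    return observed - expected_max_sharpe(n_trials, sharpe_std)


# Hypothetical search: 4,000 trials whose Sharpe ratios have a spread of 0.5.
luck = expected_max_sharpe(n_trials=4000, sharpe_std=0.5)
adjusted = luck_adjusted_sharpe(3.4, n_trials=4000, sharpe_std=0.5)
```

Under these assumed inputs, luck alone accounts for a Sharpe of roughly 1.8, so more than half of the 3.4 headline disappears before any real edge is credited. The full deflated Sharpe ratio goes further and converts the gap into a probability that also accounts for track-record length and non-normal returns.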
Across every way of splitting history, how often does your in-sample champion flop out of sample? If it happens more often than a coin flip, the win was memorized, not learned.
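The question can be turned into a small counting exercise. This is a simplified sketch in the spirit of combinatorially symmetric cross-validation: it counts a "flop" whenever the in-sample champion scores below the out-of-sample median, where the published method uses a logit of the relative rank. The function name and the toy score tables are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean, median


def probability_of_overfitting(block_scores):
    """block_scores[s][b] is the score of strategy s on history block b.

    For every way of choosing half the blocks as in-sample, pick the
    in-sample champion and check whether it lands below the median
    out of sample. Returns the fraction of splits where it does.
    """
    n_blocks = len(block_scores[0])
    flops = total = 0
    for in_sample in combinations(range(n_blocks), n_blocks // 2):
        out_sample = [b for b in range(n_blocks) if b not in in_sample]
        is_scores = [mean(row[b] for b in in_sample) for row in block_scores]
        oos_scores = [mean(row[b] for b in out_sample) for row in block_scores]
        champion = max(range(len(block_scores)), key=is_scores.__getitem__)
        flops += oos_scores[champion] < median(oos_scores)
        total += 1
    return flops / total


# Toy data: each strategy "wins" one block and loses the rest (memorized).
memorized = [[3, -1, -1, -1], [-1, 3, -1, -1], [-1, -1, 3, -1], [-1, -1, -1, 3]]
# Toy data: one strategy is consistently better on every block (learned).
learned = [[1, 1, 1, 1], [0, 0, 0, 0], [-1, -1, -1, -1]]

pbo_memorized = probability_of_overfitting(memorized)
pbo_learned = probability_of_overfitting(learned)
```

The memorized table scores 1.0 (the champion flops on every split) and the learned table scores 0.0. Real searches fall in between, and anything above 0.5 is the "worse than a coin flip" case described above.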
Could a family of zero-skill strategies plausibly have produced your best result? This p-value answers it — accounting for the entire search, not just the winner.
The more you searched, the more history you owe. A short track record with a big search is an automatic fail — the math says so before the market does.
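A rough version of that math is the minimum-backtest-length bound from Bailey et al., which says the history needed grows with the logarithm of the number of trials. The sketch below uses the simple upper bound 2 ln(N) / SR² in years, with SR annualized. Treat it as an assumption about the kind of rule involved, not as Minerva's actual gate.

```python
from math import log


def min_backtest_years(n_trials, target_sharpe):
    """Rough bound on the years of history needed so that the best of
    n_trials zero-skill strategies is not expected to reach target_sharpe.
    """
    if n_trials < 2 or target_sharpe <= 0:
        raise ValueError("need n_trials >= 2 and target_sharpe > 0")
    return 2 * log(n_trials) / target_sharpe ** 2


years_small_search = min_backtest_years(n_trials=10, target_sharpe=1.0)
years_big_search = min_backtest_years(n_trials=4000, target_sharpe=1.0)
```

For a target annualized Sharpe of 1.0, ten trials call for roughly 4.6 years of history, while 4,000 trials call for roughly 16.6. A short track record cannot support a large search.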
Profitable in most market periods, judged on the worst one too. One lucky era plus a good story doesn’t pass.