About our testing engine

Minerva built one of the most rigorous backtesters available.

Deutsche Bank’s quant team cataloged the deadly sins of backtesting. Minerva exists so they can’t happen to you.

Scroll for an overview. Read detailed documentation here.

Another tester just told you your strategy has a Sharpe of 3.5.

Watch what usually happens next.

The gap is overfitting. Most backtesters never mention it. Scroll to learn more.

Your other backtests are lying to you.

Sin № 1 · Survivorship bias

The dead were deleted.

3,000→< 500

Most platforms test on today’s index — every bankruptcy quietly erased. That’s survivorship bias, and it can flip a conclusion outright. Minerva trades the market as it was: point-in-time membership, dead companies included.
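As a rough illustration of what point-in-time membership means, the sketch below filters a universe by date. The `MEMBERSHIP` records, tickers, and `universe_on` helper are hypothetical and are not Minerva's API.

```python
from datetime import date

# Hypothetical membership records: (ticker, date added, date removed or None).
MEMBERSHIP = [
    ("AAA", date(2005, 1, 3), None),
    ("BBB", date(2005, 1, 3), date(2009, 3, 2)),  # later delisted
    ("CCC", date(2012, 6, 1), None),
]

def universe_on(day, membership=MEMBERSHIP):
    """Tickers that were index members on `day`, dead companies included."""
    return sorted(
        ticker
        for ticker, added, removed in membership
        if added <= day and (removed is None or day < removed)
    )

print(universe_on(date(2008, 1, 2)))  # BBB is still tradable here
print(universe_on(date(2020, 1, 2)))  # BBB is gone, CCC has joined
```

A backtest that asks for the universe on each simulated day, rather than once for today, keeps the companies that later failed.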

Red: the market with its losers deleted. Ink: the market that happened.

Sin № 2 · Look-ahead bias

Your strategy read tomorrow’s paper.

1.41→0.26

80% of the “edge” was time travel — look-ahead bias. In Minerva, signals can only fill at the next bar’s open, and a model can’t even see the bar it’s standing on.

The same discipline covers corporate actions: splits and dividends are accounted for, and the past and future never mix.

A signal born on this bar fills at the next bar’s open. The clock only runs forward.
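A minimal sketch of the rule, assuming a toy momentum signal: the decision on bar t reads only closes before bar t, and the order fills at the open of bar t + 1. The function name, the signal, and the `lookback` parameter are illustrative assumptions, not Minerva's implementation.

```python
def next_open_fills(opens, closes, lookback=2):
    """Return (bar index, fill price) pairs for a toy long-only signal."""
    fills = []
    for t in range(lookback, len(opens) - 1):
        # Bars t-lookback .. t-1 only: the bar the model stands on is hidden.
        history = closes[t - lookback:t]
        signal = history[-1] > history[0]
        if signal:
            # The order fills at the next bar's open, never at bar t itself.
            fills.append((t + 1, opens[t + 1]))
    return fills

opens = [10, 11, 12, 13, 14]
closes = [10.5, 11.5, 12.5, 13.5, 14.5]
print(next_open_fills(opens, closes))
```

Because `closes[t]` is never read when deciding on bar t, changing the current bar's close cannot change the signal.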

Sin № 3 · Storytelling

Every squiggle gets a story.

The same strategy swings from Sharpe −1.8 to +2.2 depending on which two years you show — both with great stories. Minerva ignores the story and grades consistency: every period testifies, including the worst one.

Eight market periods, graded separately. The red ones count too.
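One simple way to grade periods separately is sketched below: split the return series into equal chunks and compute a per-period Sharpe for each. The equal-length split and the summary statistics are assumptions for illustration; the source does not say how Minerva defines its periods.

```python
from statistics import mean, stdev

def grade_periods(returns, n_periods=8):
    """Per-period Sharpe, share of positive periods, and the worst period.

    Trailing returns that do not fill a whole period are ignored, and each
    period needs some variation in its returns (stdev must be non-zero).
    """
    size = len(returns) // n_periods
    sharpes = []
    for k in range(n_periods):
        chunk = returns[k * size:(k + 1) * size]
        sharpes.append(mean(chunk) / stdev(chunk))
    share_positive = sum(s > 0 for s in sharpes) / n_periods
    return sharpes, share_positive, min(sharpes)
```

Reporting the minimum alongside the share of positive periods is what keeps one strong era from hiding a bad one.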

Interlude

An optimization is typically ~40,000 backtests.

The best of 40,000 coin-flippers looks like a genius — ≈4.6σ of pure luck. So Minerva treats optimization as a statistics problem, not a leaderboard.
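The 4.6σ figure is consistent with a common rule of thumb: the largest of N independent standard normal draws is at most about sqrt(2 ln N) in expectation. The sketch below assumes that is the approximation in use; the function name is illustrative.

```python
import math

def expected_best_sigma(n_trials):
    """Leading-order bound on the expected maximum of n_trials standard normal draws."""
    return math.sqrt(2.0 * math.log(n_trials))

print(round(expected_best_sigma(40_000), 1))  # 4.6
```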

Sin № 4 · Data mining

Your best test was likely pure luck.

40,000→1

In-sample champions are what tournaments produce, even among coin-flippers. Minerva’s answer is Combinatorial Purged Cross-Validation: every candidate is graded only on data it never saw — every train/test combination of history, with quarantine gaps at each cut — then the score is deflated by the size of the search. One survives the gates, or none does.

Combinatorial Purged Cross-Validation

■ train · ■ test · ▨ quarantine · combination 1/15

Every train/test combination · 0 of 5 out-of-sample paths assembled.
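The counts in the diagram (15 combinations, 5 paths) match a split of history into 6 groups with 2 held out for testing each time. The sketch below enumerates those splits under that assumption. The fixed `gap` on each side of a test block is a simplified stand-in for purging and embargo, which in practice depend on how far each label looks ahead.

```python
from itertools import combinations
from math import comb

def cpcv_splits(n_samples, n_groups=6, n_test=2, gap=5):
    """Yield (train_idx, test_idx) for every choice of n_test test groups."""
    bounds = [i * n_samples // n_groups for i in range(n_groups + 1)]
    for chosen in combinations(range(n_groups), n_test):
        test, quarantine = set(), set()
        for g in chosen:
            start, stop = bounds[g], bounds[g + 1]
            test.update(range(start, stop))
            # Quarantine `gap` samples on both sides of each cut.
            quarantine.update(range(max(0, start - gap), start))
            quarantine.update(range(stop, min(n_samples, stop + gap)))
        train = [i for i in range(n_samples)
                 if i not in test and i not in quarantine]
        yield train, sorted(test)

splits = list(cpcv_splits(n_samples=120))
n_paths = comb(6, 2) * 2 // 6  # each group is tested in 5 different splits
print(len(splits), n_paths)    # 15 5
```

Each group appears in the test set of 5 splits, so the out-of-sample pieces can be stitched into 5 complete histories.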

Sin № 5 · Costs & turnover

Trading costs money. We count your fees and slippage.

+12%→−1%

Per year, at 30 bps per trade. Minerva bills every fill like a broker — spread, market impact, commissions — by default.
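For a sense of scale, the sketch below applies a flat per-trade cost to a gross annual return. The turnover of 43 trades per year is an assumption chosen only to show how a swing of this size can arise; the source does not state the turnover, and real costs are not simply additive.

```python
def net_annual_return(gross, trades_per_year, cost_bps):
    """Gross return minus a flat cost per trade, charged on full notional."""
    return gross - trades_per_year * cost_bps / 10_000

# Hypothetical turnover: 43 trades per year at 30 bps each.
print(f"{net_annual_return(0.12, 43, 30):+.1%}")  # -0.9%
```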

cost per trade · 0 bps · “genius”

Sin № 6 · Outliers

One bad print mints a fortune.

$60,000→rejected

One misprinted EPS — on a $20 stock — once spiked an entire index. Minerva rejects impossible data at the gate and never peeks forward to normalize.

Impossible values bounce. Metrics only ever see clean returns.
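One way to reject a bad print without looking forward is to compare each value only against values already accepted. The rule below (a trailing median with a fixed ratio cap) is a hypothetical example, not Minerva's actual gate; `window` and `max_ratio` are illustrative parameters.

```python
from statistics import median

def reject_outliers(values, window=5, max_ratio=10.0):
    """Drop values more than max_ratio times the trailing median magnitude.

    Only previously accepted values are consulted, so nothing from the
    future influences whether a point is kept.
    """
    accepted = []
    for v in values:
        recent = accepted[-window:]
        if recent:
            ref = median(abs(x) for x in recent)
            if ref > 0 and abs(v) > max_ratio * ref:
                continue  # implausible print: bounce it
        accepted.append(v)
    return accepted

print(reject_outliers([1.0, 1.1, 0.9, 60000.0, 1.2]))
```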


The receipt · vs. QuantConnect

We checked our math against theirs.

Same strategy, same settings, five years of AMD: agreement to within $15.53 on $250,000. The discrepancy? Minerva’s default is deliberately stricter: our signals never see the current bar, while theirs do.

measure	QC	Minerva	Δ
trades	33	33	exact
entries	33	33	exact
final equity	$250,427.60	$250,412.07	−$15.53
fees	$395.89	$395.37	−$0.52

The obvious question

Why doesn’t everyone test this way?

Because honesty is computationally brutal: thousands of full simulations per verdict. Minerva compiles the entire simulation to machine code and fans it across hundreds of cloud cores. It was built by physics and math PhDs from Berkeley and the Sorbonne. What lived inside quant funds is now a button.

Backtests are hypothetical and depend on data quality, assumptions, fees, slippage, and market regime. Past performance does not predict future results.

Minerva robustness seal

Five gates. All of them. Or no seal.

Every optimization ends in a verdict you can read — which gates passed, which didn’t, and by how much. Here’s what each gate actually asks. The seal sits atop a full P&L report; here’s how to read it.

DSR ≥ 0.95 · deflated Sharpe ratio

The probability that your Sharpe beats what the luckiest of your N attempts would have scored by pure chance. If the edge doesn’t clearly beat luck, it isn’t an edge.
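A sketch of the deflated Sharpe ratio in its standard published form (Bailey and López de Prado), where the result is a probability and the 0.95 threshold reads as a confidence level. Whether Minerva computes it exactly this way is not stated. Inputs are per-period (non-annualized) Sharpe ratios, `sharpe_std` is the spread of Sharpe ratios across the trials, and `n_trials` must be at least 2.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_NORMAL = NormalDist()

def expected_max_sharpe(n_trials, sharpe_std):
    """Approximate best Sharpe expected from n_trials zero-skill strategies."""
    return sharpe_std * (
        (1 - EULER_GAMMA) * _NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _NORMAL.inv_cdf(1 - 1 / (n_trials * e))
    )

def deflated_sharpe(sharpe, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the luck benchmark."""
    benchmark = expected_max_sharpe(n_trials, sharpe_std)
    denom = sqrt(1 - skew * sharpe + (kurt - 1) / 4 * sharpe**2)
    return _NORMAL.cdf((sharpe - benchmark) * sqrt(n_obs - 1) / denom)
```

Holding the observed Sharpe fixed, the score falls as `n_trials` grows, which is how the size of the search gets charged against the winner.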

PBO ≤ 0.50 · probability of backtest overfitting

Across every way of splitting history, how often does your in-sample champion flop out of sample? More often than a coin flip means the win was memorized, not learned.
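The question above can be sketched as a simplified version of combinatorially symmetric cross-validation. Here `scores[s][b]` is an assumed input holding the performance of strategy s on time block b; the function counts how often the in-sample champion lands in the bottom half out of sample. This illustrates the idea and is not Minerva's code.

```python
from itertools import combinations
from statistics import mean

def pbo(scores):
    """Fraction of splits where the in-sample best ranks at or below the median out of sample."""
    n_strategies, n_blocks = len(scores), len(scores[0])
    flops = total = 0
    for in_sample in combinations(range(n_blocks), n_blocks // 2):
        out_sample = [b for b in range(n_blocks) if b not in in_sample]
        is_perf = [mean(row[b] for b in in_sample) for row in scores]
        oos_perf = [mean(row[b] for b in out_sample) for row in scores]
        champion = max(range(n_strategies), key=is_perf.__getitem__)
        # Out-of-sample rank of the champion: 1 = worst, n_strategies = best.
        rank = 1 + sum(p < oos_perf[champion] for p in oos_perf)
        flops += rank / (n_strategies + 1) <= 0.5
        total += 1
    return flops / total

print(pbo([[1, 1, 1, 1], [0, 0, 0, 0], [-1, -1, -1, -1]]))  # 0.0
```

A strategy that wins in every block never flops, so its PBO is 0; one whose wins reverse out of sample pushes the value toward 1.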

SPA p ≤ 0.10 · superior predictive ability

Could a family of zero-skill strategies plausibly have produced your best result? This p-value answers it — accounting for the entire search, not just the winner.

Track ≥ MinTRL · minimum track record length

A weak Sharpe on a short record cannot be told apart from zero. The thinner the edge — and the more negative the skew — the more history you owe.
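The standard minimum track record length formula (Bailey and López de Prado) makes both effects concrete. The sketch assumes per-period Sharpe ratios and an observed Sharpe above the benchmark; parameter names are illustrative, and Minerva's exact settings are not given in the source.

```python
from statistics import NormalDist

def min_track_record_length(sharpe, benchmark=0.0, skew=0.0, kurt=3.0,
                            confidence=0.95):
    """Observations needed to tell `sharpe` apart from `benchmark`."""
    z = NormalDist().inv_cdf(confidence)
    variance_term = 1 - skew * sharpe + (kurt - 1) / 4 * sharpe**2
    return 1 + variance_term * (z / (sharpe - benchmark)) ** 2

print(round(min_track_record_length(0.10)))             # thicker edge
print(round(min_track_record_length(0.05)))             # thinner edge, more history
print(round(min_track_record_length(0.10, skew=-1.0)))  # negative skew, more history
```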

Regime ≥ 0.60 · stability across periods

Profitable in most market periods, judged on the worst one too. One lucky era plus a good story doesn’t pass.

Run a sealed optimization →