About our testing engine

Minerva is the most rigorous backtester on the market. Period.

Deutsche Bank’s quant team catalogued the deadly sins of backtesting. Minerva exists so they can’t happen to you.

An optimizer just found a strategy with an in-sample Sharpe of 3.4.

Watch what usually happens next.

Chart: in-sample (what the optimizer saw) vs. out-of-sample (what comes next). Labels: the discovery · what the backtest promises · what usually happens.

The gap is overfitting. Most backtesters never mention it. Follow the spine — six causes, one honest machine.

Your other backtests are lying to you.

Six lies down the spine. The seal at the bottom.

01
Sin № 1 · Survivorship bias

The dead were deleted.

3,000 · < 500

Most platforms test on today’s index — every bankruptcy quietly erased. That’s survivorship bias, and it can flip a conclusion outright. Minerva trades the market as it was: point-in-time membership, dead companies included.

Red: the market with its losers deleted. Ink: the market that happened.
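Point-in-time membership can be pictured as a lookup that answers "who was in the index on this date?" rather than "who is in it today?". The sketch below is illustrative only: the `Membership` record, the `universe_on` helper, and the sample symbols are hypothetical and are not Minerva's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Membership:
    symbol: str
    added: date
    removed: Optional[date]  # None means still a member today


def universe_on(memberships, as_of):
    """Symbols that were index members on `as_of`, delisted names included."""
    return sorted(
        m.symbol
        for m in memberships
        if m.added <= as_of and (m.removed is None or as_of < m.removed)
    )


# Hypothetical records, for illustration only.
MEMBERSHIPS = [
    Membership("AAA", date(2005, 1, 3), None),
    Membership("BBB", date(2005, 1, 3), date(2009, 3, 2)),  # later delisted
    Membership("CCC", date(2012, 6, 1), None),
]

print(universe_on(MEMBERSHIPS, date(2008, 1, 2)))  # includes the name that later died
print(universe_on(MEMBERSHIPS, date(2020, 1, 2)))  # today's survivors only
```

A backtest that builds its 2008 universe from the second call never trades BBB, which is exactly the deletion described above.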
02
Sin № 2 · Look-ahead bias

Your strategy read tomorrow’s paper.

1.41 → 0.26

80% of the “edge” was time travel — look-ahead bias. In Minerva, signals can only fill at the next bar’s open, and a model can’t even see the bar it’s standing on.

The same discipline covers corporate actions: splits and dividends come from a verified, versioned reference — if a symbol’s record is missing, the run fails loudly rather than simulating on adjusted prices that secretly contain the future. Adjusted and raw series live under separate identities, so they can never mix.

A signal born on this bar fills at the next bar’s open. The clock only runs forward.
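The forward-only clock can be sketched as a loop in which the model is handed only the bars strictly before the current one, and any signal it produces is filled at the following bar's open. This is a minimal illustration under those two assumptions; `run_signals`, the bar dictionaries, and the toy model are hypothetical names, not Minerva's API.

```python
def run_signals(bars, model):
    """Return (fill_bar_index, fill_price, signal) tuples.

    bars  : list of dicts with "open" and "close" keys, oldest first
    model : callable taking the visible history and returning +1, 0, or -1
    """
    fills = []
    # Stop one bar early: a signal on the last bar has no next open to fill at.
    for i in range(1, len(bars) - 1):
        history = bars[:i]  # bar i itself is hidden from the model
        signal = model(history)
        if signal != 0:
            fills.append((i + 1, bars[i + 1]["open"], signal))
    return fills


# Hypothetical bars and a toy model: go long after an up bar.
BARS = [
    {"open": 10.0, "close": 11.0},
    {"open": 11.0, "close": 10.5},
    {"open": 10.5, "close": 12.0},
    {"open": 12.0, "close": 12.5},
    {"open": 12.5, "close": 13.0},
]


def long_after_up_bar(history):
    last = history[-1]
    return 1 if last["close"] > last["open"] else 0


print(run_signals(BARS, long_after_up_bar))
```

Because the slice ends before bar `i`, a model cannot use that bar's close even by accident; the information simply is not in its input.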
03
Sin № 3 · Storytelling

Every squiggle gets a story.

The same strategy swings from Sharpe −1.8 to +2.2 depending on which two years you show — both with great stories. Minerva ignores the story and grades consistency: every period testifies, including the worst one.

Eight market periods, graded separately. The red ones count too.
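Grading every period separately can be sketched as computing one Sharpe ratio per period, then reporting the worst period and the share of periods with a positive score. How Minerva defines its periods and its consistency score is not specified here, so `grade_periods`, the annualization factor, and the sample data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-bar returns (risk-free rate assumed zero)."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd * sqrt(periods_per_year)


def grade_periods(period_returns):
    """period_returns maps a period label to its list of per-bar returns."""
    scores = {label: sharpe(r) for label, r in period_returns.items()}
    worst = min(scores, key=scores.get)
    share_positive = sum(s > 0 for s in scores.values()) / len(scores)
    return scores, worst, share_positive


# Hypothetical returns for three periods.
PERIODS = {
    "A": [0.01, 0.02, 0.01, 0.03],
    "B": [-0.01, -0.02, 0.0, -0.03],
    "C": [0.0, 0.01, 0.02, 0.01],
}

scores, worst, share_positive = grade_periods(PERIODS)
print(worst, round(share_positive, 2))
```

Showing only periods A and C would tell a flattering story; the grading function has no way to leave B out.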
Interlude

An optimization is 4,000 backtests.

The best of 4,000 coin-flippers looks like a genius — ≈4σ of pure luck. So Minerva treats optimization as a statistics problem, not a leaderboard.
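The size of that luck can be estimated directly. The sketch below uses a standard approximation for the expected maximum of N independent standard normal draws (the one used in the deflated Sharpe ratio literature), alongside the simpler upper bound sqrt(2 ln N). Function names are illustrative, and real backtests in one search are correlated, so treat the result as a rough guide.

```python
from math import e, log, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_z(n_trials):
    """Approximate E[max] of n_trials independent standard normal draws (n_trials >= 2)."""
    z = NormalDist()
    return ((1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
            + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * e)))


def max_z_upper_bound(n_trials):
    """Simple upper bound on the same quantity."""
    return sqrt(2 * log(n_trials))


print(round(expected_max_z(4000), 2), round(max_z_upper_bound(4000), 2))
```

For 4,000 trials the upper bound is about 4.07 and the finer approximation lands somewhat below it, which is the order of magnitude the "≈4σ" figure refers to.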

04
Sin № 4 · Data mining

Your best test was likely pure luck.

4,000 → 1

In-sample champions are what tournaments produce, even among coin-flippers. Minerva’s answer is Combinatorial Purged Cross-Validation: every candidate is graded only on data it never saw — every train/test combination of history, with quarantine gaps at each cut — then the score is deflated by the size of the search. One survives the gates, or none does.

Combinatorial Purged Cross-Validation
■ train · test · ▨ quarantine · combination 1/15
Every train/test combination · 0 of 5 out-of-sample paths assembled.
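The split logic can be sketched as follows: cut history into contiguous groups, hold out every possible choice of test groups, and drop training samples that sit too close to a test boundary. The counts shown above (15 combinations, 5 paths) are consistent with six groups and two held out, though that configuration is an inference, and the symmetric `embargo` gap here is a simplification of proper purging, which depends on label horizons. All names are illustrative.

```python
from itertools import combinations
from math import comb


def cpcv_splits(n_samples, n_groups=6, n_test_groups=2, embargo=0):
    """Yield (train_idx, test_idx) for every choice of held-out groups."""
    bounds = [
        (g * n_samples // n_groups, (g + 1) * n_samples // n_groups)
        for g in range(n_groups)
    ]
    for test_groups in combinations(range(n_groups), n_test_groups):
        test_idx = [i for g in test_groups for i in range(*bounds[g])]
        blocked = set(test_idx)
        for g in test_groups:
            lo, hi = bounds[g]
            # Quarantine: also block samples within `embargo` of the test group.
            blocked.update(range(max(0, lo - embargo), min(n_samples, hi + embargo)))
        train_idx = [i for i in range(n_samples) if i not in blocked]
        yield train_idx, test_idx


def n_paths(n_groups=6, n_test_groups=2):
    """Number of complete out-of-sample paths the combinations can be stitched into."""
    return comb(n_groups, n_test_groups) * n_test_groups // n_groups


splits = list(cpcv_splits(60, embargo=2))
print(len(splits), n_paths())
```

Each sample lands in the test set of exactly as many combinations as there are paths, which is what lets the out-of-sample results be reassembled into several full-length histories.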
05
Sin № 5 · Costs & turnover

Backtests trade for free. You won’t.

+12% → −1%

Per year, at 30 bps per trade. Minerva bills every fill like a broker — spread, market impact, commissions — by default.

cost per trade · 0 bps · “genius”
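The arithmetic behind a swing like that is simple: every trade pays a cost on the notional it moves, and the bill scales with turnover. The sketch below assumes each trade turns over the full portfolio and uses a hypothetical turnover of 43 trades per year; neither figure comes from Minerva, and the real cost model (spread, impact, commissions) is richer than one flat number.

```python
def net_annual_return(gross_return, trades_per_year, cost_bps):
    """Gross annual return minus per-trade costs, in basis points of traded notional."""
    return gross_return - trades_per_year * cost_bps / 10_000


def breakeven_cost_bps(gross_return, trades_per_year):
    """Per-trade cost at which the strategy's net return reaches zero."""
    return gross_return / trades_per_year * 10_000


# Hypothetical turnover: 43 full-notional trades per year.
print(net_annual_return(0.12, 43, 0))    # the free-trading "genius"
print(net_annual_return(0.12, 43, 30))   # the same strategy, billed
print(breakeven_cost_bps(0.12, 43))
```

At that turnover, 30 bps per trade is enough to turn a 12% gross return slightly negative, and the breakeven cost sits just under 28 bps.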
06
Sin № 6 · Outliers

One bad print mints a fortune.

$60,000 · rejected

One misprinted EPS — on a $20 stock — once spiked an entire index. Minerva rejects impossible data at the gate and never peeks forward to normalize.

Impossible values bounce. Metrics only ever see clean returns.
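One way to reject impossible values without peeking forward is to compare each new value only against values already accepted. The sketch below uses a trailing median and a ratio limit; the `gate` function, the window, the threshold, and the sample prices are all hypothetical, and Minerva's actual validation rules are not described here.

```python
from statistics import median


def gate(values, window=5, max_ratio=10.0):
    """Split values into accepted and rejected using only past accepted values."""
    accepted, rejected = [], []
    for i, v in enumerate(values):
        if v <= 0:
            rejected.append((i, v))  # non-positive prices are impossible
            continue
        past = accepted[-window:]  # nothing after index i is ever consulted
        if past:
            ref = median(past)
            if v > ref * max_ratio or v < ref / max_ratio:
                rejected.append((i, v))
                continue
        # Note: the very first value has no history and is only sign-checked.
        accepted.append(v)
    return accepted, rejected


# Hypothetical prices for a $20 stock with one misprint.
PRICES = [20.0, 20.5, 19.8, 60000.0, 20.1, -1.0]
accepted, rejected = gate(PRICES)
print(accepted, rejected)
```

Normalizing with a full-sample mean or standard deviation would catch the same misprint, but it would also leak future information into every earlier bar.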
The receipt · vs. QuantConnect

We checked our math against theirs.

Same public strategy, same settings, five years of AMD: final equity agrees to within $15.53 on $250,000, and every entry date is identical. Our default is then deliberately stricter: models never see the current bar. Slightly humbler numbers. Numbers you can trust.

measure · QC · Minerva · Δ
trades · 33 · 33 · exact
entries · 33/33 · exact
final equity · $250,427.60 · $250,412.07 · −$15.53
fees · $395.89 · $395.37 · −$0.52
The obvious question

Why doesn’t everyone test this way?

Because honesty is computationally brutal — thousands of full simulations per verdict. Minerva compiles the entire simulation to machine code and fans it across hundreds of cloud cores — built by physics and math PhDs from Berkeley and the Sorbonne. What lived inside quant funds is now a button.

Backtests are hypothetical and depend on data quality, assumptions, fees, slippage, and market regime. Past performance does not predict future results.
Minerva robustness seal

Five gates. All of them. Or no seal.

Every optimization ends in a verdict you can read: which gates passed, which didn’t, and by how much. The seal sits atop a full P&L report; here’s what each gate actually asks and how to read it.

DSR ≥ 0.95 · deflated Sharpe ratio

Your Sharpe, after subtracting what the luckiest of your N attempts would have scored by pure chance. If the edge doesn’t clearly beat luck, it isn’t an edge.
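The published deflated Sharpe ratio (Bailey and López de Prado) can be sketched in a few lines: estimate the Sharpe the luckiest of N zero-skill trials would be expected to reach, then ask how probable it is that the observed Sharpe exceeds it, adjusting for sample length, skewness, and kurtosis. This follows the published formula; whether Minerva's gate is computed exactly this way is an assumption, and the example inputs are hypothetical.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
Z = NormalDist()


def expected_max_sharpe(n_trials, sharpe_std):
    """Expected best Sharpe among n_trials zero-skill strategies (n_trials >= 2)."""
    return sharpe_std * ((1 - EULER_GAMMA) * Z.inv_cdf(1 - 1 / n_trials)
                         + EULER_GAMMA * Z.inv_cdf(1 - 1 / (n_trials * e)))


def deflated_sharpe(sharpe, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe beats the luck benchmark.

    sharpe and sharpe_std are per-observation (not annualized);
    sharpe_std is the standard deviation of Sharpe ratios across the trials.
    """
    sr0 = expected_max_sharpe(n_trials, sharpe_std)
    denom = sqrt(1 - skew * sharpe + (kurt - 1) / 4 * sharpe ** 2)
    return Z.cdf((sharpe - sr0) * sqrt(n_obs - 1) / denom)


# Hypothetical inputs: the same daily Sharpe found after 10 tries vs. 4,000 tries.
print(deflated_sharpe(0.1, 1250, 10, 0.03))
print(deflated_sharpe(0.1, 1250, 4000, 0.03))
```

The same observed Sharpe clears a 0.95 bar when it came from ten attempts and falls well short when it came from four thousand, which is the deflation the gate describes.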

PBO ≤ 0.50 · probability of backtest overfitting

Across every way of splitting history, how often does your in-sample champion flop out of sample? More often than a coin flip means the win was memorized, not learned.
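A simplified version of that question can be coded with combinatorially symmetric cross-validation: split history into blocks, treat every half of the blocks as in-sample, pick the in-sample champion, and check where it ranks on the other half. The sketch below uses mean return as the performance measure for brevity (a Sharpe ratio is the more usual choice) and hypothetical data; it illustrates the idea rather than Minerva's implementation.

```python
from itertools import combinations
from math import log
from statistics import mean


def pbo(returns, n_blocks=4):
    """Share of splits where the in-sample champion lands in the bottom half out of sample.

    returns: list of rows (one per time step), each a list of per-strategy returns.
    """
    n_rows, n_strats = len(returns), len(returns[0])
    blocks = [range(b * n_rows // n_blocks, (b + 1) * n_rows // n_blocks)
              for b in range(n_blocks)]
    logits = []
    for is_blocks in combinations(range(n_blocks), n_blocks // 2):
        is_rows = [t for b in is_blocks for t in blocks[b]]
        oos_rows = [t for b in range(n_blocks) if b not in is_blocks for t in blocks[b]]
        is_perf = [mean(returns[t][s] for t in is_rows) for s in range(n_strats)]
        oos_perf = [mean(returns[t][s] for t in oos_rows) for s in range(n_strats)]
        champion = max(range(n_strats), key=is_perf.__getitem__)
        # Out-of-sample rank of the champion: 1 is worst, n_strats is best.
        rank = sum(p <= oos_perf[champion] for p in oos_perf)
        omega = rank / (n_strats + 1)
        logits.append(log(omega / (1 - omega)))
    return sum(x <= 0 for x in logits) / len(logits)


# Hypothetical data: one steady strategy vs. four that each shine in a single block.
STEADY = [[0.01, 0.0, -0.01] for _ in range(8)]
ONE_HIT = [[0.03 if s == t // 2 else 0.0 for s in range(4)] for t in range(8)]
print(pbo(STEADY), pbo(ONE_HIT))
```

A champion that wins everywhere scores 0; champions that only won the block they were fitted on score 1.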

SPA p ≤ 0.10 · superior predictive ability

Could a family of zero-skill strategies plausibly have produced your best result? This p-value answers it — accounting for the entire search, not just the winner.

Track ≥ MinBTL · minimum backtest length

The more you searched, the more history you owe. A short track record with a big search is an automatic fail — the math says so before the market does.
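The trade-off between search size and required history has a published closed form (Bailey, Borwein, López de Prado, and Zhu). With N independent trials and an annualized in-sample Sharpe of \overline{E[\max_N]} that zero-skill strategies would be expected to reach, the minimum backtest length in years is approximately as below. Whether Minerva's gate uses exactly this expression is an assumption.

```latex
\mathrm{MinBTL} \;\approx\;
\left(
  \frac{(1-\gamma)\,\Phi^{-1}\!\left[1-\tfrac{1}{N}\right]
        + \gamma\,\Phi^{-1}\!\left[1-\tfrac{1}{N e}\right]}
       {\overline{E[\max_N]}}
\right)^{2}
\;<\;
\frac{2\ln N}{\overline{E[\max_N]}^{\,2}}
```

Here \gamma is the Euler-Mascheroni constant and \Phi^{-1} is the inverse standard normal CDF. As a worked instance of the upper bound: with N = 4,000 and an annualized Sharpe of 1, 2 ln N ≈ 16.6, so the bound is roughly 16.6 years of history.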

Regime ≥ 0.60 · stability across periods

Profitable in most market periods, judged on the worst one too. One lucky era plus a good story doesn’t pass.

Run a sealed optimization →