Intraday currency markets are not one environment but several, stitched together across the day. The Asian session tends to be quieter and range-bound; the London open brings a surge of volatility and direction; the New York session and the London–New York overlap behave differently again. A model evaluated on a window short enough to sit inside one of these sessions can look excellent for a reason that has nothing to do with having found a durable edge — it has simply fitted the character of that one session. This is session overfitting, and it is one of the quieter ways a backtest can flatter a model.

How a Short Window Misleads

Suppose a model is evaluated on an 8-hour window that happens to fall largely within a calm, range-bound stretch. A strategy that fades small moves back toward the mean will thrive there. Its win rate will be high, its drawdown shallow, its equity curve smooth. Everything about the backtest will say this is a strong model.

The problem is that the window never tested the model against anything else. When volatility expands at a session open, the same mean-reversion logic that looked so clean starts fighting strong directional moves, and the shallow drawdown becomes a deep one. The model was never bad at trading; it was only ever measured against the one regime it happened to suit. The short window did not reveal that — it concealed it.

This is distinct from classic parameter overfitting, where a model is tuned to the noise of a dataset. Session overfitting can happen even with sensible parameters, simply because the evaluation period was too narrow to represent the range of conditions the model will actually face once live.
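One way to see whether a result leans on a single session is to break the backtest's trades down by the session in which they were opened. The sketch below is illustrative only: the session boundaries in `SESSIONS`, the `(timestamp, pnl)` trade format, and the function names are assumptions made for the example, not darwintIQ's implementation. Timestamps are assumed to be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical session boundaries in UTC hours (start inclusive, end exclusive).
# Real desks draw these lines differently; adjust to taste.
SESSIONS = [
    ("asia", 0, 7),
    ("london", 7, 12),
    ("overlap", 12, 16),
    ("new_york", 16, 21),
]


def session_of(ts: datetime) -> str:
    """Map a timezone-aware timestamp to a session label, or 'off_hours'."""
    hour = ts.astimezone(timezone.utc).hour
    for name, start, end in SESSIONS:
        if start <= hour < end:
            return name
    return "off_hours"


def pnl_by_session(trades):
    """Summarise (timestamp, pnl) trades per session: count, total PnL, win rate."""
    buckets = defaultdict(list)
    for ts, pnl in trades:
        buckets[session_of(ts)].append(pnl)
    return {
        name: {
            "trades": len(pnls),
            "total_pnl": sum(pnls),
            "win_rate": sum(p > 0 for p in pnls) / len(pnls),
        }
        for name, pnls in buckets.items()
    }
```

If nearly all trades, or nearly all of the profit, land in one bucket, the headline figures describe that session rather than the model.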

Why a Longer Window Helps

The remedy is to evaluate over a window long enough to contain the variety the market actually contains. darwintIQ recently extended its live-model evaluation window to 40 hours of one-minute data (2,400 bars), up from 8 hours (480 bars). A 40-hour window spans multiple trading sessions and the transitions between them, so a model is now scored across quiet ranges, volatile opens, and the handoffs in between rather than against a single slice of the day.

The effect on the search is direct. A model that only works in calm conditions can no longer post a top score by being measured solely in calm conditions — its struggles during the volatile portions of the window now pull its results down. To rank well, a model has to hold together across the mix. That is a far better proxy for how it will behave in live trading, where it does not get to choose which session it operates in.
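The difference in coverage can be made concrete by counting how many one-minute bars of each session fall inside an 8-hour and a 40-hour window ending at the same moment. This is a rough sketch under stated assumptions: the UTC boundaries in `SESSION_HOURS`, the example end time, and the function name are hypothetical, and weekend closes are ignored.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical UTC session boundaries (start hour inclusive, end hour exclusive).
SESSION_HOURS = {
    "asia": (0, 7),
    "london": (7, 12),
    "overlap": (12, 16),
    "new_york": (16, 21),
}


def sessions_covered(window_end: datetime, hours: int) -> dict:
    """Count one-minute bars per session in the `hours` leading up to window_end."""
    counts = {name: 0 for name in SESSION_HOURS}
    start = window_end - timedelta(hours=hours)
    for i in range(hours * 60):
        bar_hour = (start + timedelta(minutes=i)).hour
        for name, (lo, hi) in SESSION_HOURS.items():
            if lo <= bar_hour < hi:
                counts[name] += 1
    return counts


# Example end time: a Wednesday at 07:00 UTC, chosen so both windows sit mid-week.
window_end = datetime(2024, 3, 6, 7, 0, tzinfo=timezone.utc)
narrow = sessions_covered(window_end, 8)
wide = sessions_covered(window_end, 40)
```

With these assumed boundaries, every in-session bar of the 8-hour window falls in the Asian session, while the 40-hour window contains bars from all four. A different end time would move the short window elsewhere, but it would still see only a small part of the day.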

A longer window also dilutes simple luck. In a short window, a couple of fortunate trades can carry the whole result; across 40 hours and a larger trade count, individual outcomes have more room to average out and idiosyncratic runs matter less.
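The size of that effect can be roughed out with the standard error of an observed win rate. The sketch below treats trades as independent win/lose draws, which is a simplifying assumption (real trades are often correlated), and the function name is hypothetical.

```python
import math


def win_rate_standard_error(win_rate: float, n_trades: int) -> float:
    """Standard error of an observed win rate, assuming independent trades."""
    if n_trades <= 0:
        raise ValueError("n_trades must be positive")
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    return math.sqrt(win_rate * (1.0 - win_rate) / n_trades)
```

For illustration, a 60% win rate over 10 trades carries a standard error of roughly 15 percentage points; over 50 trades it falls to roughly 7. The uncertainty shrinks with the square root of the trade count, so five times as many trades cuts it by a little more than half.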

It Is Not Just About Length

Extending the window is necessary but not sufficient. A longer window still has to be paired with thresholds that make sense at the new scale — minimum trade counts and other gates rescaled so they reflect the larger sample — and with a holdout that checks performance on the most recent, unseen portion of that window. Length gives the model more conditions to prove itself against; the holdout checks that it proved itself on data it could not have been tuned to. Together they target session overfitting from both sides: more variety in the test, and a clean out-of-sample check on the tail.
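A minimal sketch of those two pieces, under assumptions of our own: the 20% holdout fraction, the linear rescaling of the trade-count gate, the base value of 5 trades used in the example, and both function names are hypothetical and say nothing about how darwintIQ sets its thresholds.

```python
import math


def split_with_holdout(bars, holdout_fraction=0.2):
    """Split chronologically ordered bars into (in_sample, holdout tail)."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    holdout_len = round(len(bars) * holdout_fraction)
    cut = len(bars) - holdout_len
    # The holdout is the most recent portion; it must not be used for tuning.
    return bars[:cut], bars[cut:]


def rescale_min_trades(base_min_trades: int, base_hours: int, new_hours: int) -> int:
    """Scale a minimum-trade gate in proportion to window length.

    Linear scaling is an assumption made for this example.
    """
    return math.ceil(base_min_trades * new_hours / base_hours)


# Example: a 40-hour window of one-minute bars, represented here by bar indices.
bars = list(range(40 * 60))
in_sample, holdout = split_with_holdout(bars)
min_trades_40h = rescale_min_trades(5, base_hours=8, new_hours=40)
```

In this example the last 480 bars (8 hours) form the holdout and the hypothetical gate of 5 trades becomes 25. Taking the split by position rather than by random sampling keeps the holdout strictly later in time than anything the model was selected on.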

Final Thoughts

A backtest measures a model against whatever happened during its window, and nothing else. If that window sits inside a single market mood, a strong-looking result may say more about the mood than the model. Spanning multiple sessions makes the evaluation harder in exactly the way that matters — it forces a model to demonstrate that its edge survives the shifts between quiet and volatile conditions that define the real trading day. When you see a model that holds up over a long, multi-session window, you are seeing something closer to durability than to a flattering afternoon.

Session Overfitting — Why a Short Backtest Window Flatters a Model

A model tuned on a single quiet London afternoon will struggle the moment New York opens.
