It’s the 4th of July weekend. My wife is running errands this afternoon and my youngest son is napping. I can’t go outside for a run – it’s July in Texas (i.e., incredibly hot) and, what if the baby wakes up? The only ways to spend my holiday afternoon were to a) watch TV or b) dive into Michael Halls-Moore’s recent article, “Series Correlation in Time Series”.

I’m going to skip the math and keep this very layman-friendly. There are some key takeaways that even the most mathematically challenged traders can learn.

## The Layman’s Guide to Correlation

Let’s go easy on the mathiness and remind ourselves what correlation is. **Correlation is directly related to the angle between two points**, or more precisely, the angle between the lines drawn from zero out to each point.

It’s easiest to understand in two dimensions because that’s what fits on a piece of paper or computer screen. The points (1,0) and (2,0) share an angle of 0°. Basically, the points lie on the same line.

Correlation is the cosine of the angle between two points. Get out a calculator and try it. The cosine of 0° is 1.

cos(0) = 1.0, which is 100% correlation.

Look at the image above. The points (1,0) and (2,0) share the same line. Doesn’t it make sense that points on the same line are 100% correlated? Points that are less than 100% correlated lie on different lines.
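If you would rather check this on a computer than a calculator, here is a minimal sketch in Python. The function name `cosine_between` is my own invention for illustration; it treats each point as a line drawn from zero and returns the cosine of the angle between the two lines.

```python
import math

def cosine_between(p, q):
    """Cosine of the angle between the lines from the origin to p and to q."""
    dot = sum(a * b for a, b in zip(p, q))
    length_p = math.sqrt(sum(a * a for a in p))
    length_q = math.sqrt(sum(b * b for b in q))
    return dot / (length_p * length_q)

# (1,0) and (2,0) both sit on the horizontal axis, so the angle is 0 degrees.
same_line = cosine_between((1, 0), (2, 0))

# (1,0) and (0,3) sit on perpendicular axes, so the angle is 90 degrees.
right_angle = cosine_between((1, 0), (0, 3))

print(same_line, right_angle)
```

The first pair comes out at 1.0 (100% correlation) and the perpendicular pair at 0.0 (no correlation), which matches the picture of points sharing a line versus pointing in unrelated directions.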

To keep this intuitive, you probably know from trading that a 60% correlation is worth paying attention to, but it’s not very predictive. What does that mean geometrically?

We want to get the angle θ where cos(θ) = 0.6, which works out to roughly 53°. Please don’t get caught up on the cosine stuff. I just want you to realize how this relates to angles.

Those lines are pretty far apart. A 60% correlation between two forex pairs puts you in the same ballpark, but you’re hardly tracking pip for pip.
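To get a feel for how quickly the lines spread apart, you can convert a handful of correlations into angles. This is only a sketch; `correlation_to_angle` is a helper name I made up, and the list of correlations is arbitrary.

```python
import math

def correlation_to_angle(correlation):
    """Angle in degrees whose cosine equals the given correlation."""
    return math.degrees(math.acos(correlation))

# Walk down from perfect correlation to no correlation at all.
for correlation in (1.0, 0.9, 0.75, 0.6, 0.0):
    angle = correlation_to_angle(correlation)
    print(f"{correlation:.0%} correlation -> {angle:.1f} degrees apart")
```

A 60% correlation lands at about 53°, which is already more than halfway to the 90° that means no relationship at all.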

Forex prices are like points on the graphs above. 1 bar ago, 2 bars ago… 16,000 bars ago the prices of EURUSD were 1.11342, 1.11297…. 1.31974. The historical data on EURUSD forms a point in space. It’s 16,000 dimensional space, which is completely incomprehensible mentally, but it’s still a point. That means you can draw a line to it!

We usually talk about correlations in forex by comparing two currency pairs. Most traders know that EUR/JPY and GBP/JPY are correlated. You’re drawing a line between 0 and the EURJPY point, then drawing a line between 0 and the GBPJPY point, then calculating the angle to determine the correlation.
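Here is a rough sketch of that calculation for two price series. One detail I glossed over: the standard (Pearson) correlation subtracts each series’ own average before measuring the angle, so that is what the code does. The prices below are made-up numbers for illustration, not real quotes, and the function name `correlation` is my own.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: cosine of the angle between mean-centered series."""
    if len(xs) != len(ys):
        raise ValueError("both series need the same number of bars")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]   # center each series on its own average
    dy = [y - mean_y for y in ys]
    dot = sum(a * b for a, b in zip(dx, dy))
    lengths = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return dot / lengths            # fails with ZeroDivisionError on a flat series

# Invented closing prices, oldest first (NOT real market data).
eurjpy = [134.10, 134.35, 134.20, 134.60, 134.90]
gbpjpy = [190.20, 190.80, 190.30, 191.10, 191.50]

print(f"EURJPY vs GBPJPY: {correlation(eurjpy, gbpjpy):.1%}")
```

Each list is one point in five-dimensional space. With 16,000 bars the lists simply get longer; the calculation does not change.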

The EURUSD example only lets us draw one line. What if we compare the EURUSD against itself with a lag? That would let us draw two EURUSD lines!

## The memory function

Using the EURUSD example, you naturally expect a 100% correlation if you drew the same line twice. It’s the same series of prices. That on its own doesn’t tell us too much.

The idea of autocorrelation is to lag the prices. Right now it’s 15:00. I can take the prices from 15:00, 14:00 and 13:00 and compare them with the same series delayed by one hour. The delayed series is the prices from 14:00, 13:00 and 12:00. With a one-hour lag, the delayed series is 99.9% correlated with the original price series.
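The lagging step can be sketched in a few lines of Python. This is my own illustration rather than the code from Michael’s article: it slices the series into an “original” and a “delayed” copy and correlates the two. Statistical packages often use a slightly different estimator, so expect small differences. The function names and the sample prices are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation between two equally long series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    dot = sum(a * b for a, b in zip(dx, dy))
    return dot / (math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy)))

def autocorrelation(prices, lag):
    """Correlate a series (oldest bar first) with itself delayed by `lag` bars."""
    if lag < 1 or lag > len(prices) - 2:
        raise ValueError("lag must leave at least two overlapping bars")
    original = prices[lag:]    # the more recent bars
    delayed = prices[:-lag]    # the same bars, `lag` steps earlier
    return correlation(original, delayed)

# Invented hourly closes, oldest first (NOT real EURUSD data).
hourly = [1.1120, 1.1127, 1.1131, 1.1125, 1.1134, 1.1140, 1.1138, 1.1146]

for lag in (1, 2, 3):
    print(f"lag {lag}: {autocorrelation(hourly, lag):.1%}")
```

Running the same function over bigger and bigger lags is all it takes to trace out the full autocorrelation curve.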

Michael’s article encouraged me to look at bigger and bigger time lags.

The number on the horizontal axis is the time lag, done in groups of 100. The vertical axis shows the correlation.

EURUSD loses all its information after around 3,000 hours of data. That’s about the point where the correlation function reaches 0%.

Autocorrelation is often called the memory function. Traders can use this to ask themselves the question, “How far back in time can I go and still obtain useful information?” I can tell you with high confidence that if you’re trading one hour charts, it’s useless to consider anything beyond 3,000 hours.

My personal threshold for significance is a 75% correlation. EURUSD maintains that autocorrelation through 800 bars of data, which you can see on the chart if you zoom in closely.
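If you would rather not squint at a chart, reading the cutoff off the curve is easy to automate. The sketch below assumes you already have the curve as a list of (lag, correlation) pairs; the function name `first_lag_below` is hypothetical and the sample values are invented placeholders, not my EURUSD results.

```python
def first_lag_below(curve, threshold):
    """Return the first lag whose correlation falls below the threshold.

    `curve` is a list of (lag, correlation) pairs sorted by increasing lag.
    Returns None if the correlation never drops below the threshold.
    """
    for lag, corr in curve:
        if corr < threshold:
            return lag
    return None

# Placeholder curve for illustration only (NOT measured from market data).
toy_curve = [(100, 0.90), (200, 0.80), (300, 0.70),
             (400, 0.40), (500, 0.10), (600, -0.05)]

significant_until = first_lag_below(toy_curve, 0.75)  # my 75% threshold
memory_gone_at = first_lag_below(toy_curve, 0.0)      # where the memory runs out

print(significant_until, memory_gone_at)
```

On this toy curve the 75% threshold is first broken at lag 300 and the correlation first goes negative at lag 600. The same two lookups on a real curve give you the significance cutoff and the point where the memory is gone.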

The take-away for traders is that once you go past around 1,300 bars back, your information rapidly becomes less and less valuable.

steve says

Hi Shaun, Thanks for another interesting read. I’m interested to know in your view, what implications this article has on back testing longer periods of data. So on a daily back test is it not important to look at the data from longer than 1300 days ago?

How does this compare to the argument of back testing in different market conditions?

Shaun Overton says

Hey Steve,

Great question! I’m honestly not sure yet, but autocorrelation is my new research topic. I’ve only looked at H1 charts so far. If the behavior remains the same on D1 charts, then yes, you shouldn’t go further than that. Keep in mind that you’re making an assumption. Assumptions are dangerous! You need to double check, but I would expect it to be in that ballpark.

The different market conditions argument is a very good one. You need to see how the strategy behaves in different regimes. It’s perfectly valid to backtest your strategy over 20,000 bars. What’s not valid is referencing the information from 20,000 bars ago to make a current trading decision. It sounds like a paradox, doesn’t it?

Dmitry Tworowski says

Hi Shaun, thank you for your insightful posts – they always give an interesting viewpoint and some refreshment for traders’ brains. The paradox you have pointed out may mean that a typical strategy has “short memory”. If we could define a memory function for a strategy, could it be a measure of the strategy’s quality?

Shaun Overton says

It certainly sounds like it!

Andrew says

Thank you for the trigonometry lesson! Cosine is adjacent over hypotenuse if I remember correctly?

Anyway, as to your conclusion. How does that, if at all, relate to the length of time one can usefully back test for?

In my experience, backtesting and optimising even an M15 strategy for only 6 months (which must be thousands of bars) is next to useless. But finding the strategy profitable on a 3 year back test reaps rewards going forward. But I shudder to think how many bars that would be.

Dmitry Tworowski says

The dynamics profile of each market instrument (currencies, futures, indices, etc.) changes partly periodically and partly chaotically. I think you could perform a detailed study (back test and optimization) to find out how many (N) past bars a strategy should be profitable to give you a chance to get profit during the next n bars. Actually, you would find the memory function (autocorrelation) for the strategy that you use to trade your favorite instrument.

Shaun Overton says

Hi Dmitry,

How would you find the memory function of the trading strategy? Would you use the net profit and loss of each trade to look for lags? The problem is that portfolio-level trades are not sequential. Many are simultaneous.

“you could perform detailed study (back test and optimization) to find out how many (N) past bars a strategy should be profitable to give you chance to get profit during next n bars.”

Can you explain this a bit more? I get the gist of the idea, but not enough to start doing it.

Dmitry Tworowski says

Thank you for the comment. I think the idea could be taken from the concept of the memory function. As to the technical implementation, it can be a “sliding”/moving back test along the time series (OHLC data). Say you test on N bars with a set of certain strategy parameters, then you find the profit for the next n bars. You can gradually increase the value of n, trying to find profit values or an optimal profit/loss ratio. I use something like this in my work to better understand the strategies I’m operating with.

Dmitry Tworowski says

Some additional remarks. I took 15000 OHLC values for Crude Oil futures. I randomly selected N bars many times (genetic algorithms can help at this point) and did an optimization for each sample, followed by a forward test for the next n bars. Huge amount of computations, as you could imagine 🙂 The outcome was two numbers: optimization on ~1500 bars gives positive expectation for trading the next 80 bars at low risk. The ratio 80/1500 would give some kind of measure for the strategy’s memory. This reminds me of your signal-to-noise ratio.

Shaun Overton says

Hey Dmitry,

This is great. When you did the optimization on each sample, what did you optimize for?

–Shaun

Shaun Overton says

Exactly!

6 months of backtest is worthless. You raised a good point that this is not the same thing as limiting your backtests to recent data. The take-away in the article is that your signals should not use more than ~1,000 bars of data for making a current decision.

That does not apply to the entire strategy. It only applies to decisions based on the next signal.