Overfitness - The Cliff of Doom

The Hard Road Newsletter

Yes, I just made up “The Cliff of Doom”

I may have made the name up, but anyone who has played around with backtesting strategies and then actually let them run will be familiar with what I’m about to show you.

For reference, here’s the strategy. Please, for the love of God, don’t try to run it.

Now, this is purely a demonstration of why backtests are imperfect and need to be understood in the context of history. It’s very easy to work with the benefit of hindsight and craft a “strategy” that says it did 5000% in a couple of months. The above chart shows quite clearly what happens once you release that strategy into the wild. If it isn’t clear enough, let’s home in on the two different time periods.

Here we have the first day it can be backtested (roughly) to the last day it was edited. Looks unrealistically great, doesn’t it? Barely any drawdown, massive 5000% run-up. I’ll be able to buy that Caribbean island in a couple of years, maybe even a small European nation!

And here we have the performance since it was last edited. One simple phrase.

Dog Water

The strategy posted above isn’t the most extreme example in terms of chart differential (but it’s definitely up there!). It is, however, one of the most obviously problematic based on simple things like backtest length, complexity of code, and the incomprehensibly high annualized return (AR) paired with nonexistent drawdown. Seriously, a Calmar Ratio of 8000? Are we smoking crack?
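For anyone wondering why that Calmar number is so absurd, here is a rough sketch of how the ratio is commonly computed: annualized return divided by maximum drawdown. This is not Composer’s actual calculation, just the textbook version as I understand it. The function names and the `periods_per_year` parameter are made up for illustration.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction (0.25 means 25%)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


def annualized_return(equity, periods_per_year=252):
    """Geometric annualized return for an equity curve sampled once per period.

    periods_per_year is an assumption: 252 for daily bars, 12 for monthly.
    """
    n_periods = len(equity) - 1
    total_growth = equity[-1] / equity[0]
    return total_growth ** (periods_per_year / n_periods) - 1


def calmar_ratio(equity, periods_per_year=252):
    """Annualized return divided by maximum drawdown."""
    dd = max_drawdown(equity)
    if dd == 0:
        return float("inf")  # no drawdown at all, so the ratio is undefined
    return annualized_return(equity, periods_per_year) / dd
```

Two things blow this number up: a short backtest, because annualizing a few lucky months compounds them out to a full year, and a tiny drawdown, because it sits in the denominator. A 7-month backtest with barely any drawdown hits both at once.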

The length of the backtest at creation, a whopping 7 months, should have been the first red flag to anyone who decided to try this out with anything more than the bare minimum of funding. The next red flag would have been the novella of code that comprises the logic. I commend anyone who is able to read it and decipher an actual thesis to this strategy. I cannot.

Now that I’ve roasted the shit out of this symphony, here’s the moral of the story.

Like the story of Odin learning to read the Runes, we have learned that knowledge requires sacrifice. In the case of algorithmic investing, that sacrifice is time and money. Good thing it’s not an eye and very nearly our lives!

Out-of-sample testing prior to serious investment is absolutely critical. In my estimation, it’s the second most important rule, right behind crafting a working thesis to build around.

In the context of Composer, the only reliable way to truly test something out of sample is live running. On other platforms with more robust (but inherently less user-friendly) capabilities, you can build the strategy on a random selection of weeks or months and hold the rest back. The strategy never gets to “see” the held-out weeks and months while you are building it, so running it on them for the first time is a real test of validity. You can approximate a process like this in Composer, but it is neither simple nor intuitive to do. I strongly recommend attempting it though!
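To make the random weeks or months idea concrete, here is a minimal sketch of the split itself. It is not a Composer feature and not any particular platform’s API. `split_periods`, `holdout_fraction`, and `seed` are hypothetical names, and the period labels are placeholders.

```python
import random


def split_periods(periods, holdout_fraction=0.3, seed=42):
    """Randomly hold back a fraction of periods for out-of-sample testing.

    Returns (in_sample, out_of_sample), each in the original order.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(periods)
    rng.shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    holdout = set(shuffled[:n_holdout])
    in_sample = [p for p in periods if p not in holdout]
    out_of_sample = [p for p in periods if p in holdout]
    return in_sample, out_of_sample


# Placeholder month labels, purely for illustration
months = [f"2022-{m:02d}" for m in range(1, 11)]
build_on, validate_on = split_periods(months)
```

Build and tune only on `build_on`, then run the finished strategy on `validate_on` exactly once. One caveat: indicators with long lookback windows can still leak information between neighboring periods, so treat this as an approximation and not a substitute for live running.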

Weekly Twitter Bad News

And now to end on a positive note!

That’s gonna do it for this week. Have a safe and happy Labor Day!

You want some curated strategies?

Check out The Hard Road Premium for access to strategies that have been tested live, with proven returns shown in the Discord!
