The 5 minute guide to quant

John
Dec 11, 2023

More is better

The 5 minute rule is a psychological technique to help avoid procrastination. By committing to spend 5 minutes on a task, it is easier to get started, and so long as it isn't horrible, you may find yourself continuing long past the committed time.

Perhaps as a result, procrastination enablers such as YouTube often have content that is 5 minutes long (more or less).

'Oh I'll just watch the first of these 5 minute guides...'

For those of you who stayed on the 5 minute task of reading this, the main point of Dreadnought was quantitative. By the early 1900s, ships had better signals of where to point their guns as a result of advances in range finding technology. However, there was still some randomness in where shots would fall, so the more rounds you could fire at the same time, the more certain you could be that the target would be hit.

Quantitative investing works on a similar principle.

You start with a signal that you think identifies suitable target stocks, and then buy a large number of them to eliminate the effects of randomness in individual stock returns.

The signal might be something simple like 'Value' or 'Quality', or it might be something more complicated such as a neural net language model parsing corporate filings.

Let's look at an example to understand the process.

The signal we'll use is the "BTFD" signal. The stocks to buy today are the ones that went down most yesterday.

To implement this, we gathered daily returns for current S&P 500 stocks going back to 2010. That should be enough data to see whether there is actually a signal.

The first step is to select a subset of data to use for backtesting, with the remainder used for validation. We'll use up to the end of 2019 for our backtest.
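A minimal sketch of that split, assuming the data is held as (day, value) rows. The function name and the sample rows are made up for illustration; only the end-of-2019 cutoff comes from the text.

```python
from datetime import date

def split_by_date(rows, cutoff):
    """Split (day, value) rows into backtest (<= cutoff) and validation (> cutoff)."""
    backtest = [r for r in rows if r[0] <= cutoff]
    validation = [r for r in rows if r[0] > cutoff]
    return backtest, validation

# Hypothetical rows: (trading day, something observed that day).
rows = [
    (date(2019, 12, 30), "a"),
    (date(2019, 12, 31), "b"),
    (date(2020, 1, 2), "c"),
]
backtest, validation = split_by_date(rows, cutoff=date(2019, 12, 31))
```

The point of splitting on a date rather than at random is that the validation data stays strictly in the future of everything the model was tuned on.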

Now we rank the daily returns of the 500 stocks, and divide the stocks into quintiles based on these ranks. Next we look at the average return the next day in each quintile.
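The rank-and-bucket step might look something like the sketch below. It is plain Python so it runs without any libraries, the data layout (one dict of ticker to return per day) is an assumption, and the toy numbers are made up rather than taken from our dataset.

```python
from statistics import mean

def quintile_of(rank, n):
    """Map a 0-based rank among n stocks to a quintile, 1 (lowest) to 5 (highest)."""
    return rank * 5 // n + 1

def next_day_return_by_quintile(days):
    """days: list of dicts, one per trading day, mapping ticker -> daily return.

    For each day t, rank stocks by their return on day t, bucket them into
    quintiles, and record each stock's return on day t+1 against its bucket.
    """
    buckets = {q: [] for q in range(1, 6)}
    for today, tomorrow in zip(days, days[1:]):
        # Only use stocks with a return on both days; break ties by ticker.
        tickers = sorted(
            (t for t in today if t in tomorrow),
            key=lambda t: (today[t], t),
        )
        for rank, ticker in enumerate(tickers):
            buckets[quintile_of(rank, len(tickers))].append(tomorrow[ticker])
    return {q: mean(v) for q, v in buckets.items() if v}

# Toy data (made up): five tickers over three days.
days = [
    {"A": -0.02, "B": -0.01, "C": 0.0, "D": 0.01, "E": 0.02},
    {"A": 0.01, "B": 0.005, "C": 0.0, "D": -0.005, "E": -0.01},
    {"A": -0.02, "B": -0.01, "C": 0.0, "D": 0.01, "E": 0.02},
]
by_quintile = next_day_return_by_quintile(days)
```

In this toy example quintile 1 (yesterday's biggest fallers) has the highest average next-day return by construction; with real data that is exactly the thing being tested.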

On average, stocks in the bottom quintile went up almost 9bps the next day, while stocks in the top quintile were up only 6bps.

Moreover, the linear progression of returns by quintile is highly supportive of this being a real signal.

Is a 3bps difference large?

Well, 3bps per day is about 7.5% per year (taking roughly 250 trading days in a year), so it isn't small, but we'd like to do better.
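The annualization is just multiplication by the number of trading days. A quick sketch, where the 250-day year is an assumption (252 is also common) and the function names are made up:

```python
TRADING_DAYS = 250  # assumption; 252 is also commonly used

def annualize_simple(daily_bps, days=TRADING_DAYS):
    """Annual return as a fraction, ignoring compounding."""
    return daily_bps / 10_000 * days

def annualize_compound(daily_bps, days=TRADING_DAYS):
    """Annual return as a fraction, compounding the daily edge."""
    return (1 + daily_bps / 10_000) ** days - 1

simple = annualize_simple(3)      # about 0.075, i.e. 7.5%
compound = annualize_compound(3)  # slightly higher once compounding is included
```

Compounding nudges the figure up to roughly 7.8%, which doesn't change the conclusion.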

The way this is done is to find more signals and then combine them together.

For example, instead of just using yesterday's return, let's use the 3, 5, 7 and 9 day return as well.
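One way to build the longer lookbacks from the daily returns is sketched below. Whether a k-day return is compounded or simply summed isn't specified above, so compounding here is an assumption, and the sample returns are made up.

```python
def trailing_return(daily_returns, k):
    """Compound the last k daily returns ending at each day.

    Returns a list the same length as the input, with None for the first
    k - 1 days where there is not yet enough history.
    """
    out = []
    for i in range(len(daily_returns)):
        if i + 1 < k:
            out.append(None)
            continue
        growth = 1.0
        for r in daily_returns[i + 1 - k : i + 1]:
            growth *= 1 + r
        out.append(growth - 1)
    return out

HORIZONS = (1, 3, 5, 7, 9)  # yesterday's return plus the lookbacks named in the text
daily = [0.01, -0.02, 0.005, 0.0, -0.01, 0.02, 0.01, -0.005, 0.015, -0.01]
signals = {k: trailing_return(daily, k) for k in HORIZONS}
```

Each horizon then gets ranked and bucketed in the same way as the 1-day version.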

The simple way to combine these 5 different versions of "BTFD" is to average the positions of each. However, it is generally preferable to use an ensemble model.

If we just take the average return of each model, we end up with a smoother set of returns, but we're still making 7.5% per year.

We can create an ensemble by first ranking each stock by each model, summing these ranks, and then reranking the sum to create the final signal.
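A minimal sketch of that rank, sum, rerank recipe. The data layout and the three toy models are assumptions for illustration, and ties are broken by ticker just to keep the output deterministic.

```python
def rank(scores):
    """Rank tickers by score, 0 for the lowest score; ties broken by ticker."""
    ordered = sorted(scores, key=lambda t: (scores[t], t))
    return {ticker: i for i, ticker in enumerate(ordered)}

def rank_sum_ensemble(model_scores):
    """model_scores: list of dicts (one per model) mapping ticker -> score.

    Rank each model's scores, sum the ranks per ticker, then rerank the sums.
    Only tickers scored by every model are kept.
    """
    common = set(model_scores[0])
    for scores in model_scores[1:]:
        common &= set(scores)
    summed = {t: 0 for t in common}
    for scores in model_scores:
        ranks = rank({t: scores[t] for t in common})
        for t in common:
            summed[t] += ranks[t]
    return rank(summed)

# Toy scores (made up): trailing returns from three lookbacks for three stocks.
models = [
    {"A": -0.03, "B": 0.01, "C": 0.02},
    {"A": -0.01, "B": -0.02, "C": 0.03},
    {"A": -0.05, "B": 0.00, "C": 0.01},
]
final = rank_sum_ensemble(models)
```

For "BTFD" the lowest final rank is the stock that has fallen most across the lookbacks, so that is the one to buy. Working in ranks rather than raw returns stops a single extreme move in one lookback from dominating the combined signal.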

In the example, this takes us up to about 4bps per day, or 10% annually.

Okay, so let's say that we're now happy with the model. Now we expose it to our untouched data to check that it still works.

For the most part we find that it does, though there was a major drawdown in 2020. This is the main reason why 2020 was a terrible year for quant funds.

There are two probable reasons why the strategy did remarkably well in the second half of 2020.

First is that we're using current S&P 500 stocks as our universe. This introduces what is called survivorship bias into our returns. The stocks that crashed and burned in 2020 are no longer in our dataset to make us look bad. To overcome this we need to know index membership through time.
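Knowing index membership through time amounts to filtering the universe by date before ranking. A sketch, where the tickers, dates and the shape of the membership table are all hypothetical:

```python
from datetime import date

# Hypothetical membership records: ticker -> list of (joined, left) spells,
# with left=None meaning the stock is still in the index.
MEMBERSHIP = {
    "AAA": [(date(2005, 1, 3), None)],
    "BBB": [(date(2010, 6, 1), date(2020, 4, 6))],  # dropped in 2020
    "CCC": [(date(2021, 3, 22), None)],             # added after 2020
}

def universe_on(day, membership=MEMBERSHIP):
    """Tickers that were index members on the given day."""
    members = set()
    for ticker, spells in membership.items():
        for joined, left in spells:
            if joined <= day and (left is None or day < left):
                members.add(ticker)
    return members

in_2019 = universe_on(date(2019, 6, 3))
today = universe_on(date(2023, 12, 11))
```

A backtest built on today's list would never see "BBB" at all, while one built on `universe_on(day)` keeps it in the sample right up to the day it was dropped.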

The second reason is that all the quant funds running this strategy got carried out and had to stop, so the returns didn't get competed away so much.

This latter point is important. If you were planning to implement this strategy, you should know that we're looking at an idealized return where we trade at the close using information that is available to us at the close. Quant funds approximate this by using high speed intraday price feeds. If we try to trade at the next open instead, we'll find that prices have adjusted and all the return has gone.
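To see how much of the return disappears by the open, the close-to-close return can be split into an overnight part and an intraday part. A sketch with made-up prices, chosen so that the whole move happens overnight:

```python
def split_next_day_return(prev_close, next_open, next_close):
    """Split the close-to-close return into overnight and intraday parts."""
    close_to_close = next_close / prev_close - 1  # idealized backtest return
    overnight = next_open / prev_close - 1        # missed if we wait for the open
    open_to_close = next_close / next_open - 1    # what a trade at the open earns
    return close_to_close, overnight, open_to_close

# Hypothetical prices: the stock gaps up 9bps overnight and then goes nowhere.
cc, gap, oc = split_next_day_return(
    prev_close=100.0, next_open=100.09, next_close=100.09
)
```

The three pieces always satisfy (1 + cc) = (1 + gap) * (1 + oc), so rerunning the quintile analysis on the open-to-close part alone shows what is left for someone who can only trade at the open.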

So how can we find a signal that we can monetize?

There are 3 main ways:

Get a dataset that few other people have (e.g. surveys of internet users in Iran and China).
Use a set of returns from which sources of noise and known signals have been removed, so that small new signals can be seen more easily.
Build a model that is relatively complicated and so unlikely to be widely used.

The first is called alternative data, and good alternative data is generally expensive.

The simplest version of the second is to use beta-cleaned returns to measure the performance of your signal. However, factors such as value and momentum are typically removed as well. A quant fund might also feel obligated to purchase alternative datasets just so they can remove the resulting factor from their returns.
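Beta cleaning in its most basic form means regressing each stock's returns on the market's and keeping what is left over. The sketch below is one way to do that under simplifying assumptions: beta is estimated once over the whole sample (a real backtest would use a trailing window to avoid looking ahead), and all the numbers are made up.

```python
from statistics import mean

def beta(stock, market):
    """OLS slope of stock returns on market returns."""
    ms, mm = mean(stock), mean(market)
    cov = sum((s - ms) * (m - mm) for s, m in zip(stock, market))
    var = sum((m - mm) ** 2 for m in market)
    return cov / var

def beta_cleaned(stock, market):
    """Stock returns with the market-driven part (beta * market) removed."""
    b = beta(stock, market)
    return [s - b * m for s, m in zip(stock, market)]

# Toy data (made up): a stock that moves 1.5x the market plus a little noise.
market = [0.01, -0.02, 0.005, 0.015, -0.01]
noise = [0.001, -0.001, 0.002, -0.002, 0.0]
stock = [1.5 * m + e for m, e in zip(market, noise)]
residual = beta_cleaned(stock, market)
```

Removing further factors such as value and momentum follows the same idea, with a multiple regression in place of the single slope.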

The third approach is fraught with dangers, but is the most practical approach to take if you are not running a quant fund.

The two main dangers to be paranoid about are information leakage and overfitting. If you're not sufficiently paranoid, then you'll probably fall into a trap.

Better get some more psychological help then.

Model Portfolios

1. On the screen: EMXC, ECH

While large cap stock indices such as SPY and QQQ are closing 2023 near their highs for the year, the emerging market ETF EEM remains significantly below its high. In large part this is due to China, of course.

If we exclude China by looking at the ETF EMXC, emerging markets look relatively healthy.

The underlying theme is that weak demand for commodities from China has benefitted commodity consumers at the cost of producers. The movement of manufacturing from China to other countries may explain why EMXC has become more biased towards commodity consumers in 2023.

In the short term, we continue to wait for a deeper correction in ECH in anticipation of stronger demand for copper from China. However, if China continues to fail to stimulate growth, then EMXC may be a better medium term alternative.

2. Portfolio positions

As discussed on Beat The Market, we're sticking with our current positions for now.

We continue with our plan to exit our position in SPXU in the event of a correction in equities.

At present, the hedged BAC position is our lowest risk. In the event that we exit the hedge, the remaining position in BAC will no longer be the lowest risk, but it won't be the highest either, so we will not reduce our holding in BAC at that time for risk reasons.

Our short term profit target in JETS is 20.50.

The current portfolio weights are as follows:

In our equity portfolio we stay short 8.2% XLI against 7% XLB. Materials corrected lower last week while industrials remained at their high. We expect industrials to correct this week, and are looking for materials to resume their uptrend.

In the Global Asset Allocation Portfolio, we are underweight US and international equities and underweight US bonds.

Talking Stocks

Talking Stocks now takes place in the daily discussion category on our forum: https://www.davidwoounbound.com/forum/daily-discussion

We have irreverent but insightful commentary on stocks in the news, updates on trades we have on, and new trades throughout the week.

Make sure to follow the category if you want email notifications when there are new posts.

Epilogue

The reason for using 1, 3, 5, 7 and 9 day returns in our ensemble is essentially to create constructive interference in a wave pattern.

Information leakage can generally be seen in results that are too good to be true. I like to include a version of my model where I deliberately include future information, to check that it does significantly better. In some instances this gives an upper bound for how good my model could possibly be. The most reliable method, though, is to use rolling windows for cross validation so that models are never fit using data that came after the sample used for testing. (At least they aren't unless you screw something up.)
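A rolling-window split can be sketched as below. The window sizes and the function name are illustrative assumptions; what matters is that every test block sits strictly after the data used to fit the model.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) pairs in which every test block
    comes strictly after the training block that precedes it."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block

# Ten observations, fit on four, test on the next two, then roll forward.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
```

With these settings the model is fit on observations 0-3 and tested on 4-5, then fit on 2-5 and tested on 6-7, and so on. A cheap sanity check is to assert that the last training index is always below the first test index.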

Overfitting generally results not from building a model that is too high variance, but from trying too many different models on the same dataset. If you try 1000 random models, then at least one of them is bound to look good purely by chance. Don't do that. In the context of academia this is related to p-hacking.
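A small simulation makes the point. Every "model" below is pure noise with no edge at all, and we simply keep the best-looking one. The 1% daily volatility, the one-year sample and the use of the Sharpe ratio as the yardstick are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def best_of_random_models(n_models, n_days, seed=0):
    """Simulate n_models signals with zero true edge and return the best
    annualized Sharpe ratio found among them."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_models):
        # Daily P&L of a model with no skill: noise with 1% daily volatility.
        pnl = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpe = mean(pnl) / stdev(pnl) * 250 ** 0.5
        best = max(best, sharpe)
    return best

one = best_of_random_models(n_models=1, n_days=250)
many = best_of_random_models(n_models=1000, n_days=250)
```

Comparing `one` with `many` shows how impressive the best of a thousand skill-free models can look over a single year, which is why the number of models tried matters as much as the backtest of the one you kept.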