Machine learning (ML) has become a useful tool in everyday life: it recognizes images and speech, supports medical diagnosis, and improves product recommendations. Is machine learning also useful for designing investment strategies, and can we use it to predict the future returns of financial securities?

What is ML? Its definition often depends on the field. For research purposes in empirical asset pricing, I define ML as a “collection of high-dimensional models for statistical prediction where (i) the risk of in-sample model overfitting is mitigated and (ii) efficient algorithms search among potential model specifications.”

In their 2020 paper in the *Review of Financial Studies*, Gu, Kelly, and Xiu apply ML methods to predict future returns for about 30,000 individual US stocks over 60 years, from 1957 to 2016. As predictors, they use 94 stock characteristics, eight aggregate time-series variables, and 74 industry-sector dummy variables. Because each characteristic is also interacted with the aggregate variables, this adds up to more than 900 prediction signals.
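Adding 94 + 8 + 74 inputs gives only 176 variables, so the "more than 900" figure comes from interactions. The sketch below follows the construction as I understand it from the paper: each characteristic is multiplied by a constant and by each of the eight aggregate variables (a Kronecker product), and the industry dummies are appended without interactions. The data are synthetic and `build_signals` is a hypothetical helper, not code from the study.

```python
import numpy as np

def build_signals(characteristics, macro, industry_dummies):
    """Hypothetical helper: interact stock characteristics with (1, macro variables)."""
    # Prepend a constant so the raw characteristics are kept next to the interactions
    macro_with_const = np.concatenate(([1.0], macro))
    # Every element of (1, macro) times every characteristic: 9 * 94 = 846 terms
    interactions = np.kron(macro_with_const, characteristics)
    # Industry dummies enter on their own, without interactions: 846 + 74 = 920
    return np.concatenate((interactions, industry_dummies))

rng = np.random.default_rng(0)
c = rng.standard_normal(94)   # synthetic characteristics for one stock-month
x = rng.standard_normal(8)    # synthetic aggregate time-series variables
d = np.zeros(74)
d[3] = 1.0                    # one-hot industry membership
z = build_signals(c, x, d)
print(z.size)  # 920
```

In other words, 94 × (8 + 1) + 74 = 920 signals per stock and month.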

The following ML models are applied:

- Baseline: Simple unconstrained linear regression.
- Penalized linear regression (*lasso* and *ridge regression*): Adds a penalty on the size of the coefficients, which shrinks them toward zero (ridge) or sets some of them exactly to zero (lasso), reducing the risk of overfitting.
- Dimension reduction techniques (*principal component regression* and *partial least squares*): Condense the large set of predictor variables into a few linear combinations (components) that serve as aggregate predictors.
- *Penalized generalized linear models*: Allow for nonlinearities in the predictor variables.
- *Boosted regression trees* and *random forests*: Nonparametric models that allow for interactions between the predictor variables.
- *Neural networks (deep learning)*: Nonparametric models that stack several *layers* with nonlinear *activation functions* to capture nonlinearities and interactions between the predictor variables.
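To make the contrast between the baseline and the regularized linear models concrete, here is a minimal NumPy sketch of OLS, ridge regression, and principal component regression (PCR) on synthetic data. It assumes demeaned predictors and no intercept; lasso is left out because it has no closed-form solution and needs an iterative solver. The function names and the values of `lam` and `k` are illustrative choices, not taken from the paper.

```python
import numpy as np

def ols(X, y):
    # Baseline: unconstrained least squares
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def ridge(X, y, lam):
    # Ridge: solve (X'X + lam*I) beta = X'y; lam > 0 shrinks coefficients toward zero
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def pcr(X, y, k):
    # PCR: regress y on the first k principal components of X,
    # then express the fit as coefficients on the original predictors
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T          # p x k loadings of the leading components
    F = X @ V_k             # n x k component scores
    gamma, *_ = np.linalg.lstsq(F, y, rcond=None)
    return V_k @ gamma

# Synthetic data: 50 noisy predictors, only 5 of which matter
rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)         # demean the predictors
beta_true = np.zeros(p)
beta_true[:5] = 0.5
y = X @ beta_true + rng.standard_normal(n)
y -= y.mean()               # demean the target (no intercept)

b_ols = ols(X, y)
b_ridge = ridge(X, y, lam=50.0)
b_pcr = pcr(X, y, k=10)
print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```

With `k` equal to the number of predictors, PCR reproduces OLS exactly; a smaller `k` or a larger `lam` trades some bias for lower variance, which is what regularization buys when predictors are numerous and noisy.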

The authors train these models on the first 18 years of the sample (1957–1974), tune them on the following 12 years (1975–1986), and use the remaining 30 years (1987–2016) for an out-of-sample evaluation based on monthly forecasts. The models are refit once a year, with the training sample growing by one year each time.

The results of the study favor ML as a tool for predicting future stock returns. In particular, *neural networks* substantially increase the accuracy of return forecasts. In numbers: a value-weighted long-short portfolio that buys the decile of stocks with the highest neural-network forecasts and sells the decile with the lowest earns an annualized out-of-sample Sharpe ratio of 1.35. This is roughly twice the performance of a leading regression-based strategy from the literature.
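The annualized Sharpe ratio of a monthly strategy is the mean of its monthly returns divided by their standard deviation, scaled by √12. A minimal sketch of one month of a value-weighted long-short decile strategy and of that annualization follows; the function names are hypothetical, the decile cut-offs use all stocks in the cross-section, and the long-short spread is treated as an excess return because the position is self-financing.

```python
import numpy as np

def long_short_decile_return(pred, realized, market_cap):
    """One month: long the top decile of forecasts, short the bottom decile,
    value-weighting stocks within each leg by market capitalization."""
    lo, hi = np.quantile(pred, [0.1, 0.9])  # decile cut-offs of the forecasts

    def value_weighted(mask):
        w = market_cap[mask] / market_cap[mask].sum()
        return float(w @ realized[mask])

    return value_weighted(pred >= hi) - value_weighted(pred <= lo)

def annualized_sharpe(monthly_returns):
    # Mean over standard deviation of monthly returns, scaled by sqrt(12)
    r = np.asarray(monthly_returns, dtype=float)
    return r.mean() / r.std(ddof=1) * np.sqrt(12)

# Toy cross-section for one month (synthetic numbers)
rng = np.random.default_rng(1)
pred = rng.standard_normal(500)
realized = 0.01 * pred + 0.05 * rng.standard_normal(500)
cap = rng.lognormal(size=500)
print(long_short_decile_return(pred, realized, cap))
```

Repeating the monthly calculation over the out-of-sample period and passing the resulting series to `annualized_sharpe` gives the kind of figure quoted above.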

Which predictor variables drive the improved forecast accuracy? There is strong evidence that price trends (past momentum and reversal), liquidity (stock size and trading volume), and stock volatility are the main drivers. Moreover, interactions between these variables and the aggregate book-to-market ratio appear to be important for the success of the neural network strategy. Note that the study does not account for trading costs, which are likely to be substantial because the strategy requires high portfolio turnover.
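Why turnover matters: with proportional trading costs, every unit of portfolio weight that changes at rebalancing is charged a fee. The sketch below uses a hypothetical cost of 10 basis points per unit traded and, for simplicity, ignores price drift of the weights between rebalancing dates; neither assumption comes from the study.

```python
import numpy as np

def turnover(w_prev, w_new):
    # Sum of absolute weight changes between two rebalancing dates
    return float(np.abs(np.asarray(w_new) - np.asarray(w_prev)).sum())

def net_return(gross_return, w_prev, w_new, cost_per_unit=0.001):
    # Hypothetical proportional cost: cost_per_unit (10 bp) per unit of traded weight
    return gross_return - cost_per_unit * turnover(w_prev, w_new)

# Long-short weights before and after one monthly rebalancing
w_prev = [0.5, 0.5, -0.5, -0.5, 0.0, 0.0]
w_new = [0.0, 0.5, -0.5, 0.0, 0.5, -0.5]
print(turnover(w_prev, w_new))          # 2.0
print(net_return(0.02, w_prev, w_new))  # about 0.018
```

A strategy that replaces much of its long and short legs every month pays this cost repeatedly, which is why gross Sharpe ratios can overstate what an investor would actually earn.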

Interested in digging deeper? Have a look at the working paper on SSRN or the published version in the *Review of Financial Studies*.

### Author(s) of this contribution:

Professor of Financial Risk Management at the University of Neuchâtel