Statistical Vs. Predictive Models with Examples
We use the term “statistical model” to denote standard analytical techniques that have well-defined mathematical formulas, such as standard deviation, covariance, the Sharpe ratio, and volatility indices. Statistical models do not involve a model training and inference pipeline; they are a set of analytics computed directly from the data.
In finance, these models are widely used for buy/sell recommendations and for identifying price zones based on data features such as volatility.
For example, we can use Bollinger Bands(21), a prominent technique in traditional financial analysis, to identify price trends. Bollinger Bands track a moving average of the price over a fixed number of days (usually 20) and place an upper and a lower band at +/- 2 standard deviations (SD) of the price over that same window. This can be refined further by using four bands instead of two, at +2 SD, +1 SD, -1 SD and -2 SD respectively. In this example we use Uni token (UNI) vs. USD data extracted from CoinGecko(22).
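As a rough illustration of the band construction described above, the following Python sketch computes the middle, upper and lower bands from a list of closing prices. The function name `bollinger_bands`, the sample prices, and the choice of the population standard deviation are assumptions made for illustration; they are not taken from the analysis in this paper, which uses CoinGecko data.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, num_sd=2.0):
    """Return (middle, upper, lower) lists aligned with `prices`.

    Entries are None until a full window of prices is available.
    """
    middle, upper, lower = [], [], []
    for i in range(len(prices)):
        if i + 1 < window:
            # Not enough history yet to fill one window.
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = prices[i + 1 - window : i + 1]
        m = mean(chunk)
        sd = pstdev(chunk)  # assumption: population SD over the window
        middle.append(m)
        upper.append(m + num_sd * sd)
        lower.append(m - num_sd * sd)
    return middle, upper, lower


# Hypothetical closing prices; a short window keeps the example readable.
prices = [10.0, 11.0, 12.0, 11.0, 10.0, 9.0, 10.0]
middle, upper, lower = bollinger_bands(prices, window=3, num_sd=2.0)
```

Calling the function a second time with `num_sd=1.0` gives the inner pair of bands used in the four-band variant.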
Bollinger Band Example
The double-band model helps identify buying, selling and neutral zones. This simple metric helps traders limit risk exposure and adapt their positions according to market trends.
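The zone idea can be sketched in a few lines of Python. The mapping used here is one common reading of double Bollinger Bands and is an assumption, since the text does not state which zone is which: prices between +1 SD and +2 SD are labelled "buy", prices between -1 SD and -2 SD are labelled "sell", and prices within 1 SD of the moving average are labelled "neutral". The function name `classify_zone` and the "outside" label are hypothetical.

```python
def classify_zone(price, middle, sd):
    """Label a price relative to the four bands at +/-1 SD and +/-2 SD.

    `middle` is the moving average and `sd` the standard deviation
    over the same window. The zone-to-action mapping is an assumption.
    """
    if sd <= 0:
        # Degenerate window (flat prices): no band width to compare against.
        return "neutral"
    z = (price - middle) / sd  # distance from the average, in SD units
    if 1.0 <= z <= 2.0:
        return "buy"      # between the +1 SD and +2 SD bands
    if -2.0 <= z <= -1.0:
        return "sell"     # between the -1 SD and -2 SD bands
    if -1.0 < z < 1.0:
        return "neutral"  # between the -1 SD and +1 SD bands
    return "outside"      # beyond the +/-2 SD bands


# Hypothetical values: moving average 10.0, standard deviation 1.0.
zones = [classify_zone(p, 10.0, 1.0) for p in (11.5, 10.2, 8.5, 13.0)]
```

How prices beyond the outer bands should be treated is a strategy decision that this sketch deliberately leaves open.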
Depending on the Bollinger Bands' performance, other techniques, such as the Sortino Ratio(23), Keltner channels(24), or a Double Moving Average, can be automatically substituted when they outperform this metric.
Such statistical models do not require heavy computation compared to deep learning models, which we discuss next under “Predictive Models”.
The term “predictive models” denotes models developed using machine learning techniques that involve a training and inference pipeline. These models learn the patterns required by the problem definition in order to produce predictions; common use cases include future price prediction, risk categorization, and buy/sell strategy development. Building them requires a high level of domain expertise in both financial data and machine learning. Other challenges are the complexity of developing such models and the feasibility of deploying them, which tend to stem from the models' computational requirements and heavy resource usage.
Figure 9: Bollinger Band Visualization for sample data
For example, we look into a Long Short-Term Memory (LSTM) network for price prediction, with training and inference pipelines. We use the same Uni token (UNI) vs. USD data, with ‘market_cap’, ‘total_volume’ and ‘price’ as features.
Figure 10: Feature visualization for mode
We split this data into training (70%) and testing (30%) sets and train the LSTM network using Keras, with the following architecture summary.
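The data preparation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the results in this paper: the split is assumed to be chronological (no shuffling), the inputs are assumed to be sliding windows of past observations, and the `lookback` value, the helper names `make_windows` and `chronological_split`, and the random placeholder data are all hypothetical. The Keras model itself is not reproduced here.

```python
import numpy as np


def make_windows(features, target, lookback):
    """Build LSTM-style inputs of shape (samples, lookback, n_features).

    Each sample holds `lookback` consecutive rows of `features`; its label
    is the `target` value at the step immediately after the window.
    """
    X, y = [], []
    for end in range(lookback, len(features)):
        X.append(features[end - lookback:end])
        y.append(target[end])
    return np.array(X), np.array(y)


def chronological_split(X, y, train_fraction=0.7):
    """Split without shuffling so the test set lies after the training set."""
    cut = round(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Synthetic placeholder data standing in for the real series.
# Columns are assumed to be: market_cap, total_volume, price.
rng = np.random.default_rng(0)
data = rng.random((110, 3))

X, y = make_windows(data, data[:, 2], lookback=10)
X_train, y_train, X_test, y_test = chronological_split(X, y)
```

A chronological split is chosen in this sketch because shuffling time-series samples would let windows from the test period overlap with the training data.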
Figure 11: Model Architecture Summary
We also present the loss curve of the LSTM over 50 epochs for the training and test sets, with ‘price’ as the target variable.
Figure 12: Loss curve for training and testing phase
Designing such networks and pre-processing features suitable for them requires domain expertise in both machine learning and finance. Moreover, the intensive training of deep learning models usually requires a sophisticated hardware stack and may take days to reach good accuracy. This work can extend deeply into the fields of deep reinforcement learning and agent-based modelling, which are out of scope for this whitepaper.