See Part I for an overview.
Prerequisites for creating machine learning algorithms for trading using Python
Python's extensive libraries and frameworks make it a popular choice for machine learning tasks. They enable developers to implement and experiment with various algorithms, process and analyse data efficiently, and build predictive models.
In order to create the machine learning algorithms for trading using Python, you will need the following prerequisites:
- Installation of Python packages and libraries meant for machine learning
- Thorough knowledge of the steps of machine learning
- Knowing the application models
Install a few packages and libraries
Python machine learning means using Python to develop and apply machine learning models, so the first step is to install the relevant packages. You can install them from the Anaconda Prompt (or any terminal) with pip, one package per line; for example, “pip install numpy” installs NumPy. Useful libraries include:
- Scikit-learn for machine learning
- TensorFlow for deep learning
- Keras for deep learning
- PyTorch for neural networks
- NLTK for natural language processing
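For some of these libraries the pip package name differs from the name you import (for example, Scikit-learn is installed as scikit-learn but imported as sklearn, and PyTorch is imported as torch). As a minimal sketch, assuming you only want to know which of the libraries listed above are already available, the standard-library importlib.util.find_spec can check without importing them. The helper name installed_libraries is hypothetical.

```python
import importlib.util

# Import names of the libraries listed above (pip names can differ,
# e.g. the scikit-learn package is imported as sklearn)
LIBRARIES = {
    "Scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "PyTorch": "torch",
    "NLTK": "nltk",
}


def installed_libraries(libraries):
    """Return a dict mapping each library name to True if its module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in libraries.items()
    }


if __name__ == "__main__":
    for name, ok in installed_libraries(LIBRARIES).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```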
Thorough knowledge of the steps of machine learning
In addition to general Python knowledge, proficiency in Python machine learning necessitates a deeper understanding of machine learning concepts, algorithms, model evaluation, feature engineering, and data preprocessing.
Knowing the application models
The primary focus of Python machine learning is the development and application of models and algorithms for tasks like classification, regression, clustering, recommendation systems, natural language processing, image recognition, and other machine learning applications.
How to use machine learning for algorithmic trading in Python?
Let us look at the steps for doing algorithmic trading with machine learning in Python. These steps are:
- Problem statement
- Getting the data and making it usable for machine learning algorithm
- Creating hyperparameters
- Splitting the data into test and train sets
- Getting the best-fit parameters to create a new function
- Making the predictions and checking the performance
Problem Statement
Let’s start by understanding what we are aiming to do. By the end of this machine learning for algorithmic trading with Python tutorial, I will show you how to create an algorithm that can predict a day's closing price from the previous day's OHLC (Open, High, Low, Close) data.
I also want to monitor the prediction error along with the size of the input data.
Let us import all the libraries and packages needed to build this machine-learning algorithm.
```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RandomizedSearchCV as rcv
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt
from IPython import get_ipython
```
Getting the data and making it usable for machine learning algorithm
To create any algorithm, we need data to train the algorithm and then to make predictions on new, unseen data. In this machine learning for algorithmic trading with Python tutorial, we will fetch the data from Yahoo Finance.
To accomplish this, we will use the download function from the yfinance library, which retrieves historical price data from Yahoo Finance.
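```python
# Dictionaries for the average prediction errors on test and training data
avg_err = {}
avg_train_err = {}

# To fetch financial data
import yfinance as yf

# Fetch daily AAPL data; auto_adjust=True returns split- and dividend-adjusted OHLC prices
AAPL_data = yf.download('AAPL', start='2005-01-01', end='2023-01-01', auto_adjust=True)

# Keep only the OHLC columns
df = AAPL_data[['Open', 'High', 'Low', 'Close']]
```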
We are fetching the data of AAPL (the ticker for Apple). As one of the largest constituents of the S&P 500 index, this stock is sometimes used as a rough proxy for the index's performance. We specify the start and end dates of the period for which we pull the data.
Once the data is in, we keep only the OHLC columns and discard everything else, such as volume, to create our data frame ‘df’. Because auto_adjust=True, the OHLC prices are already adjusted for splits and dividends, so there is no separate adjusted Close column.
Now we want to make our predictions from past data, and these past features will help the machine learning model trade. So, let’s create new columns in the data frame that contain the OHLC data lagged by one day.
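```python
# Keep the OHLC columns; Close is the value we want to predict
df = AAPL_data[['Open', 'High', 'Low', 'Close']].copy()

# Add the previous day's OHLC values as features (one-day lag)
df['open'] = AAPL_data['Open'].shift(1)
df['high'] = AAPL_data['High'].shift(1)
df['low'] = AAPL_data['Low'].shift(1)
df['close'] = AAPL_data['Close'].shift(1)

# Drop the first row, which has no previous day to lag from
df = df.dropna()
```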
Note: The new lagged columns use lower-case names (open, high, low, close) to distinguish them from the original capitalised columns (Open, High, Low, Close).
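To see what shift(1) does, here is a toy example with hypothetical prices (not AAPL data) and only two columns. After shifting, each row holds that day's prices alongside the previous day's, and the first row, which has no previous day, is removed by dropna().

```python
import pandas as pd

# Hypothetical prices for four consecutive days
prices = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0, 13.0], "Close": [10.5, 11.5, 12.5, 13.5]},
    index=pd.date_range("2023-01-02", periods=4, freq="D"),
)

lagged = prices.copy()
lagged["open"] = prices["Open"].shift(1)    # yesterday's open
lagged["close"] = prices["Close"].shift(1)  # yesterday's close
lagged = lagged.dropna()                    # first day has no yesterday

print(lagged)
```

In the printed frame, the row for 2023-01-03 has Close 11.5 (the value to predict) next to close 10.5 (the previous day's close).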
Creating Hyperparameters
Although the concept of hyperparameters is worthy of a blog in itself, for now I will just say a few words about them. Hyperparameters are settings that the machine learning algorithm cannot learn from the data during training; instead, we try different candidate values and keep the ones that yield the best-fit function.
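```python
# Replace any missing values with the column mean
imp = SimpleImputer(missing_values=np.nan, strategy='mean')

# Chain imputation, feature scaling and Lasso regression into one estimator
steps = [('imputation', imp),
         ('scaler', StandardScaler()),
         ('lasso', Lasso())]
pipeline = Pipeline(steps)

# Candidate hyperparameter values; Lasso's max_iter must be an integer
parameters = {'lasso__alpha': np.arange(0.0001, 10, .0001),
              'lasso__max_iter': np.random.randint(100, 100000, 4)}

# Randomised search with 5-fold cross-validation
reg = rcv(pipeline, parameters, cv=5)
```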
In this example, I have used Lasso regression, which uses L1 regularisation. Lasso is a regression-based machine learning model used to predict continuous values.
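For reference, scikit-learn documents the objective that its Lasso estimator minimises as follows, where $n$ is the number of samples, $X$ the feature matrix, $y$ the target, $w$ the coefficients and $\alpha$ the alpha hyperparameter (searched via lasso__alpha in the code above):

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The L1 term $\alpha \lVert w \rVert_1$ is what can push coefficients exactly to zero; a larger $\alpha$ tends to zero out more of them.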
This type of regularisation is very useful for feature selection, because it can shrink some coefficient values exactly to zero. The SimpleImputer replaces any NaN values, which could otherwise affect our predictions, with the mean of the corresponding column, as specified by strategy='mean' in the code.
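To see the feature-selection effect, here is a minimal sketch on hypothetical synthetic data (not AAPL prices): only the first two of four features drive the target, and Lasso with alpha=0.1 sets the coefficients of the other two to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Four synthetic features; only features 0 and 1 influence the target
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # features 2 and 3 get coefficients of exactly 0.0
```

The two informative coefficients are also shrunk slightly towards zero, which is the price Lasso pays for discarding the irrelevant features.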
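The ‘steps’ are a list of (name, transformer) pairs that the Pipeline chains together and applies in order: imputation, scaling and then Lasso regression. The pipeline is a very efficient tool to carry out multiple operations on the data set. In the parameters dictionary, each key joins a step name and one of that step's parameters with a double underscore (for example, 'lasso__alpha'), and each value lists the candidate values to be iterated over.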
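A small sketch of the double-underscore naming: set_params and get_params on a Pipeline address an individual step's parameter as "<step name>__<parameter>". The values 0.5 and 5000 are arbitrary, chosen only for illustration.

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("imputation", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("lasso", Lasso()),
])

# "lasso__alpha" means: the alpha parameter of the step named "lasso"
pipeline.set_params(lasso__alpha=0.5, lasso__max_iter=5000)

print(pipeline.named_steps["lasso"].alpha)       # 0.5
print(pipeline.get_params()["lasso__max_iter"])  # 5000
```

This is the same naming RandomizedSearchCV relies on when it tries each candidate value from the parameters dictionary.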
Although I am not going into the details of what exactly these parameters do, they are worth digging deeper into. Finally, I called the randomised search function, RandomizedSearchCV, which evaluates a random sample of the parameter combinations (10 by default) using cross-validation.
In this example, we used 5-fold cross-validation. In k-fold cross-validation, the original sample is partitioned into k equal-sized subsamples, often randomly; with cv=5 and a regressor, scikit-learn uses KFold, which does not shuffle by default, so the folds are contiguous blocks of rows. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data.
The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. Averaging the measures of fit (prediction error) across the k folds gives a more reliable estimate of the model's predictive performance than a single train/validation split.
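The following sketch, on ten hypothetical toy samples, prints the five folds that KFold(n_splits=5) produces; this unshuffled splitter is what scikit-learn builds when cv=5 is passed with a regressor such as the Lasso pipeline above. Each sample lands in exactly one validation fold.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten toy samples, one feature each

# Unshuffled 5-fold splitter
kf = KFold(n_splits=5)
folds = list(kf.split(X))

for k, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {k}: validation={val_idx.tolist()} training={train_idx.tolist()}")
```

Fold 0 validates on rows 0 and 1, fold 1 on rows 2 and 3, and so on. With time-ordered price data this means some folds train on later dates than they validate on; scikit-learn's TimeSeriesSplit is an alternative splitter that always validates on later data.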
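Based on the cross-validated score, the search picks the best hyperparameter values (here, alpha and max_iter); the resulting Lasso fit in turn determines which features keep non-zero coefficients.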
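Putting the pieces together, here is a minimal sketch of the search on hypothetical synthetic data instead of the downloaded AAPL prices, with a narrower alpha range and integer max_iter candidates chosen only for illustration. After fit, best_params_ holds the winning hyperparameters and best_score_ their mean cross-validated R² score (the default score for regressors).

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical synthetic data standing in for the lagged OHLC features
X = rng.normal(size=(300, 4))
y = X @ np.array([1.5, 0.0, -0.5, 0.0]) + rng.normal(scale=0.2, size=300)
X[5, 1] = np.nan  # one missing value for the imputer to fill

pipeline = Pipeline([
    ("imputation", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("lasso", Lasso()),
])
parameters = {
    "lasso__alpha": np.arange(0.0001, 1, 0.0001),
    "lasso__max_iter": [1000, 5000, 10000],
}

# Evaluate 10 random combinations, each scored with 5-fold cross-validation
reg = RandomizedSearchCV(pipeline, parameters, n_iter=10, cv=5, random_state=0)
reg.fit(X, y)

print(reg.best_params_)  # hyperparameters with the best mean CV score
print(reg.best_score_)   # that mean R^2 score
```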
In the next part of this machine learning for algorithmic trading with Python tutorial, Part III, we will look at how to split the data into test and train sets.
Originally posted on QuantInsti Blog.
Disclosure: Interactive Brokers
Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.