Why Adjust your Time-Series Stock Data for Symbol/Ticker Changes and How to Do that using Python?
Welcome Readers ๐คฉ to the third article in the series "NSE EOD Stock Prices" This article will be using the stock prices file we created in the last article, and I will show you a way to adjust this dataset for all stock/ticker symbol changes from the year 2000 - 2021
But first, let's discuss what a ticker is?
A ticker symbol is the grouping of a specific set of characters, usually letters, that represent and identify any type of public security that trades on an exchange. Symbols are unique, allowing investors to research and trade shares in the companies they represent.
Every security listed has a ticker symbol, which the company chooses before it lists on an exchange. Although the symbol may be an abbreviation or other equivalent of the company's name, this isn't a requirement.
Source : Investopedia.
Okay, that sounds great, but why are we concerned with making this change?
When you are dealing with time-series data, continuity of data is important. You might want to capture the average price of the whole year for a particular ticker, or your strategy might require you to calculate the upper and lower Bollinger Bands, and suddenly there is a break in your data because of ticker change and your numbers get messed up.
Let's take TATA CONSUMER PRODUCTS LIMITED as an example that changed its symbol from TATAGLOBAL
to TATACONSUM
on 27th Feb 2020. Have a look into the data you have downloaded for the year 2020, and you will notice that TATAGLOBAL
disappears from the Bhavcopies post 27th Feb 2020 and is replaced by TATACONSUM
Now imagine, if you want to find out the 52 Week High/Low price of TATA CONSUMER PRODUCTS LIMITED for 2020, you will look up the latest ticker in your dataset, but clearly, that is an incorrect approach because we are missing all the data before the actual symbol change on 27th Feb 2020.
I hope you now understand the importance of this step to keep our data clean and up to date.
Python Implementation
So, NSE publishes this data for symbol changes from time to time on their website in a .csv file; you can access the latest file on this link. You will find data on every symbol change on NSE since the year 2000 until the date you access the link and should look like the below.
In the above data, SM_KEY_SYMBOL
is the old ticker, SM_NEW_SYMBOL
is the new ticker and SM_APPLICABLE_FROM
is the effective date of the ticker change.
Now, I will show you how to update your dataset for all the symbol changes until now; this will work even if you have data for multiple years; we will only work with 2020's data. Ideally, we will have to get this file from the NSE website daily and process our dataset for any symbol changes; we will discuss more on how to do that in the 6th Article in this series on Object-Oriented Programming.
We will be using the existing pandas
, os
, requests
, datetime
Python libraries which come with the Anaconda
environment for this step.
Step 1: Using requests
library to download file from NSE website.
#Making all necessary imports
import pandas as pd
import os, requests
from datetime import datetime, date
#URL from where file can be downloaded.
nse_symbol_path = 'https://www1.nseindia.com/content/equities/symbolchange.csv'
r = requests.get(nse_symbol_path) #getting the file.
r.status_code #to check if the request was successful.
In the above, we made all necessary imports for our code and defined the URL from where we will be downloading the file daily. Next, we use requests
to go to the URL; before moving on to save the file on your system, it's a good step to check if your request to the website was successful, and you can check that via the status_code
attribute. Generally, if your request went through, you will get a 200
status code. For errors, you may get 404
, 401
, 500
, you can check all possible status codes here.
Yay! We got a 200
status code. Let's go ahead and save this file on our system. To save, you will have to give the file a save location and a name. I prefer having the name something like symbol_change_dd.mm.yyyy.csv
just for reference when the file was downloaded.
filename = f'symbol_change_{datetime.now().strftime("%d.%m.%Y")}.csv'
#The datetime function will give you the current date.
open(f'path/to/folder/{filename}', 'wb').write(r.content)
#This will save the file in the folder you defined.
In my case, since I am running this on 28.02.2021, the file is saved as symbol_change_28.02.2021.csv
Step 2: Generating a dictionary of stock symbol changes.
So, let's use the file downloaded to generate a new dictionary that will have the old value as a key and the new value as the answer to that key. If you are not aware of the data type dict
, RealPython explains it really well here.
symbol_df = pd.read_csv('path/to/folder/symbol_change_dd.mm.yyyy.csv')
#deleting all whitespace from column names and dataset.
symbol_df.columns = symbol_df.columns.str.replace(' ', '')
symbol_df = symbol_df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
#converting the date column to datetime dtype.
symbol_df['SM_APPLICABLE_FROM'] = pd.to_datetime(symbol_df['SM_APPLICABLE_FROM'])
#very important, sorting values by date
symbol_df.sort_values(by ='SM_APPLICABLE_FROM', axis=0, inplace=True)
A couple of things to note before we move onto the next lines of code:
- Sorting your data is very important. If you do not sort, you might have old tickers in your main data even after adjusting for it; why is that? Because some tickers have changed multiple times, take the below for an example; if it's not sorted, the end ticker in my dataset will be
ISENSEX
even when it should beICICSENSX
which is the latest one.
- Sometimes, the symbol change file also contains future symbol changes, and we do not want to adjust for those symbols yet, so it's important to filter your data till the date you are running the code.
#Filtering for point 2 above.
new_sym_df = symbol_df[(symbol_df['SM_APPLICABLE_FROM'] <= pd.to_datetime('today').floor('D'))]
#Creating a dictionary of symbol changes.
symbol_change_dict = dict(zip(new_sym_df.SM_KEY_SYMBOL, new_sym_df.SM_NEW_SYMBOL))
print(symbol_change_dict)
As you can see, we now have a dictionary for all symbol changes, where the blue arrow is the old symbol and the red arrow is the new symbol.
Step 3: Adjusting our Stock Prices dataset to replace old symbols.
Let's go over all the old/new symbol pairs in our dictionary in a loop and replace all old symbols with new ones with our main stock prices data; it's straightforward to do with df.replace()
, you can check the documentation for this function here.
stock_price_df = pd.read_csv('path/to/your/existing/data.csv',
index_col = 'DATE1')
for keys in symbol_change_dict.keys():
print(f'Processing {keys} symbol')
stock_price_df.replace(to_replace = keys,
value = symbol_change_dict[keys], inplace = True)
#saving this new dataset with updated symbols
stock_price_df.to_csv('path/to/folder/new_data.csv')
And that's it ๐; you now have your updated dataset factored in for all symbol changes. Let's quickly verify our new dataset for TATACONSUM
, it should have data from 01-01-2020
instead of just from 27-02-2020
Yep, it's there, and TATAGLOBAL
has been removed; now, your strategies should have complete data to work.
You might have noticed that we processed all the ticker changes in this article; you are not expected to do that daily. Ideally, we should set an interval for this process, and if that interval is daily, it only makes sense to grab the file and check if there are any symbol changes since the last day and, if not, just close the program.
We will discuss all of this again in the final article, where I will tell you how this should be implemented in real life, but for now, at least you know how to hack your way to make these changes. Please drop in any suggestions/questions you have to me on Twitter or Linkedin; I would be more than happy to assist you and get your feedback to make my future articles better. Also, do consider subscribing to my newsletter ๐ฌ to get regular updates.
You can also access Github link here to view the whole code in one single file directly.
If you like it up till now, consider buying me a coffee โ by clicking here or the button below.