Creating Your Own Recent Stock News Sentiment Analyzer using Python

Scraping the Latest Google News Articles to Gauge Sentiment




Have you ever thought of parsing a big news article to get its summary in seconds and then running sentiment analysis on that summary? If so, you're reading the right article. 📝

It becomes even more fascinating when it comes to recent stock news. 📈 And that's exactly what we're going to do! 😃

In this tutorial, you'll learn the impressive capabilities of the following Python packages:

  • Newspaper: a Python module for extracting and curating articles. Given the URL of an article 🗞 posted on a website, it extracts the important text.

  • GoogleNews: a package used to scrape news from the Google News 📰 platform.

A quick example of an auto-generated summary of an article using the Newspaper library:

India's top state oil refiners are reducing processing runs, and crude imports as the surging COVID-19 pandemic has cut fuel consumption, leading to higher product stockpiles at the plants, company officials told Reuters on Tuesday. "We do not anticipate that our crude processing would be reduced to last year's level of 65%-70% as inter-state vehicle movement is still there ... (the) economy is functioning," he said. State-run Bharat Petroleum Corp has cut its crude imports by 1 million barrels in May and will reduce purchases by 2 million barrels in June, a company official said. HPCL has no immediate plan to cut crude runs, he said, although the company has shut some units at its 150,000 bpd Mumbai refinery for maintenance and upgrade. To ease storage problems, India could export some diesel, which constitute 40% of local refiners output, another BPCL official said.
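For reference, a summary like the one above takes only a few lines of Newspaper code. Here is a minimal sketch; the URL is just a placeholder, and article.nlp() needs NLTK's punkt tokenizer, which we download later in this tutorial:

from newspaper import Article

url = 'https://www.example.com/some-news-article'  # placeholder URL for illustration
article = Article(url)
article.download()  # fetch the raw HTML
article.parse()     # extract the article body, title, authors, etc.
article.nlp()       # keyword extraction and summarization
print(article.summary)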

The Flow

Extract -> Summarize -> Analyze

First, we'll extract the news articles with the GoogleNews Python package, then we'll summarize them with the Newspaper Python package, and towards the end we'll run sentiment analysis on the extracted and summarized articles with VADER.

Before starting, I'm assuming you know the nitty-gritty of sentiment analysis; if not, please check out my previous article on the topic.

The Python Implementation

Before we get into the main code, here is some mandatory legal text.

Disclaimer: The material in this article is purely educational and should not be taken as professional investment advice. The idea of this simple article is just to get you started and to showcase the possibilities with Python.

1. Installing the required libraries

Open the terminal and activate the conda environment to install the packages.

pip install newspaper3k
pip install GoogleNews
pip install nltk
pip install wordcloud


2. Importing the libraries

import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from GoogleNews import GoogleNews
from newspaper import Article
from newspaper import Config
from wordcloud import WordCloud, STOPWORDS

nltk.download('vader_lexicon') #required for Sentiment Analysis

3. Extracting News

now = dt.date.today()
now = now.strftime('%m-%d-%Y')
yesterday = dt.date.today() - dt.timedelta(days = 1)
yesterday = yesterday.strftime('%m-%d-%Y')

nltk.download('punkt')
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()
config.browser_user_agent = user_agent
config.request_timeout = 10

Here, we've assigned variables that define the timeframe of the news extraction, which is essentially one day (yesterday to today).
The NLTK punkt model is a sentence tokenizer: it splits a block of text into a list of sentences using a pre-trained, unsupervised algorithm, and Newspaper's nlp() step depends on it.
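To make that concrete, here is a tiny illustration of what punkt does; the sample text is just an example and assumes the punkt download above has completed:

from nltk.tokenize import sent_tokenize

sample = "Refiners cut crude imports in May. Fuel demand fell sharply. Exports may rise."
print(sent_tokenize(sample))
# ['Refiners cut crude imports in May.', 'Fuel demand fell sharply.', 'Exports may rise.']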

We need the Config object because the newspaper package sometimes can't download an article when the website blocks requests that don't look like they come from a real browser. Setting browser_user_agent to a regular browser string helps bypass that restriction so those articles can still be fetched and parsed.

Finally, since newspaper uses the Python requests module under the hood, a connection may occasionally hang or time out; setting config.request_timeout caps how long we wait for each download.

Now, it's time to write the code to extract the News for a particular stock or a ticker.

# save the company name in a variable
company_name = input("Please provide the name of the Company or a Ticker: ")
#As long as the company name is valid, not empty...
if company_name != '':
    print(f'Searching for and analyzing {company_name}, Please be patient, it might take a while...')

    #Extract News with Google News
    googlenews = GoogleNews(start=yesterday,end=now)
    googlenews.search(company_name)
    result = googlenews.result()
    #store the results
    df = pd.DataFrame(result)
    print(df)

Output:

(Screenshot: the dataframe of extracted news results returned by GoogleNews)

It's important to note that this fetches only the ten or so articles on the first page of the Google News search results; we could paginate to pull more, but that would defeat the purpose of extracting only the most recent news.
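If you do want more than the first page, the GoogleNews package exposes a get_page() helper. A minimal sketch is below; results accumulate across pages, though the exact behaviour can vary between versions of the library:

# optional: pull a second page of results (roughly 10 more articles)
googlenews.get_page(2)
more_results = googlenews.result()  # now holds pages 1 and 2 combined
df = pd.DataFrame(more_results)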

The above run uses BPCL (Bharat Petroleum Corporation Ltd.) as an example and extracts the news from the past day; if you would like to try the same implementation yourself, you can open this Google Colab Notebook.

4. Summarizing

Now that we've extracted the news, we'll loop over the article links stored in the dataframe and perform NLP (natural language processing) operations on each article.

try:
    list = [] #creating an empty list
    for i in df.index:
        dict = {} #creating an empty dictionary to hold one article per iteration
        article = Article(df['link'][i], config=config) #providing the link
        try:
            article.download() #downloading the article
            article.parse() #parsing the article
            article.nlp() #performing natural language processing (nlp)
        except:
            pass #skip the article if it cannot be downloaded or parsed
        #storing results in our empty dictionary
        dict['Date'] = df['date'][i]
        dict['Media'] = df['media'][i]
        dict['Title'] = article.title
        dict['Article'] = article.text
        dict['Summary'] = article.summary
        dict['Key_words'] = article.keywords
        list.append(dict)
    check_empty = not any(list)
    # print(check_empty)
    if check_empty == False:
        news_df = pd.DataFrame(list) #creating dataframe
        print(news_df)

except Exception as e:
    #exception handling
    print("exception occurred:" + str(e))
    print('Looks like there is some error in retrieving the data. Please try again or try with a different ticker.')

We've implemented nested exception handling here because the Newspaper module sometimes throws errors while downloading or parsing an article. The inner try/except keeps the loop running, while the outer block catches any other error and prints the actual exception message instead of letting the program crash with a raw traceback.

In other words, if the Newspaper module fails to download a particular article, we simply skip it and move on to the next one.

We also check that our list isn't empty before building the dataframe, because if the Newspaper module couldn't parse any of the articles, we wouldn't get the desired output.

Output:

(Screenshots: the news_df dataframe with Date, Media, Title, Article, Summary and Key_words columns)

5. Sentiment Analysis

This is similar to what we did in this article. If you haven't read it yet, then have a look at it.
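As a quick refresher, VADER's polarity_scores() returns a dictionary with 'neg', 'neu', 'pos' and 'compound' scores for a piece of text, and that is exactly what the loop below unpacks. The headline here is just an illustrative example:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("Refiner posts record profit and raises its dividend")
print(scores)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}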

#Sentiment Analysis
def percentage(part,whole):
    return 100 * float(part)/float(whole)

#Assigning Initial Values
positive = 0
negative = 0
neutral = 0
#Creating empty lists
news_list = []
neutral_list = []
negative_list = []
positive_list = []

#Iterating over the news summaries in the dataframe
for news in news_df['Summary']:
    news_list.append(news)
    analyzer = SentimentIntensityAnalyzer().polarity_scores(news)
    neg = analyzer['neg']
    neu = analyzer['neu']
    pos = analyzer['pos']
    comp = analyzer['compound']

    if neg > pos:
        negative_list.append(news) #appending the news that satisfies this condition
        negative += 1 #increasing the count by 1
    elif pos > neg:
        positive_list.append(news) #appending the news that satisfies this condition
        positive += 1 #increasing the count by 1
    elif pos == neg:
        neutral_list.append(news) #appending the news that satisfies this condition
        neutral += 1 #increasing the count by 1 

positive = percentage(positive, len(news_df)) #percentage is the function defined above
negative = percentage(negative, len(news_df))
neutral = percentage(neutral, len(news_df))

#Converting lists to pandas dataframe
news_list = pd.DataFrame(news_list)
neutral_list = pd.DataFrame(neutral_list)
negative_list = pd.DataFrame(negative_list)
positive_list = pd.DataFrame(positive_list)
#using the len() function to count the number of articles in each category
print("Positive Sentiment:", len(positive_list), end='\n')
print("Neutral Sentiment:", len(neutral_list), end='\n')
print("Negative Sentiment:", len(negative_list), end='\n')

#Creating PieCart
labels = ['Positive ['+str(round(positive))+'%]' , 'Neutral ['+str(round(neutral))+'%]','Negative ['+str(round(negative))+'%]']
sizes = [positive, neutral, negative]
colors = ['yellowgreen', 'blue','red']
patches, texts = plt.pie(sizes,colors=colors, startangle=90)
plt.style.use('default')
plt.legend(labels)
plt.title("Sentiment Analysis Result for stock= "+company_name+"" )
plt.axis('equal')
plt.show()

# Word cloud visualization
def word_cloud(text):
    stopwords = set(STOPWORDS)
    allWords = ' '.join([nws for nws in text])
    wordCloud = WordCloud(background_color='black',width = 1600, height = 800,stopwords = stopwords,min_font_size = 20,max_font_size=150,colormap='prism').generate(allWords)
    fig, ax = plt.subplots(figsize=(20,10), facecolor='k')
    plt.imshow(wordCloud)
    ax.axis("off")
    fig.tight_layout(pad=0)
    plt.show()

print('Wordcloud for ' + company_name)
word_cloud(news_df['Summary'].values)

Output:

(Output: the sentiment pie chart and the word cloud for the chosen stock)
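As an optional extension (not part of the original flow; the 'Sentiment' column name is my own), you could also store each summary's compound score back in news_df, which makes it easy to sort or filter the articles by tone:

# attach each summary's compound score to the dataframe (hypothetical column name)
sia = SentimentIntensityAnalyzer()
news_df['Sentiment'] = news_df['Summary'].apply(lambda s: sia.polarity_scores(s)['compound'])
print(news_df[['Title', 'Sentiment']].sort_values('Sentiment', ascending=False))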

6. Wrapping it up

And with that, it's a wrap! I hope you found the article useful!

You can also access the GitHub link here to view the entire code in one single file directly.

Thank you for reading; if you've made it this far, please like the article, as it encourages me to write more such pieces. Do share your valuable suggestions; I would really appreciate your honest feedback! 🙂

Please feel free to leave a comment and connect if you have any questions regarding this or require any further information. Consider subscribing to my mailing list for automatic updates on future articles. 📬

💡
Please note we haven't made any new posts on this blog since Nov 2021; you are free to subscribe to the mailing list, but you will also be auto-added to the mailing list of the new blog (thealtinvestor.in).

I would love to connect with you over email, or you can also find me on LinkedIn.

If you liked this article, consider buying me a book 📖 by clicking here or the button below.

