Building an Optimal Portfolio with Python



In the last article, we analyzed the performance of the stocks in a portfolio to determine which performed best on metrics such as returns, the Sharpe ratio (return per unit of risk), and others. In this blog post, we'll blend financial theory with real-world data and learn how to build an optimal portfolio, implementing it in Python.

Topics covered in this blog

  • Modern Portfolio Theory

  • **Portfolio Optimization (creating an optimal portfolio by determining weights)**

  • Getting Discrete Allocation

Disclaimer: The material in this article is purely educational and should not be taken as professional investment advice. The premise of this article is not to show how to "GET RICH QUICKLY." The idea of this article is to get you started and to showcase the possibilities with Python.


Modern Portfolio Theory, or MPT (also known as mean-variance analysis), is a mathematical framework for constructing a portfolio of assets to **maximize expected return for a given level of market risk** (measured as the standard deviation of portfolio returns). Since risk is associated with variability in profit, we can quantify it using measures of dispersion such as variance and standard deviation.

**The trade-off between risk and return forms the basis of portfolio construction.**

In general, investors demand a higher expected return for taking on more risk, so different investors will evaluate the trade-off differently depending on their individual risk aversion.

However, as we saw in the last article, it is possible to reduce risk without giving up return through efficient diversification, i.e., by combining assets that are not perfectly correlated (ideally, negatively correlated). We can construct an efficient set of portfolios that have the least risk for a given return, or the highest return for a given level of risk; investors can then choose a point on this efficient frontier (described below) depending on their risk-return preferences. Constructing this efficient set of portfolios is called portfolio optimization, and it is a fairly complex mathematical task.
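To see why combining imperfectly correlated assets reduces risk, consider a hypothetical two-asset portfolio. The sketch below uses made-up numbers (both assets have 20% annual volatility and are held 50/50) and evaluates the standard two-asset volatility formula for several correlation values. `two_asset_volatility` is an illustrative helper, not part of any library.

```python
import math

def two_asset_volatility(w1: float, sigma1: float, sigma2: float, rho: float) -> float:
    """Volatility of a portfolio holding weight w1 in asset 1 and 1 - w1 in asset 2.

    sigma_p^2 = (w1*s1)^2 + (w2*s2)^2 + 2*w1*w2*rho*s1*s2
    """
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    # Guard against a tiny negative value from floating-point rounding when rho = -1
    return math.sqrt(max(variance, 0.0))

# Hypothetical assets: both 20% annual volatility, split 50/50
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = two_asset_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.1%}")
```

With identical assets, portfolio volatility stays at 20% when the correlation is +1 and falls steadily as the correlation drops, reaching zero at -1, even though the expected return of the 50/50 mix does not change.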

The expected return of a portfolio is the weighted sum of the individual assets' expected returns. Portfolio risk depends on the proportion (weight) invested in each security, the individual risks of the securities, and their correlation or covariance. These two terms are often used interchangeably, but they differ:

  • Covariance - measures the extent to which two random variables vary together.

  • Correlation - The problem with covariance is that it is not standardized, so its magnitude is hard to interpret. Dividing the covariance of two variables by the product of their standard deviations standardizes it and gives the correlation coefficient, which ranges from -1 to 1.
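Written out, with $w_i$ the weight of asset $i$, $\mu_i$ its expected return, $\sigma_{ij}$ the covariance between assets $i$ and $j$ (so $\sigma_i = \sqrt{\sigma_{ii}}$ is asset $i$'s standard deviation), and $\Sigma$ the covariance matrix, the portfolio's expected return, its variance, and the correlation coefficient are:

$$
\begin{aligned}
E[R_p] &= \sum_{i=1}^{n} w_i \mu_i = \mathbf{w}^\top \boldsymbol{\mu} \\
\sigma_p^2 &= \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \sigma_{ij} = \mathbf{w}^\top \Sigma \mathbf{w} \\
\rho_{ij} &= \frac{\sigma_{ij}}{\sigma_i \sigma_j}, \qquad -1 \le \rho_{ij} \le 1
\end{aligned}
$$

Portfolio risk is the standard deviation $\sigma_p = \sqrt{\mathbf{w}^\top \Sigma \mathbf{w}}$. Whenever the assets are less than perfectly correlated, $\sigma_p$ is smaller than the weighted average of the individual standard deviations, which is the mathematical basis of diversification.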

The efficient frontier is the set of portfolios that offer the highest expected return for each level of risk (equivalently, the lowest risk for each level of expected return). When plotted with risk on the x-axis and expected return on the y-axis, it runs from the minimum-variance portfolio at the extreme left to the maximum-return portfolio at the extreme right, which consists of the single asset with the highest expected return.
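One simple way to visualize the frontier is to simulate many random long-only portfolios and look at the upper-left edge of the resulting risk/return cloud. This is only an illustration: PyPortfolioOpt solves the optimization problem directly rather than sampling. The sketch below assumes three hypothetical assets with made-up expected returns and covariances; `random_portfolios` is an illustrative helper.

```python
import numpy as np

def random_portfolios(mu, cov, n_portfolios=5000, seed=0):
    """Simulate random long-only portfolios; return (weights, returns, volatilities)."""
    rng = np.random.default_rng(seed)
    # Dirichlet draws are non-negative and sum to 1, i.e. valid long-only weights
    weights = rng.dirichlet(np.ones(len(mu)), size=n_portfolios)
    rets = weights @ mu  # w^T mu for every simulated portfolio
    # sqrt(w^T S w) for every row of `weights`
    vols = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov, weights))
    return weights, rets, vols

# Hypothetical annual expected returns and covariance matrix for three assets
mu = np.array([0.10, 0.15, 0.07])
cov = np.array([[0.040, 0.006, -0.004],
                [0.006, 0.090, 0.010],
                [-0.004, 0.010, 0.025]])

w, rets, vols = random_portfolios(mu, cov)
i_min = vols.argmin()
print("lowest-volatility simulated portfolio:", w[i_min].round(3),
      f"return {rets[i_min]:.2%}, volatility {vols[i_min]:.2%}")
# Scatter-plotting vols (x-axis) against rets (y-axis) shows the cloud; its
# upper-left boundary approximates the efficient frontier.
```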


Let's get started with Python!

Module used: PyPortfolioOpt


PyPortfolioOpt was built on the idea that many investors understand the broad concepts of portfolio optimization but are reluctant to solve the underlying mathematical optimization problems themselves. It supports classical mean-variance optimization techniques, which is what we'll use here.

So, in a nutshell, PyPortfolioOpt is a library that implements financial portfolio optimization methods.

If you wish to know more about it, you can refer to the PyPortfolioOpt documentation.

Now that you are familiar with the theory and have a basic understanding of the PyPortfolioOpt module, we can move on to the coding section.

Time to Code!

1. Installing the required libraries

Open a terminal, activate your conda environment, and install PyPortfolioOpt:

```shell
pip install pyportfolioopt
```


2. Importing the libraries

We will also be using packages such as pandas, NumPy, Matplotlib, seaborn, and NSEpy; if any of them are missing, install them with pip:

```shell
pip install <packagename>
```

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from datetime import date
from nsepy import get_history as gh
plt.style.use('fivethirtyeight')

from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices
```

If you haven't read the previous article, you can head over to this link; here we continue with the stock price data (`df`) extracted there with the NSEpy library. Feeling lazy? Download a sample data set from this link.

3. Optimization

```python
# Expected annual return of each stock and the annualized sample
# covariance matrix of daily asset returns
mean = expected_returns.mean_historical_return(df)
S = risk_models.sample_cov(df)  # sample covariance matrix
```
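For intuition, here is roughly what these two calls compute, written out in pandas. This is a sketch based on my understanding of PyPortfolioOpt's defaults (252 trading days per year, and `mean_historical_return` compounding returns CAGR-style in recent versions); check the documentation for your version. `annualised_stats` is a hypothetical helper and the prices are made up.

```python
import numpy as np
import pandas as pd

def annualised_stats(prices: pd.DataFrame, frequency: int = 252):
    """Annualised compounded mean return and sample covariance from daily prices."""
    returns = prices.pct_change().dropna(how="all")  # daily simple returns
    n_days = returns.count()                         # number of returns per asset
    # Compound the daily returns, then scale to `frequency` periods per year
    mean = (1 + returns).prod() ** (frequency / n_days) - 1
    cov = returns.cov() * frequency                  # annualised covariance matrix
    return mean, cov

# Made-up closing prices for two hypothetical tickers
prices = pd.DataFrame({"AAA": [100, 101, 102.5, 101.8, 103.0],
                       "BBB": [50, 49.5, 50.2, 50.8, 50.1]})
mean, cov = annualised_stats(prices)
print(mean)
print(cov)
```

With only four daily returns the annualised figures are extreme; the point is the arithmetic, not the numbers.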



Variance measures how far each value in a dataset lies from its mean; covariance extends the idea to two variables and measures the extent to which they move together. The higher the variance of an asset's returns, the more risk the asset carries, and the higher the return investors typically demand for holding it. The most commonly used risk model is the covariance matrix, which describes asset volatilities and their co-dependence. This is important because one of the principles of diversification is that risk can be reduced by making many uncorrelated bets (correlation is simply the standardized version of covariance). The code below plots the covariance matrix as a heatmap:

```python
plt.style.use('ggplot')
fig = plt.figure()
sb.heatmap(S, xticklabels=S.columns, yticklabels=S.columns,
           cmap='RdBu_r', annot=True, linewidth=0.5)
print('Covariance between daily simple returns of stocks in your portfolio')
plt.show()
```
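Because covariances are hard to compare across stocks with different volatilities, it can help to convert `S` into a correlation matrix, whose entries all lie between -1 and 1. To the best of my knowledge PyPortfolioOpt's `risk_models` module also offers a `cov_to_corr` function for this, but the arithmetic is short enough to write out. A minimal sketch; `cov_to_correlation` is a hypothetical helper and the two-stock matrix is made up.

```python
import numpy as np
import pandas as pd

def cov_to_correlation(cov: pd.DataFrame) -> pd.DataFrame:
    """Divide each covariance by the product of the two assets' standard deviations."""
    std = np.sqrt(np.diag(cov.values))      # per-asset standard deviations
    corr = cov.values / np.outer(std, std)  # rho_ij = sigma_ij / (sigma_i * sigma_j)
    return pd.DataFrame(corr, index=cov.index, columns=cov.columns)

# Hypothetical annualised covariance matrix for two stocks
S_example = pd.DataFrame([[0.04, 0.012], [0.012, 0.09]],
                         index=["AAA", "BBB"], columns=["AAA", "BBB"])
print(cov_to_correlation(S_example))
```

With the article's data, passing `cov_to_correlation(S)` to `sb.heatmap` instead of `S` would plot correlations on a common -1 to 1 scale.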



```python
ef = EfficientFrontier(mean, S)
weights = ef.max_sharpe()  # optimize for the maximum Sharpe ratio
cleaned_weights = ef.clean_weights()  # round the raw weights and zero out tiny ones
# Tickers and their weights, for plotting
labels = list(cleaned_weights.keys())
values = list(cleaned_weights.values())
fig, ax = plt.subplots()
ax.pie(values, labels=labels, autopct='%1.0f%%')
print('Portfolio Allocation')
plt.show()
```

The Sharpe ratio describes how much excess return (above the risk-free rate) you receive for the extra volatility you endure by holding a risky asset. Using the mean historical returns and the covariance matrix, we set up the efficient frontier and choose the "max Sharpe" objective, which finds the weights that maximize the portfolio's Sharpe ratio. Finally, we clean the raw weights: they are rounded, and any weight whose absolute value is below the cutoff is set to 0%.
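The cleaning step can be reproduced by hand. A minimal sketch, assuming PyPortfolioOpt's defaults are `cutoff=1e-4` and `rounding=5` (check the documentation for your version); `clean_weights_sketch` and the raw weights are hypothetical.

```python
def clean_weights_sketch(raw_weights: dict, cutoff: float = 1e-4, rounding: int = 5) -> dict:
    """Zero out weights below `cutoff` in absolute value, then round the rest."""
    cleaned = {}
    for ticker, w in raw_weights.items():
        if abs(w) < cutoff:
            w = 0.0  # treat tiny optimizer residue as "no allocation"
        cleaned[ticker] = round(w, rounding)
    return cleaned

# Hypothetical raw optimizer output: the tiny non-zero weights are numerical noise
raw = {"AAA": 0.4213379, "BBB": 0.5786195, "CCC": 0.0000426, "DDD": -0.0000000012}
print(clean_weights_sketch(raw))
```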



Just so you know, the overlapping labels in the pie chart are DABUR and ICICIBANK: both received a 0% allocation, so they overlap in the pie chart. Not the best way to represent them, right? Tell us in the comments how we can fix this, and you will get a mention on the blog :)
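One possible fix (a suggestion, not the author's code) is to drop tickers with a zero weight before drawing the pie, so only stocks that actually receive money get a slice and a label. `nonzero_slices` is a hypothetical helper and the weights are made up; it assumes a long-only portfolio, so negative weights are dropped as well.

```python
def nonzero_slices(weights: dict, tol: float = 0.0):
    """Return (labels, values) for weights strictly greater than `tol`."""
    kept = {ticker: w for ticker, w in weights.items() if w > tol}
    return list(kept.keys()), list(kept.values())

# Hypothetical cleaned weights with two zero allocations
cleaned = {"AAA": 0.45, "BBB": 0.0, "CCC": 0.55, "DDD": 0.0}
labels, values = nonzero_slices(cleaned)
print(labels, values)
# Then plot as before: ax.pie(values, labels=labels, autopct='%1.0f%%')
```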

Portfolio Performance:


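In PyPortfolioOpt, these figures are typically obtained with `ef.portfolio_performance(verbose=True)`, which reports the expected annual return, annual volatility, and Sharpe ratio of the optimized portfolio. The sketch below recomputes the same three numbers from a weight vector, expected returns, and a covariance matrix. The 2% risk-free rate mirrors what I understand to be the library's default, `portfolio_performance_sketch` is a hypothetical helper, and all inputs are made up.

```python
import numpy as np

def portfolio_performance_sketch(weights, mu, cov, risk_free_rate=0.02):
    """Return (expected return, volatility, Sharpe ratio) for a weight vector."""
    w = np.asarray(weights, dtype=float)
    expected_return = float(w @ mu)           # weighted sum of asset returns
    volatility = float(np.sqrt(w @ cov @ w))  # sqrt(w^T S w)
    sharpe = (expected_return - risk_free_rate) / volatility  # excess return per unit of risk
    return expected_return, volatility, sharpe

# Hypothetical two-asset inputs
mu = np.array([0.12, 0.08])
cov = np.array([[0.04, 0.006], [0.006, 0.0225]])
ret, vol, sharpe = portfolio_performance_sketch([0.6, 0.4], mu, cov)
print(f"Expected annual return: {ret:.1%}")
print(f"Annual volatility: {vol:.1%}")
print(f"Sharpe Ratio: {sharpe:.2f}")
```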

Getting Discrete Allocation:

```python
portfolio_amount = input("Enter the amount you want to invest: ").strip()
if portfolio_amount != '':
    portfolio_amount = float(portfolio_amount)

    # Latest available price of each stock in df
    latest_prices = get_latest_prices(df)
    weights = cleaned_weights
    # Convert the continuous weights into whole numbers of shares
    discrete_allocation = DiscreteAllocation(weights, latest_prices,
                                             total_portfolio_value=portfolio_amount)
    allocation, leftover = discrete_allocation.lp_portfolio()

    # One row per ticker with the number of shares to buy
    portfolio_df = pd.DataFrame({
        'Ticker': list(allocation.keys()),
        'Number of stocks to buy': list(allocation.values()),
    })
    print('Number of stocks to buy with the amount of ₨ ' + str(portfolio_amount))
    print(portfolio_df)
    print('Funds remaining with you will be: ₨', int(leftover))
```

The code asks the user how much they want to invest and, based on this amount and the latest prices, determines how many shares of each stock to buy; any cash that cannot be spent on whole shares is reported as the remaining funds.
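`lp_portfolio` solves an integer program to choose the share counts. For intuition only, here is a much simpler floor-rounding sketch: it is not the algorithm PyPortfolioOpt uses and will usually leave more cash unspent. `floor_allocation` is a hypothetical helper, and the weights and prices are made up.

```python
import math

def floor_allocation(weights: dict, prices: dict, total_value: float):
    """Buy floor(weight * total / price) shares of each stock; return (shares, leftover)."""
    shares = {}
    for ticker, w in weights.items():
        n = math.floor(w * total_value / prices[ticker])  # whole shares only
        if n > 0:
            shares[ticker] = n
    spent = sum(n * prices[t] for t, n in shares.items())
    return shares, total_value - spent

weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
prices = {"AAA": 1520.0, "BBB": 610.0, "CCC": 2350.0}
shares, leftover = floor_allocation(weights, prices, 100000)
print(shares, round(leftover, 2))
```

An integer-programming approach can instead use the leftover cash to buy extra shares where that brings the allocation closer to the target weights.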



While running the above code, you may run into an error about the GLPK_MI solver. GLPK_MI is the mixed-integer solver that PyPortfolioOpt uses under the hood to find the optimal number of shares to buy for each stock.


To fix this error, run the command below in your Anaconda terminal (it installs CVXOPT, which provides GLPK_MI), then restart the kernel or re-run the script, and it should work.

```shell
conda install -c conda-forge cvxopt
```

4. Wrapping it up

And with that, it's a wrap!

Portfolio construction is a critically important part of managing investments. Identifying the candidate assets is the first step in creating an optimal portfolio; after that, factors such as expected risk and return come into the picture. PyPortfolioOpt is quite powerful: it lets you define your own optimization problems with custom objectives, data, and constraints, and it offers various other models besides the ones used here.

As we've seen, the theory itself is vast, and it may seem like a lot to digest in one go. Nevertheless, I hope you enjoyed this article!

You can also view the entire code in a single file on GitHub here.

Thank you for reading! If you've made it this far, please like the article; it encourages me to write more. Do share your suggestions; I would really appreciate your honest feedback! 🙂

Please feel free to leave a comment and connect if you have any questions regarding this or require any further information. Consider subscribing to my mailing list for automatic updates on future articles. 📬

Please note that we haven't made any new posts on this blog since Nov 2021. You are still free to subscribe to the mailing list; however, you will also be added automatically to the new blog's mailing list.

I would love to connect with you over email, or you can also find me on LinkedIn.

If you liked this article, consider buying me a book 📖 by clicking here.

Did you find this article valuable?

Support Trade With Python by becoming a sponsor. Any amount is appreciated!