In this article, we will see how to scrape cricket stats data from ESPNCricinfo and how to visualize that data using Streamlit, with deployment on Heroku.
First, let's see what our final web-app will look like.
Here you can see that this web-app allows people to interact with the visualizations. As users change target fields such as team, toss, batting order, ground, result, and year, the graphs update accordingly.
Steps to reach our final destination:
- Data scraping from ESPNCricinfo 🏏
- Data cleaning using SQLite and Python 🧹
- Producing visualizations using streamlit on a local server 📊
- Deploying web-app on Heroku ⚛️
All the code for this web-app is available in my GitHub link. https://github.com/Heet6498/CricketViz
Let's get started! 👨💻
Step-1: Data scraping from ESPNCricinfo
To build a better tool, we need data. Sure, we could find some data on a third-party website and use that. Instead, we will build our own dataset from the ESPNCricinfo website.
ESPNCricinfo is a well-known website for cricket-related stats data. It contains data about all cricket match formats. For our web-app, we need data that covers all teams, their match results, their toss results, their batting order, and the year in which they played each match.
Here the table contains fields such as Teams, Results, Margin, BR, Toss, Bat, Opposition team, Ground, and Year.
To scrape this data we will use BeautifulSoup. We will create our cricket scraper in a Python environment.
- Installing all the necessary libraries.
- Scraper
First, we parse this URL with BeautifulSoup. There are 178 pages in total, so to automate the scraper we define min_page and max_page. The table rows vary in structure, with lengths ranging from 10 to 15 cells.
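A minimal sketch of such a scraper is shown below. The URL, the `data1` row class, and the function names are assumptions for illustration; the real Statsguru markup should be checked in the browser before relying on them. Only the page-fetching loop needs the network, so the parsing logic is kept in its own function.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

def parse_rows(html):
    """Extract result rows from one page of Statsguru HTML.

    Row lengths vary (roughly 10 to 15 cells), so we keep every
    cell in that range and let the cleaning step sort them out.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", class_="data1"):  # class name is an assumption
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if 10 <= len(cells) <= 15:
            rows.append(cells)
    return rows

def scrape(min_page=1, max_page=178):
    """Loop over all result pages and collect the rows into a DataFrame."""
    # Illustrative URL template; verify the actual query string on the site.
    base = ("https://stats.espncricinfo.com/ci/engine/stats/index.html"
            "?class=2;template=results;type=team;view=results;page={}")
    all_rows = []
    for page in range(min_page, max_page + 1):
        html = requests.get(base.format(page)).text
        all_rows.extend(parse_rows(html))
    return pd.DataFrame(all_rows)
```

Keeping `parse_rows` separate also makes it easy to test against a saved HTML snippet without hitting the site 178 times.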
- Converting the dataframe into a CSV file
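Saving the scraped frame is a one-liner with pandas; the file name below and the sample frame are illustrative stand-ins for the scraper's output.

```python
import pandas as pd

# Stand-in for the scraped frame; in practice this comes from the scraper.
df = pd.DataFrame([["India", "won", "5 wickets"]],
                  columns=["Team", "Result", "Margin"])

# index=False keeps the row index out of the file,
# which makes the later SQLite import cleaner.
df.to_csv("cricket_raw.csv", index=False)
```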
Step-2: Data cleaning using SQLite and Python
Here, the scraped data has many jumbled rows. To make the data more understandable, we will use SQLite; I am using DB Browser for SQLite as my tool. We will apply cleanup queries to the scraped data.
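The exact queries depend on how the rows were jumbled, so the sketch below is illustrative only: it loads the raw CSV-style data into an in-memory SQLite database from Python (the same queries work in DB Browser) and deletes rows that repeat the header or have no team name.

```python
import sqlite3
import pandas as pd

# Toy raw data: one repeated-header row and one row missing the team.
df = pd.DataFrame(
    [["Team", "Result"], ["India", "won"], [None, "lost"], ["Australia", "won"]],
    columns=["Team", "Result"],
)

con = sqlite3.connect(":memory:")
df.to_sql("matches", con, index=False)

# Example cleanup query; the real conditions depend on your scraped rows.
con.execute("DELETE FROM matches WHERE Team IS NULL OR Team = 'Team'")
con.commit()

clean = pd.read_sql("SELECT * FROM matches", con)
```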
Now that we have tidier data, we can find the missing values and fix them using pandas.
Here we can see that fields such as results, margin, br, toss, and bat have many null values.
Before replacing the null values, we will first convert some categorical string values into numerical values.
Now all string values are converted into integer values.
Here, we fill in all the null values.
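The two steps above can be sketched with pandas as follows. The column names and numeric codes here are assumptions for illustration; the real mapping depends on your cleaned columns.

```python
import pandas as pd

# Toy frame standing in for the cleaned data, with some nulls.
df = pd.DataFrame({
    "toss":   ["won", "lost", None],
    "bat":    ["1st", "2nd", "1st"],
    "result": ["won", None, "lost"],
})

# Categorical strings -> numeric codes (codes are illustrative).
df["toss"]   = df["toss"].map({"won": 1, "lost": 2})
df["bat"]    = df["bat"].map({"1st": 1, "2nd": 2})
df["result"] = df["result"].map({"won": 1, "lost": 2})

# Fill the remaining nulls with 0 ("unknown") and force integer dtype.
df = df.fillna(0).astype(int)
```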
Now we have clean data. We can proceed to streamlit web-app.
Step-3: Producing visualizations using streamlit on a local server
Streamlit is an open-source Python library that helps you create and share attractive, custom web-apps for machine learning and data science. In just a few moments you can build and deploy powerful web apps.
First, we need to import all the necessary libraries to our environment.
Now that we have all the required libraries, we can start our building process. Let’s define our web-app name and load our clean dataset.
In our web-app we want users to select which teams they want to analyze. For that, we will use Streamlit's multiselect widget, which lets users select more than one unique data point at the same time.
As shown in the figure below, we can create any type of multi-select as we want. The dropdown has all the unique data points from our clean data.
Now, we will define the side-bar and it will allow the user to modify the axes of plots.
For our web-app, we will use a count plot, which requires an x-axis and a hue. So we will define the first two sidebar options for the x-axis and the hue, and the remaining options as personal customization filters. To view the filtered data, we will also define an option to display it.
Now that we have all the multi-select bars and sliders, let's define our count plot. In our web-app there are 6 unique useful columns, so we will create 6 different count plots for the personal filters. As new filters are applied to each column, the data frame changes, so we need a plot function that displays all the data frames.
To pass all the data frames at once, we need to create a list that contains all the filtered data frames. There will be two lists, one for data frames and one for plot titles. After defining lists, we will connect our sidebar's filters to the data frames and append them to the list.
Let’s call our final function:
For a clean web-app layout, we will hide the default footer and the default menu, and apply some background color. Streamlit has built-in support for rendering HTML.
It’s all done now. Now we can see our web-app on the local server.
Step-4: Deploying web-app on Heroku
To deploy the web-app on Heroku, first, we have to create three necessary files.
- requirements.txt
- setup.sh
- Procfile
The requirements.txt file lists all the libraries required to run our code. This way Heroku knows which libraries to install when it builds our web-app on its servers.
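For this app, requirements.txt might look like the following; the exact list depends on what your code actually imports, and pinned versions can be added if you need reproducible builds.

```
streamlit
pandas
numpy
matplotlib
seaborn
```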
setup.sh:
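setup.sh prepares a Streamlit config for Heroku: it tells Streamlit to run headless and to listen on the port Heroku assigns via the `$PORT` environment variable. This is the commonly used snippet for Streamlit-on-Heroku deployments.

```sh
mkdir -p ~/.streamlit/
echo "[server]
headless = true
port = $PORT
enableCORS = false
" > ~/.streamlit/config.toml
```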
Procfile:
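The Procfile tells Heroku what command starts the app: run setup.sh first, then launch Streamlit. The script name `app.py` is an assumption; use your actual main file's name.

```
web: sh setup.sh && streamlit run app.py
```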
Now keep all the main files in one branch of your GitHub repo. Connect that branch to Heroku and press the deploy button.
Conclusion:
Streamlit is a very useful and fast tool for creating interactive web applications. In this article, we created an end-to-end data visualization web-app. We saw how easy it is to define all the filter features and how fast it is to deploy the web-app using Heroku.
Future scope:
We could create machine learning models to predict cricket match outcomes given all these filters, and add the predicted results to this web-app.