Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for beginners

Sulaksh More
May 30, 2020

Introduction

The coronavirus (COVID-19) outbreak has brought the whole world to a standstill, with complete lockdowns in several countries. We salute every health and security professional. Today, we will attempt a simple data analysis of a COVID-19 dataset using Python. The dataset is available on Kaggle. The following Python libraries are used in this exercise.

Pandas: An open-source Python library for data analysis and manipulation.
Seaborn: A Python library for statistical data visualization, built on top of Matplotlib. It provides a wide range of graphics for presentation purposes.
Matplotlib: A cross-platform Python library for data visualization. It is widely used for creating static and interactive plots.

What Data Does It Hold

The dataset records the number of COVID-19 cases on a daily basis. Let us begin by understanding the columns and what they represent. Column description for the dataset:

Sno: Serial number.
ObservationDate: Date of observation, in mm/dd/yyyy format.
Province/State: Province or state of the observation.
Country/Region: Country or region of the observation.
Last Update: Time, in UTC, at which the row was last updated.
Confirmed: Cumulative number of confirmed cases.
Deaths: Cumulative number of deaths.
Recovered: Cumulative number of recovered cases.

These are the columns in the file. Most of our work will revolve around three of them: Confirmed, Deaths, and Recovered.

Let Us Begin: First, we’ll import our first library, pandas, and read the source file.

import pandas as pd
df = pd.read_csv("covid_19_data.csv")

Now that we have read the data, let us print the head of the file, which shows the first five rows along with the column names.

df.head()

As the output shows, these are the first five rows of the data file, with the columns explained earlier.

Let us now get into some depth with the data, where we can understand its mean and standard deviation, along with other statistics.
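Since ObservationDate is stored as mm/dd/yyyy text, it can help to convert it to real dates and to check for missing values before going further. The following is a minimal sketch of that check, not part of the original walkthrough. The rows in `sample_csv` are hypothetical and only mimic the column layout described above.

```python
import io
import pandas as pd

# Hypothetical sample rows that mimic the layout of covid_19_data.csv.
# The numbers are made up for illustration only.
sample_csv = io.StringIO(
    "Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered\n"
    "1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0\n"
    "2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0\n"
    "3,01/23/2020,,Japan,1/23/2020 17:00,1.0,0.0,0.0\n"
)

sample = pd.read_csv(sample_csv)

# Convert the mm/dd/yyyy strings into timestamps so that sorting and
# plotting follow calendar order rather than text order.
sample["ObservationDate"] = pd.to_datetime(sample["ObservationDate"], format="%m/%d/%Y")

print(sample.dtypes)
print(sample.isna().sum())  # number of missing values in each column
```

The same two calls can be applied to the real `df` once the file has been read.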

df.describe()

The describe() function in pandas returns summary statistics for each numeric column: count, mean, standard deviation, minimum, quartiles, and maximum.

The mean of the Confirmed column is 1972.956586 and its standard deviation is 10807.777684. The mean and standard deviation for the Deaths and Recovered columns are listed, too.
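The result of describe() is itself a DataFrame, so individual statistics can be looked up by name instead of being read off the printed table. A small sketch, using a hypothetical `toy` frame with made-up values:

```python
import pandas as pd

# Hypothetical values, for illustration only.
toy = pd.DataFrame({
    "Confirmed": [1.0, 14.0, 6.0, 1.0],
    "Deaths": [0.0, 0.0, 1.0, 0.0],
})

summary = toy.describe()
print(summary)

# describe() returns a DataFrame indexed by statistic name, so single
# values can be looked up by row label and column name.
mean_confirmed = summary.loc["mean", "Confirmed"]
std_confirmed = summary.loc["std", "Confirmed"]
print(mean_confirmed, std_confirmed)
```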

Let us now plot the data, that is, draw these data points on a graph. We have used only the pandas library until now; we’ll need to import the other two libraries before proceeding.

import seaborn as sns
import matplotlib.pyplot as plt

We have now imported all three libraries. Next, we will plot our data on a graph; the output is a figure with three lines showing how each column moves towards the latest date.

plt.figure(figsize = (12,8))
# Select each column before averaging so that text columns are never passed to mean()
df.groupby('ObservationDate')['Confirmed'].mean().plot()
df.groupby('ObservationDate')['Recovered'].mean().plot()
df.groupby('ObservationDate')['Deaths'].mean().plot()

Code Explanation: plt.figure initializes the plot with the given width and height. figsize defines the size of the figure; it takes two float numbers, the width and the height in inches. If it is not provided, the default comes from rcParams["figure.figsize"], which is [6.4, 4.8].
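To see the figsize behaviour in isolation, the short sketch below creates a figure and reads its size back. The "Agg" backend is an assumption made only so that the snippet runs without a display; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

default_size = plt.rcParams["figure.figsize"]  # used when figsize is omitted
fig = plt.figure(figsize=(12, 8))              # width and height in inches
width, height = fig.get_size_inches()

print(default_size, width, height)
plt.close(fig)
```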

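Then we have grouped the data by the ObservationDate column and taken the mean of three columns: Confirmed, Recovered, and Deaths. Because each date has one row per reporting region, this mean is an average across those rows, not a worldwide total. The observation date runs along the horizontal axis and the count along the vertical axis.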

The above code plots the three columns one by one on the same figure.
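It is worth being clear about what the grouped mean represents. Each date appears once per reporting row, so mean() gives the average count per row on that date, while sum() would give the combined count. A minimal sketch of the difference, using a hypothetical `toy` frame with two rows per date:

```python
import pandas as pd

# Hypothetical cumulative counts for two regions on two dates.
toy = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/22/2020", "01/23/2020", "01/23/2020"],
    "Confirmed": [10.0, 30.0, 20.0, 60.0],
})

by_date = toy.groupby("ObservationDate")["Confirmed"]
daily_mean = by_date.mean()   # average per reporting row on each date
daily_total = by_date.sum()   # combined count across all rows on each date

print(daily_mean)
print(daily_total)
```

Which of the two is appropriate depends on the question being asked; the walkthrough here uses the mean.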

This data reflects the impact of COVID-19 across the globe, split into three columns. The same data could feed prediction models, but it is quite uncertain and does not qualify for prediction purposes. Moving on, we will focus on India and analyze its data.

Country Focus: India

Let us specifically check the data for India.

ind = df[df['Country/Region'] == 'India']
ind.head()

The above lines of code keep only the rows with India as Country/Region and place them in “ind”. Calling head() on it shows the first five of those rows.
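The filter works in two steps: the comparison produces a True/False value for every row, and indexing with that result keeps only the True rows. A small sketch on a hypothetical `toy` frame makes the intermediate step visible:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "Country/Region": ["India", "Japan", "India"],
    "Confirmed": [3.0, 5.0, 4.0],
})

mask = toy["Country/Region"] == "India"  # Series of True/False, one per row
ind_toy = toy[mask]                      # keeps rows where mask is True, all columns

print(mask.tolist())
print(ind_toy)
```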

Let’s plot the data for India:

plt.figure(figsize = (12,8))
# Select each column before averaging so that text columns are never passed to mean()
ind.groupby('ObservationDate')['Confirmed'].mean().plot()
ind.groupby('ObservationDate')['Recovered'].mean().plot()
ind.groupby('ObservationDate')['Deaths'].mean().plot()

Similar to the earlier example, this code returns a figure with the three columns plotted on it.

This is how data is represented graphically, making it easy to read and understand. Moving forward, we will draw a scatter plot using the Seaborn library. Our next figure places the data points by location and colors them by the sex of the patient.

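Code: This plot needs patient-level columns (sex, longitude, latitude) that are not among the columns described earlier, so it assumes a separate DataFrame, referred to as df2 in the plotting call below, whose loading step is not shown here. First, we’ll make some minor changes to the values in its sex column.

df2['sex'] = df2['sex'].replace(to_replace = 'male', value = 'Male')
df2['sex'] = df2['sex'].replace(to_replace = 'female', value = 'Female')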

The above code simply normalizes the labels in the sex column to a standard capitalization. Then we’ll plot the data points on the figure.
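As an alternative sketch, both labels can be normalized in one call by passing a dictionary to replace(). The `toy` frame below is hypothetical and includes a missing value to show that it is left untouched.

```python
import pandas as pd

# Hypothetical labels with inconsistent capitalization and one missing value.
toy = pd.DataFrame({"sex": ["male", "Female", "female", "Male", None]})

# A single replace() call with a dict handles both labels at once.
toy["sex"] = toy["sex"].replace({"male": "Male", "female": "Female"})

counts = toy["sex"].value_counts()  # missing values are not counted
print(counts)
```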

plt.figure(figsize = (15,8))
sns.scatterplot(x = 'longitude', y = 'latitude', data = df2, hue = 'sex', alpha = 0.2)

Code Explanation: x and y name the columns used for the horizontal and vertical positions, here longitude and latitude. data is the source DataFrame, where columns are variables and rows are observations. hue names the column whose values determine the color of each point, so each sex is drawn in a different color. alpha takes a float between 0 and 1 and sets the opacity of the points.
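The parameters described above can be tried out on a few hypothetical records before running them on the full data. In this sketch the `toy` frame and its coordinates are made up, and the "Agg" backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical patient records; coordinates and labels are made up.
toy = pd.DataFrame({
    "longitude": [77.2, 139.7, 116.4, 72.9],
    "latitude": [28.6, 35.7, 39.9, 19.1],
    "sex": ["Male", "Female", "Female", "Male"],
})

plt.figure(figsize=(15, 8))
ax = sns.scatterplot(x="longitude", y="latitude", data=toy, hue="sex", alpha=0.2)

# The legend lists the distinct values of the hue column.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
plt.close("all")
```

A low alpha such as 0.2 is mainly useful when many points overlap, because denser areas then appear darker.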

Future Scope: Now that we have understood how to read raw data and present it in readable figures, a natural next step would be to implement a time series forecasting module and obtain predictions. Using an RNN, we could possibly arrive at a realistic number of future COVID-19 cases. At present, however, a realistic prediction would be difficult, as the data we possess is too uncertain and too sparse.

Considering the current situation and the fight we have all been putting up, we have decided not to implement a prediction module, since any number it produced could lead to unnecessary unrest. Contact us for any business query.

Sulaksh More

9 Comments

Enrica Dannie Klemens says:

January 30, 2021 at 1:12 am

This post is truly a nice one.

Abby Davin Trevar says:

January 30, 2021 at 6:03 pm

I blog frequently and I truly appreciate your content. The article has really piqued my interest.

Penny Reggis Laurentia says:

February 6, 2021 at 1:11 pm

I am amazed with the research you have made for this article. Fantastic job!

Justina Hugo Orella says:

February 7, 2021 at 2:09 pm

Very insightful article. Definitely sharing it.

Lorna Sterling McGean says:

February 7, 2021 at 4:44 pm

As I website owner I think the subject matter here is very excellent, thanks for your efforts.

Bonnibelle Dennet says:

February 9, 2021 at 1:55 pm

Remarkable! It’s a truly remarkable article, I have got a much clear idea from this article.

Bevvy Rollin Odin says:

February 9, 2021 at 11:07 pm

What an amazing blog!

Olga Rancell Seligman says:

February 10, 2021 at 4:15 am

Hi there. I discovered your website by way of Google whilst looking for a comparable subject, your web site got here up. I have bookmarked it in my google bookmarks to come back then.

Renie Frederigo Jepson says:

February 10, 2021 at 8:28 am

I was reading through some of your articles on this website and I believe this internet site is real informative!
