Basic scatter plot: Visualising country data

This programme uses pandas to import a CSV into a DataFrame, narrow it down to the two columns we wish to analyse, and visualise them in a scatter plot.

My favourite correlation so far is between human development and gender equality (as measured by the UN's Human Development Index and Gender Inequality Index):

[Figure: scatter plot of the two indices]

Who would’ve thought more equality might mean better development…

The datasets used for this analysis are unfortunately not public. They cover a wide variety of indicators for each country in the study.

A larger version of this programme takes in multiple CSVs and joins them by country for analysis.
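One way to join several country-level CSVs is to read each with the country as the index and inner-join on that index. This is a minimal sketch under assumptions, not the author's larger programme: it assumes each file has the country name in its first column and that indicator names do not repeat across files (`DataFrame.join` raises an error on overlapping column names unless `lsuffix`/`rsuffix` are given). The function name `join_by_country` and the sample data are hypothetical.

```python
import io
from functools import reduce

import pandas as pd


def join_by_country(sources):
    """Read several country-indexed CSVs and inner-join them on the country index.

    `sources` may be file paths or file-like objects. Hypothetical helper,
    assuming the first column of every file holds the country name.
    """
    frames = [pd.read_csv(src, index_col=0) for src in sources]
    # Inner join keeps only countries that appear in every file
    return reduce(lambda left, right: left.join(right, how="inner"), frames)


# Made-up sample files (not the author's data)
gov_csv = io.StringIO("Country,HDI\nCountryA,0.9\nCountryB,0.7\nCountryC,0.5\n")
soc_csv = io.StringIO("Country,GII\nCountryA,0.1\nCountryB,0.3\nCountryD,0.6\n")

joined = join_by_country([gov_csv, soc_csv])
print(joined)
```

An inner join silently drops countries missing from any file; `how="outer"` keeps them with NaN gaps instead, which matplotlib simply skips when plotting.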

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('Government.csv', index_col=0, skiprows=1)

# Clean the dataframe so it works with user input. The original CSVs had multiple
# header lines and split each indicator into 2016 and 2017 columns; pandas marks the
# repeated (2017) column names with a '.1' suffix.
def add_year(col):
    if col.endswith('.1'):
        return col[:-2] + '2017'
    return col + '2016'

df = df.rename(columns=add_year)

# Drop the leftover header row labelled 'Country'. A text row like this makes pandas
# read the columns as strings, so convert them back to numbers (non-numeric -> NaN).
df = df.drop('Country')
df = df.apply(pd.to_numeric, errors='coerce')

input1 = input('x axis: ')
input2 = input('y axis: ')

# Print the dataframe being plotted
print(df[[input1, input2]])

x = df[input1]
y = df[input2]

plt.scatter(x, y, marker='.', c='green', alpha=0.5)
plt.xlabel(input1)
plt.ylabel(input2)

# Invert the y axis to correctly read the Gender Inequality Index, which has 0 as its best score
plt.gca().invert_yaxis()
plt.show()
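The cleaning step relies on how `pandas.read_csv` handles repeated column headers: the second occurrence of a name gets a `.1` suffix (then `.2`, and so on). A minimal, self-contained sketch of that step on a made-up two-year CSV; the country labels, indicator names and values are illustrative assumptions, not the author's data, and the real files have an extra header line that this toy version leaves out.

```python
import io

import pandas as pd

# Made-up CSV where each indicator appears twice: first the 2016 column, then 2017
CSV_TEXT = """Country,HDI,GII,HDI,GII
CountryA,0.80,0.20,0.81,0.19
CountryB,0.60,0.45,0.61,0.44
"""


def add_year(col: str) -> str:
    """Map 'X' -> 'X2016' and the pandas-suffixed duplicate 'X.1' -> 'X2017'."""
    if col.endswith(".1"):
        return col[:-2] + "2017"
    return col + "2016"


raw = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)
print(list(raw.columns))  # ['HDI', 'GII', 'HDI.1', 'GII.1']

# DataFrame.rename accepts a function and applies it to every column label
tidy = raw.rename(columns=add_year)
print(list(tidy.columns))  # ['HDI2016', 'GII2016', 'HDI2017', 'GII2017']
```

Passing a function to `rename` replaces the loop of in-place renames with a single call, and the resulting names are what the user types at the `x axis:` and `y axis:` prompts.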