In 2018, Swedish physician Hans Rosling, his son Ola Rosling, and his daughter-in-law Anna Rosling Rönnlund published a book titled "Factfulness: Ten Reasons We're Wrong About the World – and Why Things Are Better Than You Think." The book changed the way I think about the future by pointing out some faulty assumptions I was making about the modern world. This notebook was inspired by Rosling's book.

The data come from Gapminder and are freely available on its website.

# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Setting up the data

Before we get started, the data need some setup. You don't need to follow this section in detail; feel free to skip it if you'd like.

# Loading the data
income_per_person = pd.read_csv('data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv')
income_per_person.head()
country 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040
0 Afghanistan 603 603 603 603 603 603 603 603 603 ... 2420 2470 2520 2580 2640 2700 2760 2820 2880 2940
1 Albania 667 667 667 667 667 668 668 668 668 ... 18500 18900 19300 19700 20200 20600 21100 21500 22000 22500
2 Algeria 715 716 717 718 719 720 721 722 723 ... 15600 15900 16300 16700 17000 17400 17800 18200 18600 19000
3 Andorra 1200 1200 1200 1200 1210 1210 1210 1210 1220 ... 73200 74800 76400 78100 79900 81600 83400 85300 87200 89100
4 Angola 618 620 623 626 628 631 634 637 640 ... 6270 6410 6550 6700 6850 7000 7150 7310 7470 7640

5 rows × 242 columns

The table above shows the format the data came in: one row per country and one column per year. To plot it the way I'd like, I wrote a function that melts the year columns into a single Year column. I use this function several times throughout the project.

Melt Function

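def melt_data(df, value_name):
    """
    Reshape a wide Gapminder table (one column per year) into long format.

    Parameters:
        df (DataFrame): wide DataFrame with a 'country' column and one column per year
        value_name (str): name for the column that will hold the melted values
    Returns:
        melt_df (DataFrame): long DataFrame with columns 'country', 'Year', and value_name
    """
    melt_df = df.melt(id_vars='country', var_name="Year", value_name=value_name)
    # Year labels are strings such as '1800'; convert them to integers.
    melt_df['Year'] = melt_df['Year'].astype(int)
    return melt_df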
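To see what `melt` does without the Gapminder files, here is a minimal sketch on a made-up two-country table. It restates `melt_data` so the block runs on its own, converting the year labels with `astype(int)`. The toy values are hypothetical.

```python
import pandas as pd


def melt_data(df, value_name):
    """Reshape a wide table (one column per year) into long format."""
    melt_df = df.melt(id_vars="country", var_name="Year", value_name=value_name)
    # Year labels arrive as strings like "1800"; store them as integers.
    melt_df["Year"] = melt_df["Year"].astype(int)
    return melt_df


# Toy wide table in the same shape as the Gapminder CSVs (values are made up).
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1800": [600, 700],
    "1801": [610, 705],
})

long = melt_data(wide, "Income")
print(long)
```

Each (country, year) pair becomes its own row, which is the shape seaborn expects for plotting one variable against `Year`.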

Below is what the data looks like after melting it using our new function.

income_melt = melt_data(income_per_person, "Income")
income_melt.head()
country Year Income
0 Afghanistan 1800 603
1 Albania 1800 667
2 Algeria 1800 715
3 Andorra 1800 1200
4 Angola 1800 618

Plot Function

Next, I need a function to plot the melted datasets.

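def make_lineplot(data, yval, title):
    """
    Parameters:
        data (DataFrame): melted DataFrame with a 'Year' column
        yval (str): name of the column to plot on the y-axis
        title (str): title of the plot
    Returns:
        None (the plot is displayed)
    """
    plt.figure(figsize=(12, 6))
    # seaborn averages all countries for each year and shades a 95% confidence interval
    sns.lineplot(x='Year', y=yval, data=data)
    plt.title(title, fontsize=20)
    # Mark 2018, the boundary between recorded and forecasted values.
    # axvline's ymin/ymax are axis fractions (0 to 1), so the defaults span the full height.
    plt.axvline(x=2018, color='r')
    plt.show()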

Now we're ready to look at the data!

Average Income Per Person

One metric used to measure the world economy is average income per person. It is calculated by dividing a country's total income by its population. In this dataset, income is GDP per capita, adjusted for purchasing power parity (PPP) and inflation. A higher income per person gives the individual more buying power, which translates to a higher standard of living.

Below, I look at the average income per person across all countries from 1800 to 2018, along with the forecasted values through 2040.

The dark blue line is the unweighted mean across countries for each year, so every country counts equally regardless of its population. The lighter blue band around it is a 95% confidence interval for that mean. The red vertical line marks the year 2018: everything to the left of it is recorded data, and everything to the right is forecasted. This holds for all the plots in this notebook.
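By default, seaborn's `lineplot` takes the mean of all countries' values for each year and estimates the band by bootstrapping. The sketch below approximates that idea with a percentile bootstrap on made-up data. `mean_with_bootstrap_ci` is a hypothetical helper, and seaborn's internals differ in the details.

```python
import numpy as np
import pandas as pd


def mean_with_bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Return (mean, low, high) for the mean of `values` via a percentile bootstrap."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and record each resample's mean.
    boot_means = rng.choice(values, size=(n_boot, values.size), replace=True).mean(axis=1)
    tail = (100 - ci) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high


# Toy long-format data: one income value per country per year (made up).
toy = pd.DataFrame({
    "Year": [2000] * 4 + [2001] * 4,
    "Income": [1000, 2000, 3000, 10000, 1100, 2100, 3200, 10500],
})

rows = []
for year, group in toy.groupby("Year"):
    mean, low, high = mean_with_bootstrap_ci(group["Income"].to_numpy())
    rows.append({"Year": year, "mean": mean, "ci_low": low, "ci_high": high})
summary = pd.DataFrame(rows).set_index("Year")
print(summary)
```

The band is narrow when countries agree and wide when one outlier, like the 10,000 value here, dominates the mean.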

make_lineplot(income_melt, "Income", "Average World Income Per Person")

The plot above shows that people across the world have risen to a much higher standard of living over the last 50 years and are predicted to continue doing so.

But does this truly represent every country? What if fast growth in the richest nations is pulling the average up and hiding stagnation in poorer ones? Are poor countries also seeing similar growth?
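One quick way to probe this is to compare the yearly mean with the yearly median. A handful of very rich countries can pull the mean up without moving the median much. This is a sketch on made-up numbers, not Gapminder data:

```python
import pandas as pd

# Toy data (made up): four countries, one much richer than the rest.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"] * 2,
    "Year": [1950] * 4 + [2018] * 4,
    "Income": [800, 900, 1000, 5000, 900, 1000, 1100, 40000],
})

# Mean and median across countries for each year.
by_year = toy.groupby("Year")["Income"].agg(["mean", "median"])
print(by_year)
```

Here the mean grows more than fivefold while the median barely moves. A gap like that would hint that the average is skewed by a few countries.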

Let's take a look.

In 2018, Business Insider published an article listing the 28 poorest countries in the world. Below, I filter the dataset to contain only those 28 countries, listed from the 28th poorest down to the poorest.

poorest_countries = ['Sudan', 'Benin', 'Chad', 'Nepal', 'Mali', 'Guinea-Bissau', 'Ethiopia', 'Comoros', 'Tajikistan', 'Haiti', 'Rwanda', 'Guinea', 'Burkina Faso', 'Liberia', 'Uganda', 'Togo', 'Afghanistan', 'Niger', 'Sierra Leone', 'Gambia', 'Madagascar', 'Congo, Dem. Rep.', 'Mozambique', 'Yemen', 'Central African Republic', 'Malawi', 'Burundi', 'South Sudan']
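`isin` matches names exactly, so any country spelled differently from Gapminder's convention would be silently dropped from the filter. Gapminder writes 'Congo, Dem. Rep.', for example. A small check like the one below can catch this before plotting. It uses a hypothetical `missing_countries` helper and a made-up misspelling:

```python
import pandas as pd

# Hypothetical stand-in for the wide income table (only the country column matters here).
income_table = pd.DataFrame({"country": ["Sudan", "Benin", "Chad", "Congo, Dem. Rep."]})
wanted = ["Sudan", "Benin", "Chad", "DR Congo"]  # "DR Congo" is a made-up misspelling


def missing_countries(names, table):
    """Return the names that do not appear in table['country'], preserving order."""
    available = set(table["country"])
    return [name for name in names if name not in available]


print(missing_countries(wanted, income_table))
```

In the notebook, `missing_countries(poorest_countries, income_per_person)` should return an empty list if every name matched.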

Below is the same line plot, this time restricted to those 28 countries.

poorest_melt = income_melt[income_melt.country.isin(poorest_countries)]
make_lineplot(poorest_melt, "Income", "Average Income Per Person for 28 Poorest Countries")

As you can see, even the poorest countries have been rising to a higher standard of living and are projected to continue doing so (just on a smaller scale). Their growth is slow, but they're moving in the right direction.

Doubling a $1,000 income between 1950 and 2018 only brings it to $2,000, which may not seem like much. But growth compounds. At the same rate, the next doubling adds $2,000 rather than $1,000, and each doubling after that adds more than the last.
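The compounding is easy to check with a few lines of arithmetic. This sketch uses the $1,000 and $2,000 figures above and assumes a constant annual growth rate, which real incomes don't follow. The 2086 projection is purely hypothetical.

```python
import math

start_income, end_income = 1_000, 2_000
years = 2018 - 1950  # 68 years

# Compound annual growth rate implied by doubling over 68 years.
rate = (end_income / start_income) ** (1 / years) - 1
print(f"{rate:.2%} per year")

# Doubling time at that rate (68 years, by construction).
doubling_time = math.log(2) / math.log(1 + rate)

# Hypothetical: the same rate applied for another 68 years doubles the income again.
income_2086 = end_income * (1 + rate) ** years
print(round(income_2086))
```

A doubling over 68 years works out to only about 1% per year. Even so, each further doubling at that rate adds more dollars than the one before.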

Life Expectancy

With a higher standard of living comes better health. Next, I look into the average life expectancy for each country from 1800 to 2018. Take a look at the plot, and see if you can figure out why there are two significant drops.

life_expectancy = pd.read_csv('data/life_expectancy_years.csv')
expectancy_melt = melt_data(life_expectancy, "Life_Expectancy").dropna()
make_lineplot(expectancy_melt, "Life_Expectancy", "Average Life Expectancy")

(If you guessed the World Wars, you're mostly right; the dip around 1918 also coincides with the 1918 influenza pandemic.)

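After 1950, average life expectancy skyrocketed, but before 1950, the average person could have expected to live to only about 40 years old. Crazy, right? Life expectancy at birth is dragged down heavily by deaths in early childhood, so many people who survived their first few years lived well past 40.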

Today, it seems normal to meet elders in their 70s, but less than a century ago they would have been outliers. The plot below shows the distribution of life expectancy across countries in 1918 and in 2018: the change over a century!

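plt.figure(figsize=(12, 6))
# distplot is deprecated in recent seaborn; kdeplot draws a smoothed density of country values per year
sns.kdeplot(x=expectancy_melt[expectancy_melt.Year == 1918].Life_Expectancy, fill=True, color='r', label='1918')
sns.kdeplot(x=expectancy_melt[expectancy_melt.Year == 2018].Life_Expectancy, fill=True, color='g', label='2018')
plt.title("Distribution of Life Expectancy, 1918 vs. 2018", fontsize=20)
plt.xlabel("Life expectancy (years)")
plt.legend()
plt.show()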

In the 1918 distribution, only countries in the far right tail had a life expectancy above 60 years. Eyeballing it, this looks like less than 5% of countries.

Just one hundred years later, the picture is almost flipped: most countries now have a life expectancy above 60 years.
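Instead of eyeballing the tails, the shares can be computed directly. This sketch uses a hypothetical `share_above` helper on a made-up stand-in for `expectancy_melt`. Applied to the real data, it would give the fraction of countries (not people) above 60:

```python
import pandas as pd

# Toy stand-in for expectancy_melt (values are made up).
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"] * 2,
    "Year": [1918] * 4 + [2018] * 4,
    "Life_Expectancy": [25.0, 32.0, 41.0, 61.0, 58.0, 66.0, 72.0, 81.0],
})


def share_above(df, year, threshold=60):
    """Fraction of countries whose life expectancy in `year` exceeds `threshold`."""
    values = df.loc[df["Year"] == year, "Life_Expectancy"]
    # The mean of a boolean Series is the fraction of True values.
    return (values > threshold).mean()


print(share_above(toy, 1918), share_above(toy, 2018))
```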

In the grand scheme of things, 100 years is not a lot of time. In fact, it's only about three generations, depending on how you calculate the average length of a generation. In other words, life expectancy has roughly doubled since your great-grandmother was born. That's a massive improvement that's often overlooked.

Child Mortality

Another measure of the standard of living is child mortality: the number of children per 1,000 births who die before age 5. Until about 1890, around 39% of children (roughly 390 per 1,000 born) were expected to die before their fifth birthday. Think about that.

Today, that number is less than 5%. Again, another major improvement and a sign that we are definitely moving in the right direction.

child_mortality = pd.read_csv('data/child_mortality_0_5_year_olds_dying_per_1000_born.csv')
mortality_melt = melt_data(child_mortality, "death_per_1000")
make_lineplot(mortality_melt, "death_per_1000", "Child Mortality Rate (Death by 0-5 year-olds per 1000 born)")
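Figures like these can be read off the data directly. Average across countries for each year, then divide deaths per 1,000 by 10 to get a percentage. Here is a sketch on a made-up stand-in for `mortality_melt`:

```python
import pandas as pd

# Toy stand-in for mortality_melt (values are made up).
toy = pd.DataFrame({
    "country": ["A", "B", "C"] * 2,
    "Year": [1890] * 3 + [2018] * 3,
    "death_per_1000": [420.0, 390.0, 360.0, 60.0, 30.0, 9.0],
})

# Mean across countries for each year.
yearly = toy.groupby("Year")["death_per_1000"].mean()
# Deaths per 1,000 births to percent: divide by 10.
yearly_pct = yearly / 10
print(yearly_pct)
```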

Children Per Woman

Because babies now have a higher chance of survival, families have been having fewer of them. The logic used to be "let's have 6 and hope 3 of them live." Now, there's less uncertainty.

According to Rosling, poor families typically have more children than well-off families because they rely on their children to contribute to the family's combined income. As average income per person increases, however, poorer families tend to have fewer children, since they can generate the same income with fewer people.
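Rosling's argument predicts that higher-income countries have fewer children per woman. One way to look at that relationship is to merge the two melted tables on `country` and `Year` and compute a rank (Spearman-style) correlation. The sketch below uses made-up stand-ins for `income_melt` and `child_per_woman_melt`. A strong negative correlation would be consistent with Rosling's point, but it would not prove the mechanism he describes.

```python
import pandas as pd

# Toy stand-ins for income_melt and child_per_woman_melt (values are made up).
income = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "Year": [2018] * 4,
    "Income": [1_000, 5_000, 20_000, 50_000],
})
fertility = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "Year": [2018] * 4,
    "child_per_woman": [5.5, 3.8, 2.1, 1.7],
})

# Align the two tables on country and year.
merged = income.merge(fertility, on=["country", "Year"], how="inner")

# Pearson correlation of the ranks, which is Spearman's rank correlation.
corr = merged["Income"].rank().corr(merged["child_per_woman"].rank())
print(corr)
```

Correlating ranks rather than raw values keeps a few very rich countries from dominating the result.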

child_per_woman = pd.read_csv('data/children_per_woman_total_fertility.csv')
child_per_woman_melt = melt_data(child_per_woman, "child_per_woman")
make_lineplot(child_per_woman_melt, "child_per_woman", "Average Children Per Woman")

Population Growth

Now that we know families are having fewer children, how does that affect the total population growth?

I used to think that the population would grow indefinitely, and it kind of makes sense to believe this: in theory, the more people there are, the more babies they have, and the bigger the population gets.

But actually, this is far from the truth. In his book, Rosling compares population growth to the growth of a human being. In the early stages of population development, growth is rapid. But as the world gets older and matures, population growth begins to plateau, and the population might eventually even begin to decrease.

population = pd.read_csv('data/population_total.csv')
population_melt = melt_data(population, "population")
make_lineplot(population_melt, "population", "Population Size")
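`make_lineplot` averages across countries, so this plot shows the mean population per country rather than the world total. The sketch below shows how the world total and its annual growth rate could be computed, using a made-up stand-in for `population_melt`:

```python
import pandas as pd

# Toy stand-in for population_melt (values are made up).
toy = pd.DataFrame({
    "country": ["A", "B"] * 3,
    "Year": [2000, 2000, 2001, 2001, 2002, 2002],
    "population": [100, 300, 110, 320, 115, 330],
})

# Sum across countries to get the world total for each year.
world = toy.groupby("Year")["population"].sum()

# Annual growth rate: this year's total divided by last year's, minus 1.
growth = world / world.shift(1) - 1
print(world)
print(growth)
```

A plateau shows up as a growth rate that keeps shrinking toward zero, even while the total is still rising.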

Remarks

In a world of what seems like constant negativity, it's good to acknowledge the good things once in a while. Are we perfect yet? No, far from it. In fact, millions of people around the world still live in extreme poverty and harsh conditions, which seems unacceptable today. But the point is that we're moving in the right direction and making a ton of progress.

I hope this notebook was eye-opening. If you have any comments or feedback on how to improve it, please reach out and let me know. My email is colestriler at gmail . com.
