World Happiness — a Project in Exploratory Data Analysis

Yinghui (Linda) He
Mar 7, 2021

Topic:

World happiness is an indicator of the state of global happiness. Since 2012, World Happiness Reports have evaluated happiness scores in countries around the world and examined factors (drawing on economics, psychology, survey analysis, national statistics, health, public policy, and more) that could influence those scores. For example, the economy could affect people’s quality of life and thus influence happiness. In this report, I explore the relationship between these factors and happiness and use data visualization to show the strength of each relationship.

Dataset:

I took the world happiness dataset from Kaggle. It consists of five .csv files, one for each World Happiness Report from 2015 to 2019. The dataset contains six factors as columns: economic production, social support, life expectancy, freedom, absence of corruption, and generosity.

Here is a sample of the data, showing countries’ happiness scores and the related factors (e.g., GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption) in the year 2019:

Top 10 Happiest Countries in the Year 2019

Guiding Questions:

  1. Does a greater GDP correlate with a higher happiness score?
  2. Do all the factors in the original datasets correlate with happiness score? If so, in which direction, and which one has the strongest correlation?
  3. Does the world average happiness score steadily increase from the year 2015 to the year 2019?

Hypotheses:

  1. I hypothesize that the more prosperous a country is (economy/GDP), the higher the happiness score.
  2. I hypothesize that GDP, social support, healthy life expectancy, freedom to make life choices, and generosity are positively correlated with happiness score in 2019.
  3. I hypothesize that perceptions of corruption are negatively correlated with happiness scores in the years 2017–2019.
  4. I hypothesize that as time goes from 2015 to 2019, the average happiness score increases.
  5. I hypothesize that two strong factors for happiness score (from Hypothesis 2) also increase from 2015 to 2019.

Discussion

Data Collection

Data Information Regarding 2019.csv

Looking through the datasets, I found them neat and clear, with no apparent missing data or mixed data types.
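The kind of check described above can be sketched in plain Python. This is only an illustration, not the code used in the project: the rows are made up, and the column names (`score`, `gdp_per_capita`) and the helper `audit_columns` are hypothetical.

```python
import csv
import io

# Made-up rows for illustration only; the column names are assumptions,
# not necessarily the headers used in the Kaggle files.
SAMPLE_CSV = """country,score,gdp_per_capita
Country A,7.5,1.30
Country B,6.1,
Country C,n/a,0.90
"""

def audit_columns(csv_text, numeric_columns):
    """Count missing and non-numeric cells for each expected numeric column."""
    report = {name: {"missing": 0, "non_numeric": 0} for name in numeric_columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in numeric_columns:
            cell = (row.get(name) or "").strip()
            if cell == "":
                report[name]["missing"] += 1
                continue
            try:
                float(cell)
            except ValueError:
                # The cell has content, but it is not a number (mixed types).
                report[name]["non_numeric"] += 1
    return report

report = audit_columns(SAMPLE_CSV, ["score", "gdp_per_capita"])
print(report)
```

A clean file would report zero for every counter. With pandas, a similar check is usually done with `df.info()` and `df.isna().sum()`.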

Question 1: Does a greater GDP correlate with a higher happiness score?

Hypothesis 1: I hypothesize that the more prosperous a country is (economy/GDP), the higher the happiness score.

Scatter Plots for Economy Indicator (GDP) and Happiness Scores in Years 2015–2019

From the scatter plots above, which show the relationship between the economy (GDP per capita) and happiness score, we can tell that there is a relatively strong positive relationship between the two variables in each year from 2015 to 2019. The exact strength could be quantified with the Pearson correlation coefficient (Pearson’s r). Thus, hypothesis 1 is supported by the given data.
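For readers who want a number to go with the scatter plots, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The values in `gdp` and `score` are made up for illustration and are not taken from the dataset; `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up values, not taken from the dataset.
gdp = [0.3, 0.7, 1.0, 1.2, 1.4]
score = [3.9, 4.8, 5.6, 6.4, 7.3]
r = pearson_r(gdp, score)
print(f"Pearson r = {r:.3f}")
```

A value near +1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 no linear relationship. With pandas, the same quantity is available as `df["a"].corr(df["b"])`.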

Question 2: Do all the factors in the original datasets correlate with happiness score? If so, in which direction, and which one has the strongest correlation?

Hypothesis 2: I hypothesize that GDP, social support, healthy life expectancy, freedom to make life choices, and generosity are positively correlated with happiness score in 2019.

Relationship between Possible Positive Factors of Happiness Score in the Year 2019

From the above five plots, we can see relationships between those five factors (GDP, social support, healthy life expectancy, freedom to make life choices, and generosity) and happiness scores, with different strengths.

The visualization shows that economy (GDP per capita), social support, and healthy life expectancy have strong positive correlations with countries’ happiness scores. Freedom to make life choices also has a positive relationship with happiness, but a weaker one. Generosity, however, seems to have no relationship with happiness.

So, hypothesis 2 is supported for three factors (economic production, social support, and life expectancy), which show strong positive relationships, and for freedom, which shows a weak positive relationship. However, there is no apparent relationship between generosity and happiness. Thus, hypothesis 2 is only partially supported.
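The comparison of strengths above is read off the plots. One way to make it explicit is to compute a correlation per factor and sort by absolute value. The sketch below does that, assuming made-up numbers and hypothetical factor names; `rank_factors` is a helper invented for this example, not part of the project code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rank_factors(score, factors):
    """Return (name, r) pairs sorted from strongest to weakest |r|."""
    pairs = [(name, pearson_r(values, score)) for name, values in factors.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up values chosen to show three different strengths.
score = [4.0, 5.0, 6.0, 7.0]
factors = {
    "gdp": [0.5, 0.8, 1.1, 1.4],         # moves in step with score
    "freedom": [0.3, 0.5, 0.4, 0.6],     # rises overall, less regularly
    "generosity": [0.2, 0.1, 0.1, 0.2],  # no linear trend
}
ranking = rank_factors(score, factors)
for name, r in ranking:
    print(f"{name:<12} r = {r:+.2f}")
```

Sorting by absolute value keeps a strong negative correlation near the top of the list, which matters for a factor such as perceptions of corruption, where the expected sign is in question.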

Hypothesis 3: I hypothesize that perceptions of corruption are negatively correlated with happiness scores in the years 2017–2019.

Relationship between Perceptions of Corruption and Happiness Score in Years 2017–2019

The above visualization of the relationship between perceptions of corruption and happiness shows that the relationship is not negative but weakly positive, with most data points at low values of the perceptions-of-corruption indicator.

I originally hypothesized a negative relationship because I took a higher score on this variable to mean more perceived corruption. The positive result prompted me to look at the data in more detail, and I found that this indicator may actually measure the absence of corruption, though this is not stated clearly. If so, the result looks more reasonable to me, since less perceived corruption should go with greater happiness.

Thus, hypothesis 3 is not supported by the data from 2017–2019.

Question 3: Does the world average happiness score steadily increase from the year 2015 to the year 2019?

Hypothesis 4: I hypothesize that as time goes from 2015 to 2019, the average happiness score increases.

Line Plot for Average World Happiness in Years 2015–2019

The average world happiness score does not increase steadily from 2015 to 2019. Instead, it drops in 2017 and then increases again. Thus, hypothesis 4 is not supported by the given data.

Changes in average world happiness could be explored further with data covering a longer time frame.
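The same check can be expressed in code: average the scores per year, then test whether each yearly mean exceeds the one before. The sketch below uses made-up scores that merely imitate a dip; the helper names `yearly_means` and `is_steadily_increasing` are hypothetical.

```python
from statistics import mean

def yearly_means(scores_by_year):
    """Map each year to its mean score, with years in ascending order."""
    return {year: mean(scores) for year, scores in sorted(scores_by_year.items())}

def is_steadily_increasing(values):
    """True only if every value is strictly greater than the previous one."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))

# Made-up scores for illustration, not taken from the dataset.
scores_by_year = {
    2015: [5.0, 6.0, 7.0],
    2016: [5.2, 6.1, 7.0],
    2017: [4.9, 6.0, 6.8],
    2018: [5.1, 6.2, 7.0],
}
means = yearly_means(scores_by_year)
for year, value in means.items():
    print(year, round(value, 2))
print("steadily increasing:", is_steadily_increasing(list(means.values())))
```

A single dip is enough to make the check fail, which matches the strict reading of "steadily" in hypothesis 4. A looser test, such as comparing only the first and last year, would answer a different question.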

Hypothesis 5: I hypothesize that two strong factors for happiness score (from Hypothesis 2) also increase from 2015 to 2019.

Line Plots for Average Economy (GDP) and Healthy Life Expectancy in Years 2015–2019

From the conclusion for hypothesis 2, we know that economy and healthy life expectancy both have a strong relationship with happiness score. However, the above line plots show that neither the average economy nor the average healthy life expectancy increases steadily. The economy (GDP) indicator first increases, then decreases, and then increases a little; healthy life expectancy first decreases and then increases. Thus, hypothesis 5 is not supported by the above data analysis.

This could be because happiness itself shows no clear trend over this period. It could also mean that the factors influence the happiness score jointly, so that no single factor determines it.

Since hypothesis 4 is also not supported, we cannot conclude that there is a trend in happiness or in these factors. One likely reason is that the time frame is relatively short (only 5 years of data). An analysis over a longer time range, such as 20 years of data from 2000 to 2020, could help resolve this.

Summary

  1. Hypothesis 1 is supported. The more prosperous a country is (economy/GDP), the higher the happiness score.
  2. Hypothesis 2 is partially supported. The factors economic production, social support, and life expectancy have strong positive relationships with happiness scores. The factor freedom has a weak positive relationship. However, there’s no relationship between generosity and happiness.
  3. Hypothesis 3 is not supported. Perceptions of corruption are not negatively correlated with happiness; the relationship is weakly positive, possibly because the indicator measures the absence of corruption.
  4. Hypothesis 4 is not supported with 5 years of data. There is no clear pattern for the change of happiness in 2015–2019.
  5. Hypothesis 5 is not supported. No single factor determines happiness on its own, and exploring a longer time frame could help reveal potential trends.

Github repo:

https://github.com/Yinghui-HE/World-Happiness-EDA-Project/
