Capstone Project for Data Visualization Nanodegree

Context: I’m putting up a blog post that I wrote as a part of the capstone project for my Data Visualization nanodegree at Udacity.

Hans Rosling, the Swedish doctor and self-taught data visualization expert best known for his TED talks, describes what he calls the size instinct in his book Factfulness. The size instinct, in his own words, is “to look at a lonely number and misjudge its importance.” To illustrate it, he recounts how an environment minister from a European country stood up at the World Economic Forum in Davos in 2007 and stated that China and India were accelerating the pace of climate change with their emissions.

“The forecasts show that it is China, India and other emerging economies that are increasing their carbon dioxide emissions at a speed that will cause dangerous climate change. In fact, China already emits more CO2 than the USA, and India already emits more than Germany.”

The developed world and high-income countries often use this argument to put the onus of combating climate change on developing countries. The visualizations presented below tell a different story, one based on per capita emissions rather than aggregate emissions, a distinction that is often lost in the us-versus-them rhetoric.
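
To make the aggregate versus per capita distinction concrete, here is a minimal Python sketch. The two countries and their figures are made up purely for illustration (they are not World Bank values), and the function name is my own.

```python
# Illustrative, made-up figures for two hypothetical countries.
# emissions_mt: total CO2 emissions in million tonnes
# population_m: population in millions
countries = {
    "Country A": {"emissions_mt": 9_000, "population_m": 1_500},
    "Country B": {"emissions_mt": 4_500, "population_m": 300},
}


def per_capita_tonnes(emissions_mt, population_m):
    """Million tonnes divided by million people gives tonnes per person."""
    return emissions_mt / population_m


for name, c in countries.items():
    total = c["emissions_mt"]
    each = per_capita_tonnes(c["emissions_mt"], c["population_m"])
    print(f"{name}: {total} Mt in total, {each:.1f} t per person")
```

In this toy example Country A emits twice as much in total, yet each of its residents emits less than half of what a resident of Country B does. That is the kind of reversal a lonely aggregate number hides.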


Step 1

As required by the project, I have chosen this visualization from #MakeoverMonday. The data comes from the World Bank, and the dataset contains additional parameters that the visualization ignores. Other issues and challenges with the visualization are:

  1. It doesn’t consider parameters such as GDP per capita and population.
  2. It doesn’t have a filter that lets the viewer select the country or the region.
  3. A fairly limited set of countries is displayed on the chart, making it difficult for a viewer to arrive at insights based on categories or logical groupings. While the search bar at the top of the page can be used to display additional countries, the chart itself remains cluttered.
  4. While there are three different chart types included in the visualization, no storyline emerges from them.
  5. There’s very limited use of supporting text and annotations to aid and assist the viewer.


Step 3

Through the alternate visualizations I have created, using a different version of the same dataset, I attempt to make the case that the developed world and high-income countries should do more about climate change. I have also tried to determine whether GDP per capita and emissions per capita are correlated over time.
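
The analysis in this post was done visually in Tableau, and no correlation coefficient was computed. As a rough sketch of how the relationship could be quantified for a single year, the Python snippet below computes a Pearson correlation between the log of GDP per capita and emissions per capita. The data points are invented for illustration, and the `pearson` helper is my own, not part of the project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Made-up values for four hypothetical countries in one year.
gdp_per_capita = [1_000, 5_000, 20_000, 50_000]
emissions_per_capita = [0.5, 2.0, 6.0, 12.0]

# GDP per capita spans orders of magnitude, so take its log first.
log_gdp = [math.log10(g) for g in gdp_per_capita]
r = pearson(log_gdp, emissions_per_capita)
print(f"r = {r:.2f}")
```

Repeating the calculation for each year would show whether the strength of the relationship changes over time. A coefficient like this only describes association and is not a substitute for a proper statistical test.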


Step 4

Here is a link to a story I have created in Tableau that uses scatter plots and encodes region using color and population using size. It is meant as a tribute to Hans Rosling and his extensive use of animated scatter plots to show that the world today isn’t as bad as people think it is.


Step 5

This visualization improves on the original one in a number of ways:

  1. There’s additional information such as the GDP per capita (log scale on the y-axis), population (size) and region (color) encoded.
  2. The viewer can easily determine that there exists a positive correlation between GDP per capita and emissions per capita.
  3. It also enables the viewer to quickly determine that the worst polluters based on emissions per capita are the advanced economies of North America and the oil-producing high-income countries of the Middle East.
  4. The animation enables the viewer to trace the general curve that determines the relationship between the two variables.
  5. The supporting text and annotations make it easy for the viewer to arrive at the insights that attempt to address the problem statement.
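
The encodings listed above were configured in Tableau, not in code. Purely as an illustration of the idea, the Python sketch below maps one data point to a vertical position on a log scale and to a bubble radius. The function name, the `max_radius` parameter and the choice to make bubble area (not radius) proportional to population are my assumptions, following a common charting convention.

```python
import math


def encode_point(gdp_per_capita, population, max_population, max_radius=30.0):
    """Return (y position on a log10 scale, bubble radius) for one country.

    The radius grows with the square root of population so that the
    bubble's area, which is what the eye compares, stays proportional
    to population.
    """
    y = math.log10(gdp_per_capita)
    radius = max_radius * math.sqrt(population / max_population)
    return y, radius


# Hypothetical country: GDP per capita of 1,000 and a quarter of the
# largest population in the dataset.
y, radius = encode_point(1_000, population=25, max_population=100)
print(y, radius)
```

A country with a quarter of the largest population gets half the maximum radius, and therefore a quarter of the maximum area.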


Step 2

As a concluding note, I will attempt to highlight the limitations and biases that exist in the dataset that I have used for these visualizations.

  1. The methodology for collecting this data has not been specified, making it difficult to judge its reliability.
  2. There are features with missing values, and the values are missing completely at random. One of the easiest ways to deal with this would be listwise deletion. This would, however, shrink the dataset and compromise the quality of the analysis.
  3. There are outliers in both key features, GDP per capita and emissions per capita. The y-axis (GDP per capita) has been capped at 200K, and the x-axis (emissions per capita) has been capped at 30 metric tons.
  4. The dataset includes only carbon dioxide emissions, which constitute only 82% of all emissions.
  5. It is also important to note that correlation does not imply causation and that no statistical tests have been performed as part of this analysis.
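
As a rough sketch of points 2 and 3, the Python snippet below applies listwise deletion and then keeps only the rows that fall within the axis caps. The rows are invented, the field and function names are hypothetical, and I am assuming that "capped" means points beyond the axis limits are simply not shown.

```python
GDP_CAP = 200_000      # y-axis cap on GDP per capita, in the dataset's units
EMISSIONS_CAP = 30.0   # x-axis cap on emissions per capita, in metric tons

FIELDS = ("gdp_per_capita", "emissions_per_capita")

# Made-up rows; None marks a missing value.
rows = [
    {"country": "A", "gdp_per_capita": 45_000.0, "emissions_per_capita": 15.0},
    {"country": "B", "gdp_per_capita": None, "emissions_per_capita": 2.0},
    {"country": "C", "gdp_per_capita": 250_000.0, "emissions_per_capita": 8.0},
    {"country": "D", "gdp_per_capita": 60_000.0, "emissions_per_capita": 35.0},
    {"country": "E", "gdp_per_capita": 2_000.0, "emissions_per_capita": 0.4},
]


def listwise_delete(rows, fields):
    """Drop every row that is missing a value in any of the given fields."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]


def within_axis_caps(rows):
    """Keep only the rows that fall inside both axis caps."""
    return [
        r for r in rows
        if r["gdp_per_capita"] <= GDP_CAP
        and r["emissions_per_capita"] <= EMISSIONS_CAP
    ]


complete = listwise_delete(rows, FIELDS)
visible = within_axis_caps(complete)
print([r["country"] for r in visible])
```

Of the five made-up rows, only A and E survive both steps, which shows how quickly these two choices can shrink what the viewer actually sees.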

