A post where I make animated graphs and maps to visualize the repartition of Nobel laureates per country.
The Nobel laureates of 2020 were announced last week, and I thought it would be interesting to visualize the repartition of laureates per country, as there are several ways to do so. I’m going to use this dataset available on Kaggle, which contains information on the year, category, name of the laureate, country, city and date of birth and death, among other things. Notice that this dataset goes from 1901 to 2016 and therefore doesn’t contain the most recent laureates.
But first of all, we need to load all the packages we will use in this analysis:
Now, we can import the dataset. To remove the capital letters and transform the column names in snake case (i.e names such as “column_name” instead of “Column Name”), we can use the function clean_names()
of the package {janitor}
1:
nobel_laureates_raw <- read_csv(here("_posts/2020-10-18-nobel-laureates/nobel-laureates.csv")) %>%
janitor::clean_names()
The first thing that we have to correct before doing visualization concerns the country names. Indeed, many countries have changed since 1901. For example, Czechoslovakia no longer exists, as well as Prussia. In this dataset, the columns containing country names display first the official name at the time, and then put the current name of the country between brackets.
# A tibble: 6 x 2
birth_country death_country
<chr> <chr>
1 Netherlands Germany
2 France France
3 Prussia (Poland) Germany
4 Switzerland Switzerland
5 France France
6 Prussia (Germany) Germany
Since we only want the current country names, we must modify these columns so that:
if the name doesn’t have brackets (i.e the country hasn’t changed in time), we let it as-is;
if the name has brackets (i.e the country has changed), we only want to keep the name between brackets.
Since I must do this for two columns (birth_country
and death_country
), I created a function (and this was the perfect example of losing way too much time by making a function to save time…):
This function takes a dataset (data
), and creates a new column (x
) that will take the name between brackets if original variable has brackets, or the unique name if the original variable doesn’t have brackets. Then, x
is renamed as the variable we specified first. I must admit that regular expressions (such as the one in gsub()
) continue to be a big mystery for me, and I thank StackOverflow for providing many examples.
Now, we apply this function to our columns with countries:
nobel_laureates <- clean_country_names(nobel_laureates_raw, birth_country)
nobel_laureates <- clean_country_names(nobel_laureates, death_country)
The country names are now cleaned:
# A tibble: 6 x 2
birth_country death_country
<chr> <chr>
1 Netherlands Germany
2 France France
3 Poland Germany
4 Switzerland Switzerland
5 France France
6 Germany Germany
From now on, there are several ways to visualize the repartition of Nobel laureates per country. We could do a static bar plot, an animated bar plot to see the evolution in time, a static map, or an interactive map.
First of all, we need to compute the number of Nobel laureates per country:
nobel_per_country <- nobel_laureates %>%
select(birth_country, full_name) %>%
distinct() %>%
group_by(birth_country) %>%
count(sort = TRUE) %>%
ungroup() %>%
drop_na()
Then we can plot this number, only for the first 20 countries (so that the plot can be readable):
nobel_per_country %>%
select(birth_country, n) %>%
top_n(20) %>%
mutate(birth_country = reorder(birth_country, n)) %>%
ggplot(aes(x = birth_country, y = n)) +
geom_col() +
coord_flip() +
xlab("Country") +
ylab("") +
geom_text(aes(label = n), nudge_y = 10) +
ggthemes::theme_clean()
We can also check the repartition per country and per category:
# The 20 countries with the most nobels
top_20 <- nobel_per_country %>%
top_n(10) %>%
select(birth_country) %>%
unlist(use.names = FALSE)
nobel_laureates %>%
select(birth_country, full_name, category) %>%
distinct() %>%
group_by(birth_country, category) %>%
mutate(n = n()) %>%
ungroup() %>%
drop_na() %>%
select(- full_name) %>%
distinct() %>%
filter(birth_country %in% top_20) %>%
ggplot(aes(x = birth_country, y = n)) +
geom_col() +
coord_flip() +
xlab("Country") +
ylab("") +
geom_text(aes(label = n), nudge_y = 10) +
ggthemes::theme_clean() +
facet_wrap(~category)
To observe the evolution of this number in time, one way would be to plot lines with year
in x-axis. But we could also keep the first plot we made and animate it with {gganimate}
.
First, we compute the cumulated sum of Nobel laureates. Indeed, the number of laureates per year is useless for us, we want to see the evolution of the total number:
nobel_per_country_year <- nobel_laureates %>%
select(year, birth_country) %>%
group_by(year, birth_country) %>%
count(sort = TRUE) %>%
ungroup() %>%
drop_na() %>%
arrange(birth_country, year) %>%
complete(year, birth_country) %>%
mutate(n = ifelse(is.na(n), 0, n),
year = as.integer(year)) %>%
filter(birth_country %in% top_20) %>%
group_by(birth_country) %>%
mutate(n_cumul = cumsum(n)) %>%
arrange(birth_country)
Then, we use almost the same code as for the first plot, but we add arguments at the end that tell how we want the animation to be:
plot_evol <- nobel_per_country_year %>%
select(birth_country, year, n_cumul) %>%
filter((year %% 2) != 0) %>%
ggplot(aes(x = reorder(birth_country, n_cumul), y = n_cumul)) +
geom_col() +
coord_flip() +
xlab("Country") +
ylab("") +
geom_text(aes(label = as.character(round(n_cumul, 0))), nudge_y = 10) +
ggthemes::theme_clean() +
transition_time(year) +
ggtitle("Year: {frame_time}") +
ease_aes('linear', interval = 2)
animate(plot_evol, duration = 15, fps = 20)
This allows us to see that the USA have seen their number of Nobel laureates surge from the 1960’s and 1970’s, which corresponds more or less to the creation of the so-called “Nobel Prize in Economics” in 1969. The plot per category also indicates that this prize plays a major role in the domination of the USA.
To create maps, we rely on the package {tmap}
. In addition to its functions, this package also gives access to a dataset that we will use to show the number of laureates per country.
data(World)
We need to merge our dataset of Nobel laureates with this dataset. But the country names differ. Therefore, we have to use ISO codes instead. World
already contains ISO codes, so we only have to create those for our dataset. This can be done very easily with the package {countrycode}
. However, some countries in our dataset don’t have ISO codes, such as Scotland, Northern Ireland or Czechoslovakia. The two former can be recoded as United Kingdom, but Czechoslovakia was located on current Slovakia, Czech Republic and Ukraine, so we drop it of our dataset.
nobel_per_country <- nobel_per_country %>%
mutate(
iso_birth = countrycode(birth_country, origin = "country.name", destination = "iso3c"),
iso_birth = case_when(
birth_country == "Scotland" | birth_country == "Northern Ireland" ~ "GBR",
TRUE ~ iso_birth
)
)
We can now merge the two datasets based on their ISO codes…
… and we can build the map and fill the countries with the number of laureates:
Finally, we will make interactive maps with {echarts4r}
. Firstly, let’s make an identical map as the one above but with a few interactive features.
{echarts4r}
uses specific country names, so we use once again {countrycode}
to modify the names in our dataset.
nobel_per_country_echarts <- e_country_names(data = nobel_per_country,
input = iso_birth,
type = "iso3c")
Now we can plot the map:
nobel_per_country_echarts %>%
e_charts(iso_birth) %>%
e_map(n, roam = TRUE) %>%
e_visual_map(max = max(nobel_per_country_echarts$n))
Hovering the countries gives us their name, and the number of laureates in the legend. We can also zoom in and out. We could see the evolution of laureates in time with timeline = TRUE
:
nobel_per_country_year_map <- nobel_laureates %>%
select(year, birth_country) %>%
group_by(year, birth_country) %>%
count(sort = TRUE) %>%
ungroup() %>%
drop_na() %>%
arrange(birth_country, year) %>%
complete(year, birth_country) %>%
mutate(n = ifelse(is.na(n), 0, n),
year = as.integer(year)) %>%
group_by(birth_country) %>%
mutate(n_cumul = cumsum(n)) %>%
arrange(birth_country)
nobel_per_country_year_map <- nobel_per_country_year_map %>%
mutate(
iso_birth = countrycode(birth_country, origin = "country.name", destination = "iso3c"),
iso_birth = case_when(
birth_country == "Scotland" | birth_country == "Northern Ireland" ~ "GBR",
TRUE ~ iso_birth
)
)
nobel_per_country_year_echarts <- e_country_names(data = nobel_per_country_year_map,
input = iso_birth,
type = "iso3c")
nobel_per_country_year_echarts %>%
group_by(year) %>%
e_charts(iso_birth, timeline = TRUE) %>%
e_map(n_cumul, roam = TRUE) %>%
e_visual_map(max = 257) %>%
e_timeline_opts(
playInterval = 250,
symbol = "none"
)
And that’s it! I used data about Nobel laureates to present a few plots and maps made with {ggplot2}
, {gganimate}
, {tmap}
, and {echarts4r}
. I used these packages but there are countless ways to make plots or maps, whether static or interactive, with R:
plots: base R, {highcharter}
, {charter}
, {plotly}
, etc.
maps: base R, {leaflet}
, {sf}
, {ggmap}
, etc.
I hope you enjoyed it!
This is my session info, so that you can see the versions of packages used. This is useful if the results in my post are no longer reproducible because packages changed. The packages with a star (*) are those explicitely called in the script.
─ Session info ─────────────────────────────────────────────────────
setting value
version R version 4.0.3 (2020-10-10)
os Ubuntu 18.04.5 LTS
system x86_64, linux-gnu
ui X11
language en
collate fr_FR.UTF-8
ctype fr_FR.UTF-8
tz Europe/Paris
date 2021-01-20
─ Packages ─────────────────────────────────────────────────────────
package * version date lib
abind 1.4-5 2016-07-21 [1]
assertthat 0.2.1 2019-03-21 [1]
backports 1.2.1 2020-12-09 [1]
base64enc 0.1-3 2015-07-28 [1]
broom 0.7.3 2020-12-16 [1]
cellranger 1.1.0 2016-07-27 [1]
class 7.3-17 2020-04-26 [4]
classInt 0.4-3 2020-04-07 [1]
cli 2.2.0 2020-11-20 [1]
codetools 0.2-16 2018-12-24 [4]
colorspace 2.0-0 2020-11-11 [1]
countrycode * 1.2.0 2020-05-22 [1]
crayon 1.3.4 2017-09-16 [1]
crosstalk 1.1.1 2021-01-12 [1]
DBI 1.1.1 2021-01-15 [1]
dbplyr 2.0.0 2020-11-03 [1]
dichromat 2.0-0 2013-01-24 [1]
digest 0.6.27 2020-10-24 [1]
distill 1.2 2021-01-13 [1]
downlit 0.2.1 2020-11-04 [1]
dplyr * 1.0.3 2021-01-15 [1]
e1071 1.7-4 2020-10-14 [1]
echarts4r * 0.3.4 2020-10-29 [1]
ellipsis 0.3.1 2020-05-15 [1]
evaluate 0.14 2019-05-28 [1]
fansi 0.4.2 2021-01-15 [1]
farver 2.0.3 2020-01-16 [1]
fastmap 1.0.1 2019-10-08 [1]
forcats * 0.5.0 2020-03-01 [1]
fs 1.5.0 2020-07-31 [1]
generics 0.1.0 2020-10-31 [1]
gganimate * 1.0.7 2020-10-15 [1]
ggplot2 * 3.3.3 2020-12-30 [1]
ggthemes * 4.2.0 2019-05-13 [1]
gifski 0.8.6 2018-09-28 [1]
glue 1.4.2 2020-08-27 [1]
gtable 0.3.0 2019-03-25 [1]
haven 2.3.1 2020-06-01 [1]
here * 1.0.1 2020-12-13 [1]
hms 1.0.0 2021-01-13 [1]
htmltools 0.5.1 2021-01-12 [1]
htmlwidgets 1.5.3 2020-12-10 [1]
httpuv 1.5.5 2021-01-13 [1]
httr 1.4.2 2020-07-20 [1]
janitor * 2.1.0 2021-01-05 [1]
jsonlite 1.7.2 2020-12-09 [1]
KernSmooth 2.23-17 2020-04-26 [4]
knitr 1.30 2020-09-22 [1]
labeling 0.4.2 2020-10-20 [1]
later 1.1.0.1 2020-06-05 [1]
lattice 0.20-41 2020-04-02 [4]
leafem 0.1.3 2020-07-26 [1]
leaflet 2.0.4.1 2021-01-07 [1]
leafsync 0.1.0 2019-03-05 [1]
lifecycle 0.2.0 2020-03-06 [1]
lubridate 1.7.9.2 2020-11-13 [1]
lwgeom 0.2-5 2020-06-12 [1]
magrittr 2.0.1 2020-11-17 [1]
mime 0.9 2020-02-04 [1]
modelr 0.1.8 2020-05-19 [1]
munsell 0.5.0 2018-06-12 [1]
pillar 1.4.7 2020-11-20 [1]
pkgconfig 2.0.3 2019-09-22 [1]
png 0.1-7 2013-12-03 [1]
prettyunits 1.1.1 2020-01-24 [1]
progress 1.2.2 2019-05-16 [1]
promises 1.1.1 2020-06-09 [1]
purrr * 0.3.4 2020-04-17 [1]
R6 2.5.0 2020-10-28 [1]
raster 3.4-5 2020-11-14 [1]
RColorBrewer 1.1-2 2014-12-07 [1]
Rcpp 1.0.6 2021-01-15 [1]
readr * 1.4.0 2020-10-05 [1]
readxl 1.3.1 2019-03-13 [1]
reprex 0.3.0 2019-05-16 [1]
rlang 0.4.10 2020-12-30 [1]
rmarkdown 2.6.4 2021-01-19 [1]
rprojroot 2.0.2 2020-11-15 [1]
rstudioapi 0.13 2020-11-12 [1]
rvest 0.3.6 2020-07-25 [1]
scales 1.1.1 2020-05-11 [1]
sessioninfo 1.1.1 2018-11-05 [1]
sf * 0.9-6 2020-09-13 [1]
shiny 1.5.0 2020-06-23 [1]
snakecase 0.11.0 2019-05-25 [1]
sp 1.4-5 2021-01-10 [1]
stars 0.4-3 2020-07-08 [1]
stringi 1.5.3 2020-09-09 [1]
stringr * 1.4.0 2019-02-10 [1]
tibble * 3.0.5 2021-01-15 [1]
tidyr * 1.1.2 2020-08-27 [1]
tidyselect 1.1.0 2020-05-11 [1]
tidyverse * 1.3.0 2019-11-21 [1]
tmap * 3.2 2020-09-15 [1]
tmaptools 3.1 2020-07-01 [1]
tweenr 1.0.1 2018-12-14 [1]
units 0.6-7 2020-06-13 [1]
utf8 1.1.4 2018-05-24 [1]
vctrs 0.3.6 2020-12-17 [1]
viridisLite 0.3.0 2018-02-01 [1]
withr 2.4.0 2021-01-16 [1]
xfun 0.20 2021-01-06 [1]
XML 3.99-0.5 2020-07-23 [1]
xml2 1.3.2 2020-04-23 [1]
xtable 1.8-4 2019-04-21 [1]
yaml 2.2.1 2020-02-01 [1]
source
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
Github (JohnCoene/echarts4r@082e62c)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.2)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.1)
CRAN (R 4.0.0)
CRAN (R 4.0.2)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.1)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
Github (rstudio/rmarkdown@2e8572e)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.2)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.0)
CRAN (R 4.0.2)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.0)
CRAN (R 4.0.1)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.3)
CRAN (R 4.0.3)
CRAN (R 4.0.2)
CRAN (R 4.0.3)
CRAN (R 4.0.0)
CRAN (R 4.0.0)
[1] /home/etienne/R/x86_64-pc-linux-gnu-library/4.0
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
This function is very useful even when column names are much more messy.↩︎
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/etiennebacher/personal_website_distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Bacher (2020, Oct. 18). Etienne Bacher: Visualize data on Nobel laureates per country. Retrieved from https://www.etiennebacher.com/posts/2020-10-18-nobel-laureates/
BibTeX citation
@misc{bacher2020visualize, author = {Bacher, Etienne}, title = {Etienne Bacher: Visualize data on Nobel laureates per country}, url = {https://www.etiennebacher.com/posts/2020-10-18-nobel-laureates/}, year = {2020} }