
[DataViz] 4.3 Project I- Start & Web Scraping

Updated: Feb 8, 2021

The early years of the Republic of Korea (RoK) were like a row of dominoes, or a chain of linked explosives. Each unfortunate circumstance or incident set off another, and there seemed to be no way to pull the nation out of the pandemonium. This is one of the reasons I have so much respect for the Korean independence fighter Dongjin Kwon, of whom I am proud to be a direct descendant. But to the nation he held so dear, the 4.3 incident did a great deal of harm.

The 4.3 incident, which took place in 1948, was a massacre that grew out of the struggle between the two ideologies in Korea in the years leading up to the Korean War. Although it resulted in the greatest number of casualties in Korean history apart from the Korean War, it has still not been resolved. It is one of the incidents led by a futile and authoritarian regime (one that has since stepped down but still has ideological sympathizers) that refuses to apologize for its past misdeeds.

Although a great deal of effort has gone into the continuing struggle for proper compensation, the 4.3 incident is fading from people's memories as generations pass. What I believe matters is that the people who know about this tragedy publicize it, both to demand proper compensation and to prevent future aggression.

To this end, I decided to apply data visualization, one of my talents, to speak to my community and society using my own unique medium and voice.

I began by planning what the exhibition would look like. After sketching some ideas and drawing a lot of inspiration from Hard Data by R. Luke DuBois, I came up with the following basic concept: each victim's name and area of residence is printed alongside a map, and the area the victim came from gets a little redder each time.

In the end, we get a geospatial graph in which the "redness" of each area is proportional to the number of victims who lived there.
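To make the "redness" idea concrete, here is a minimal sketch in base R of how counts could be turned into shades of red. The area names and counts are placeholders made up for illustration (not real figures), and the white-to-red scale is just one possible choice, not necessarily the one used in the final exhibition.

```r
# Hypothetical victim counts per area: placeholder names and numbers, not real data.
victims <- c(area_a = 120, area_b = 45, area_c = 0, area_d = 300)

# Scale the counts to [0, 1] so the area with the most victims gets the deepest red.
intensity <- victims / max(victims)

# Fade from white (no victims) to pure red (most victims) by lowering green and blue.
fill <- rgb(
  red   = rep(1, length(intensity)),
  green = 1 - intensity,
  blue  = 1 - intensity
)
names(fill) <- names(victims)

print(fill)
```

The resulting hex colors could then be passed as the fill colors of whatever map-drawing function ends up being used.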

People often think of data as hard and cold, because they only see bar charts and pie charts that smother all the different narratives the data contains. I wanted to show them the opposite: data can be humanizing and can convey stories, and those boring bar charts are not the only way to visualize it.

If only 40 or so people had been killed in the incident, I could have typed the records in by hand. But 14,232 people were killed, so to collect all 14,232 records I wrote web scraping and web crawling code.

I first found this 4.3 memorial website, but on it I couldn't find the CSS selectors needed to pull out the data I wanted. While looking for another way, I stumbled onto this website, which is the 4.3 archive!

I first inspected the page elements, then used html_nodes to extract the data I needed. The last bit, html_nodes(css='.mb30'), took a long time to construct, because I didn't know how to extract only one node when two nodes start with the same name.

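library(rvest)  # provides read_html(), html_nodes(), html_table() and the %>% pipe
# url_list is assumed to be a character vector of page URLs defined earlier
html <- read_html(url_list[1], encoding = 'UTF-8')
temp <- unique(html_nodes(html, '#wrap') %>%
         html_nodes('#containers') %>%
         html_nodes(css = '.content') %>%
         html_nodes(css = '.content_body') %>%
         .[[1]] %>%  # keep only the first of the matching nodes
         html_nodes(css = '.cnt_table4') %>%
         html_table(fill = TRUE))  # one data frame per matching table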
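The "two nodes with the same name" problem can be reproduced on a tiny example. The sketch below is only an illustration under stated assumptions: the pages are made-up inline HTML rather than the real archive, the helper names make_page and extract_first_table are hypothetical, and only the class name .cnt_table4 is borrowed from the snippet above. It shows how [[1]] keeps the first of several matches, and how the same extraction might be repeated for every page and stacked into one table.

```r
library(rvest)  # re-exports read_html() and the %>% pipe

# Made-up stand-ins for two archive pages (hypothetical content).
# Each page holds two tables that share the same class.
make_page <- function(name, area) {
  read_html(paste0(
    '<html><body><div class="content_body">',
    '<table class="cnt_table4"><tr><th>name</th><th>area</th></tr>',
    '<tr><td>', name, '</td><td>', area, '</td></tr></table>',
    '<table class="cnt_table4"><tr><th>note</th></tr>',
    '<tr><td>unrelated</td></tr></table>',
    '</div></body></html>'
  ))
}
pages <- list(make_page("person_1", "area_a"), make_page("person_2", "area_b"))

# html_nodes() returns every match; [[1]] keeps only the first one.
extract_first_table <- function(page) {
  page %>%
    html_nodes(css = ".cnt_table4") %>%
    .[[1]] %>%
    html_table()
}

# Run the same extraction on every page and stack the rows into one table.
victims <- do.call(rbind, lapply(pages, extract_first_table))
print(victims)
```

With real pages, the list would instead be built by calling read_html() on each URL, ideally with a short pause such as Sys.sleep(1) between requests so the server is not overloaded.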

As the data here is very raw and unpolished, I will post about tidying the data in the next post.

