Which countries are mentioned the most on Hacker News?
As I was walking around Osaka with Sacha, we started talking about Hacker News and how there seem to be a lot of popular stories about Japan.
Is the popularity of Japan just our confirmation bias, or would Japan really top the list of most popular countries on HN? Since the data is public, I decided to find out.
Most mentioned countries
Here are all the countries that were mentioned in at least 100 stories. Only submitted stories are counted, not comments.
Rank | Country | Story mentions |
---|---|---|
# 1 | United States | 18164 |
# 2 | China | 8610 |
# 3 | India | 7671 |
# 4 | United Kingdom | 6774 |
# 5 | Japan | 3029 |
# 6 | Russia | 2722 |
# 7 | Canada | 2248 |
# 8 | Germany | 2248 |
# 9 | Australia | 2188 |
# 10 | France | 2153 |
# 11 | Israel | 1096 |
# 12 | Spain | 891 |
# 13 | Brazil | 889 |
# 14 | Jordan | 878 |
# 15 | Pakistan | 850 |
# 16 | Netherlands | 812 |
# 17 | Sweden | 808 |
# 18 | Greece | 767 |
# 19 | Italy | 744 |
# 20 | Ireland | 688 |
# 21 | Mexico | 682 |
# 22 | Switzerland | 639 |
# 23 | Singapore | 608 |
# 24 | Turkey | 604 |
# 25 | Ukraine | 587 |
# 26 | Egypt | 536 |
# 27 | Malaysia | 443 |
# 28 | Norway | 434 |
# 29 | Indonesia | 428 |
# 30 | Vietnam | 386 |
# 31 | Philippines | 356 |
# 32 | Chile | 342 |
# 33 | Thailand | 337 |
# 34 | Finland | 334 |
# 35 | Argentina | 328 |
# 36 | Afghanistan | 327 |
# 37 | Nigeria | 326 |
# 38 | Iraq | 307 |
# 39 | Saudi Arabia | 293 |
# 40 | Georgia | 278 |
# 41 | Poland | 273 |
# 42 | Iceland | 257 |
# 43 | Denmark | 233 |
# 44 | Kenya | 219 |
# 45 | Estonia | 214 |
# 46 | Nepal | 209 |
# 47 | Taiwan | 195 |
# 48 | Portugal | 193 |
# 49 | Haiti | 183 |
# 50 | Libya | 165 |
# 51 | Belgium | 156 |
# 52 | Romania | 152 |
# 53 | Venezuela | 152 |
# 54 | Ecuador | 142 |
# 55 | Antarctica | 126 |
# 56 | Bangladesh | 123 |
# 57 | Cyprus | 120 |
# 58 | Hungary | 117 |
# 59 | Austria | 111 |
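The table counts stories rather than individual occurrences, then drops every country below 100 stories. A minimal Python sketch of that tally, assuming each story has already been reduced to the set of country codes it mentions. The function name `mention_counts` and the input format are hypothetical; the real numbers came from a BigQuery query, not from this code.

```python
from collections import Counter
from typing import Iterable


def mention_counts(story_countries: Iterable[set[str]], min_stories: int = 100) -> Counter:
    """Count stories per country and keep countries with >= min_stories.

    story_countries: one set of country codes per story (hypothetical input
    format; the original analysis did this step in BigQuery).
    """
    counts: Counter = Counter()
    for codes in story_countries:
        # Each story contributes a set, so it counts at most once per country
        # even if the title names the country twice ("Japan", "Japanese").
        counts.update(codes)
    return Counter({code: n for code, n in counts.items() if n >= min_stories})


# Toy data with a lowered cutoff, just to show the shape of the result.
stories = [{"JP"}, {"JP", "US"}, {"US"}, {"FR"}]
print(mention_counts(stories, min_stories=2).most_common())
```

`most_common()` then gives the ranking order used in the table.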
Average post score by country
Now we know which countries are mentioned the most, but how about upvotes? Which countries have the highest average number of upvotes per post? For this I only included the countries from the previous list, since averages over rarely mentioned countries would have been too noisy.
Rank | Country | Story mentions | Average score |
---|---|---|---|
# 1 | Ecuador | 142 | 22.94 |
# 2 | Norway | 434 | 15.72 |
# 3 | Germany | 2248 | 14.86 |
# 4 | Austria | 111 | 14.32 |
# 5 | Sweden | 808 | 14.11 |
# 6 | Iceland | 257 | 14.04 |
# 7 | Venezuela | 152 | 13.33 |
# 8 | United States | 18164 | 13.01 |
# 9 | Denmark | 233 | 12.84 |
# 10 | Finland | 334 | 12.73 |
# 11 | Netherlands | 812 | 12.58 |
# 12 | Switzerland | 639 | 12.51 |
# 13 | Japan | 3029 | 11.91 |
# 14 | Chile | 342 | 11.42 |
# 15 | Russia | 2722 | 11.11 |
# 16 | Libya | 165 | 10.95 |
# 17 | Poland | 273 | 10.58 |
# 18 | France | 2153 | 10.57 |
# 19 | Kenya | 219 | 10.25 |
# 20 | Afghanistan | 327 | 10.14 |
# 21 | Romania | 152 | 10.02 |
# 22 | Georgia | 278 | 9.88 |
# 23 | Haiti | 183 | 9.73 |
# 24 | Iraq | 307 | 9.61 |
# 25 | Estonia | 214 | 9.57 |
# 26 | Mexico | 682 | 9.42 |
# 27 | Argentina | 328 | 9.39 |
# 28 | Greece | 767 | 9.33 |
# 29 | Hungary | 117 | 9.21 |
# 30 | Turkey | 604 | 9.14 |
# 31 | Brazil | 889 | 9.08 |
# 32 | Egypt | 536 | 8.96 |
# 33 | Saudi Arabia | 293 | 8.73 |
# 34 | United Kingdom | 6774 | 8.72 |
# 35 | Malaysia | 443 | 8.7 |
# 36 | Antarctica | 126 | 8.56 |
# 37 | China | 8610 | 8.55 |
# 38 | Cyprus | 120 | 8.38 |
# 39 | Canada | 2248 | 8.26 |
# 40 | Nigeria | 326 | 8.06 |
# 41 | Nepal | 209 | 7.72 |
# 42 | Singapore | 608 | 7.6 |
# 43 | Italy | 744 | 7.45 |
# 44 | Belgium | 156 | 7.37 |
# 45 | Australia | 2188 | 7.07 |
# 46 | Portugal | 193 | 7.07 |
# 47 | Israel | 1096 | 6.99 |
# 48 | Ireland | 688 | 6.79 |
# 49 | Thailand | 337 | 6.67 |
# 50 | Ukraine | 587 | 6.48 |
# 51 | Bangladesh | 123 | 6.48 |
# 52 | Spain | 891 | 6.46 |
# 53 | India | 7671 | 6.1 |
# 54 | Taiwan | 195 | 5.24 |
# 55 | Vietnam | 386 | 4.92 |
# 56 | Pakistan | 850 | 4.65 |
# 57 | Philippines | 356 | 4.38 |
# 58 | Indonesia | 428 | 2.67 |
# 59 | Jordan | 878 | 2.3 |
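The average score is the total score of a country's stories divided by its story count, computed only for countries that cleared the 100-story cutoff. A minimal Python sketch, assuming each story is available as a (score, set of country codes) pair; `average_scores` is a hypothetical name, not code from the original analysis.

```python
from collections import defaultdict


def average_scores(stories, min_stories=100):
    """Mean score per country, ranked from highest to lowest.

    stories: iterable of (score, set_of_country_codes) pairs (assumed format).
    Returns {code: (story_count, mean_score)} for countries with at least
    min_stories stories, so rarely mentioned countries are left out.
    """
    totals = defaultdict(lambda: [0, 0])  # code -> [story count, score sum]
    for score, codes in stories:
        for code in codes:
            totals[code][0] += 1
            totals[code][1] += score
    kept = {
        code: (n, round(total / n, 2))
        for code, (n, total) in totals.items()
        if n >= min_stories
    }
    # Sort by mean score, highest first.
    return dict(sorted(kept.items(), key=lambda item: item[1][1], reverse=True))
```

Filtering before ranking is what keeps the list meaningful: with a cutoff of 1, a country mentioned in a single well-received story would jump straight to the top.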
Conclusions
What is going on with Ecuador? Snowden. Here are the stories mentioning it.
Japan really is both frequently submitted and fairly well upvoted: 5th by story mentions and 13th by average score. I had noticed India and China appearing often in stories, but didn't expect them to beat Japan.
Notes on method
I thought this blog post would be a 2-hour project. The data is readily available: you can either download all HN stories in JSON format (1.1 GB uncompressed) or use Google BigQuery, which already has the table. I went with BigQuery.
At first I thought I would just match stories against country names like "Germany" and "Japan", but then I realized I should probably include "German" and "Japanese" as well. For multi-word country names I also wanted to count abbreviations ("US", "U.S.", "USA", "U.S.A."). The query already removes dots, so it was enough to have "US" and "USA" in the list. To prevent "US" from also matching "us", I kept the matching case-sensitive.
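The two matching rules, dot stripping and case sensitivity, can be sketched in a few lines of Python. The original matching ran inside a BigQuery query; the use of regex word boundaries (`\b`) here is an assumption, not necessarily what that query did, and `normalize`/`mentions` are hypothetical names.

```python
import re


def normalize(title: str) -> str:
    # Drop dots so "U.S.A." becomes "USA" and "U.S." becomes "US".
    return title.replace(".", "")


def mentions(title: str, term: str) -> bool:
    # Case-sensitive whole-word match (an assumption about the matching rule):
    # "US" matches "US" but not "us" or "USB", and "Japan" does not match
    # inside "Japanese", which is why demonyms need their own entries.
    return re.search(r"\b" + re.escape(term) + r"\b", normalize(title)) is not None
```

For example, `mentions("Made in the U.S.A.", "USA")` is true, while `mentions("Tell us about it", "US")` is false.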
In the end, to determine whether a story might be about a certain country, I made a list of strings that map to country codes. The list has both country names ("Japan") and demonyms ("Japanese"). Here is my whole mapping.
I decided not to include "English" as a word for the UK, because it more commonly refers to the language. Since it is ambiguous which country "Korea" on its own refers to, I didn't count it as referring to anything.
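A synonym mapping like this turns each title into a set of country codes. A minimal Python sketch, assuming a small hypothetical excerpt of the list (not the real mapping) and ISO-style two-letter codes; `SYNONYMS` and `countries_in` are illustrative names only.

```python
import re

# Hypothetical excerpt of a synonym list: names, demonyms and abbreviations
# all map to one country code. This is not the full mapping.
SYNONYMS = {
    "Japan": "JP", "Japanese": "JP",
    "Germany": "DE", "German": "DE", "Germans": "DE",
    "United States": "US", "US": "US", "USA": "US",
    "United Kingdom": "GB", "UK": "GB", "British": "GB",
    "South Korea": "KR", "South Korean": "KR",
    # "English" (usually the language) and bare "Korea" (ambiguous) are
    # deliberately left out.
}

# Longest terms first, so a longer name is never cut short by a shorter
# term it starts with.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b"
)


def countries_in(title: str) -> set[str]:
    """Return the set of country codes a title seems to mention."""
    # Strip dots first ("U.S.A." -> "USA"); matching is case-sensitive.
    return {SYNONYMS[m] for m in _PATTERN.findall(title.replace(".", ""))}
```

Because the result is a set, `countries_in("Japan and the Japanese")` yields just `{"JP"}`, and `countries_in("Learning English in Korea")` yields an empty set.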
In the end I spent two evenings creating the country synonym list, reading up on how JOINs and subselects work in BigQuery (finally ending up with this query), and then writing the post along with the maps and formatting. I already had the country flags, since I was using them for the GeoIP part of Candy Japan.
Hope you liked it!
I also made one for Reddit.