Banned in China

“How Censorship in China Allows Government Criticism but Silences Collective Expression” (pdf) is a mostly brilliant paper that persuasively demonstrates its title thesis. I thank Xavier Marquez for the G+ post that alerted me to it.

Internet censorship in mainland China has at least three layers: IP blocking (the Great Firewall), keyword interception, and human intervention.

The bluntest of the instruments is IP blocking: for example, Facebook, Twitter, WordPress, and BlogSpot are all generally blocked. Keyword interception is the most technically sophisticated layer: real-time packet filtering, particularly affecting search engines, is widely reported. Censorship by human intervention takes place on a large scale and seems mainly to affect websites hosted in mainland China, which are most easily controlled by authorities.

IP blocking on its own would be ineffective because readers can still reach a large number of substitute websites. Chinese orthography limits the effectiveness of keyword interception: because the language abounds in homophones and homographs, people who can type in Chinese dance around this form of censorship and even make sport of it. This forces authorities back to their last line of defence: blocking offending content by hand. The paper gives the strong impression that this massive logistical exercise is very efficient and very accurate. Most hand-blocked content is censored within one day of appearing, and this remains true even during bursts of social media activity on a given subject. Even at this speed, censorship is applied with greater precision than the thematic categorization of content generated by the ReadMe package, which the paper’s authors use to analyse censorship as it happens. (Gary King, a co-author of the paper, is also one of the fathers of ReadMe. That package is a monumental achievement in the automation of text analysis; this paper is just one impressive demonstration of its power.)
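To make the homophone point concrete, here is a minimal sketch of why naive keyword matching is easy to sidestep. The blocklist and the `passes_filter` helper are my own hypothetical illustrations, not a description of any real filter; the substitution shown (writing 河蟹, "river crab", for the similar-sounding 和谐, "harmony") is a widely reported piece of Chinese internet wordplay.

```python
# Hypothetical blocklist: an assumption for illustration only.
BLOCKED_TERMS = {"和谐"}


def passes_filter(text, blocked_terms=BLOCKED_TERMS):
    """Return True if no blocked term appears verbatim in the text."""
    return not any(term in text for term in blocked_terms)


original = "和谐社会"      # contains the blocked term
substitute = "河蟹社会"    # near-homophone spelling, different characters

print(passes_filter(original))    # False: caught by exact matching
print(passes_filter(substitute))  # True: slips past the same filter
```

Any filter that matches exact character sequences has this weakness, and each new substitution has to be discovered and added by hand.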

The paper focuses on censorship by human intervention and asks the interesting question “what subject matter is most likely to be censored?” The methods employed are technically sophisticated and intuitively appealing. Chinese social media sites are scraped and posts are grouped into topics by ReadMe. Eighty-five preselected topics with varying degrees of political sensitivity are singled out for particular attention. Posts are periodically rescraped to see if they have been censored. This approach has the potential to bring empirical evidence to bear on questions that could previously be approached only by a Kremlinology-style reading of tea leaves.
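The scrape-and-rescrape logic described above can be sketched in a few lines. This is my own simplified illustration, not the authors' code: the function name, the `(post_id, topic)` input format, and the rule that a post missing from the rescrape counts as censored are all assumptions. In practice one would also need to separate censorship from deletion by the post's author.

```python
from collections import defaultdict


def censorship_rates(first_scrape, still_visible_ids):
    """Estimate the share of posts per topic that vanished between scrapes.

    first_scrape: iterable of (post_id, topic) pairs from the initial scrape.
    still_visible_ids: set of post ids found again on the rescrape.
    Assumption: a post absent from the rescrape is treated as censored.
    """
    totals = defaultdict(int)
    removed = defaultdict(int)
    for post_id, topic in first_scrape:
        totals[topic] += 1
        if post_id not in still_visible_ids:
            removed[topic] += 1
    return {topic: removed[topic] / totals[topic] for topic in totals}


# Toy data, invented purely to show the shape of the calculation.
posts = [(1, "topic_a"), (2, "topic_a"), (3, "topic_b"), (4, "topic_b")]
visible = {1, 3, 4}
print(censorship_rates(posts, visible))
```

Comparing these per-topic rates against the overall baseline is what lets the question "which subject matter is censored most?" be answered with counts rather than guesswork.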

Persuasive evidence for the following propositions is offered.

  • Baseline censorship probability for social media posts is fairly high: in the region of ten percent.
  • Sensitive topics are more likely to be censored, but with just two exceptions (pornography and criticism of censors), not by a huge amount.
  • The censorship operation is a well-oiled machine.

Two hypotheses are tested.

  1. Material critical of the government is the principal target of censorship.
  2. Material with potential to generate localized collective action (such as protests) is the principal target.

Hypothesis (1), perhaps surprisingly, does not stand up to the empirical evidence presented. Harsh and even vitriolic criticism of the Chinese government does not get censored a great deal above baseline unless it also has (or is part of a spike of activity with) potential to generate localized collective action. Material with high potential to generate localized collective action, on the other hand, tends to get censored even if the posts themselves support the government’s positions. The paper persuasively demonstrates that hypothesis (2) is correct and hypothesis (1) is wrong.

Localized collective action and mass protests can be harbingers of regime change in nondemocratic jurisdictions (cf. Eastern Europe circa 1990, and the “Arab Spring” last year). It therefore makes some sense that this is what Chinese authorities seek to prevent. There is also an ideological interpretation available that the paper does not fully explore: it is fairly orthodox communism to treat socialism itself (identified with the state as the avatar of socialism) as the only authentic collective action in society. Collective action not controlled by the state is false consciousness at best, and unhelpful factionalism or outright counterrevolution at worst.

The authors puzzle over why criticism of censors is at all times singled out for those censors’ scissors, and suggest this is inappropriately self-serving behaviour by the censorship bureaucracy. I do not see the mystery here, and submit that a policy subjecting this material to heightened scrutiny is consistent with authorities’ policy goal of reducing potential for localized collective action. The censorship programme is a line of defence against inappropriate collective action: if the programme is itself delegitimized by collective action, then floodgates may be opened to collective action on other matters.

I would be interested to see follow-up work that more closely addresses the importance of collective action being localized rather than geographically dispersed. This is a really interesting point, and I hope the data the authors plan to release will be able to answer the questions left standing here, if someone finds an appropriate fulcrum from which to apply leverage.

I would also like to see whether criticism of named government officials is censored differently from criticism of the government or its policies, whether general or specific.

Section 5 of the paper is not persuasive. It seeks to show that changes in censorship activity and spikes of material with high potential for localized collective action can be detected in real time. (These endpoints, while independently interesting, are convincingly identified.) This is why I described the paper as only “mostly” brilliant.

The authors look at three spikes of social media activity of interest, and detect changes in censor activity in the days immediately prior (i.e., in the period when the government, but neither the media nor the public, is aware of an imminent risk of collective action). They fail to provide a genuine test. To be meaningfully applied in real time, such a test would need to identify all changes in censor activity and then separate, without a priori or a posteriori knowledge of the material’s collective action potential, those that are of practical import from those that are statistical fluctuation. I make an analogy to medical screening: it is not enough to show that people who eventually got cancer had pre-cancer markers; to create a useful test, one needs to show that pre-cancer markers are usually followed by cancer. This would require refinement and expansion of the methods employed in the paper, which I suspect would take technical innovation on top of the paper at least as great as that within it.
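The screening analogy is the familiar gap between a test's sensitivity and its positive predictive value. A small sketch using Bayes' rule shows the size of that gap; the numbers below are invented for illustration and are not estimates from the paper.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(event | signal) by Bayes' rule.

    sensitivity: P(signal | event), e.g. share of real spikes that were
                 preceded by a change in censor activity.
    specificity: P(no signal | no event).
    base_rate:   P(event) in any given period.
    """
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: every real event is preceded by a signal, but
# signals also fire in 10% of quiet periods, and events are rare (1%).
ppv = positive_predictive_value(sensitivity=1.0, specificity=0.9, base_rate=0.01)
print(round(ppv, 3))  # 0.092
```

Under these made-up inputs, fewer than one signal in ten is followed by an event, even though every event was preceded by a signal. Showing the latter, as Section 5 does for three spikes, says nothing about the former.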

With the exception of some confusing labels on figures, the paper is highly polished and very accessible. It uses technology to bring a novel source of data to bear on questions that were previously not vulnerable to empirical attack. It answers the key question of what material is the principal focus of China’s censorship authorities. This in turn permits insight into what China’s authorities regard as threats to their power. For these reasons, the paper is interesting, important, and recommended.