National standards: this data is not for ranking

Comparing individual schools by comparing their national standards data is a waste of time. Students are assessed by overall teacher judgement, not standardized testing. Getting some sensible moderation in place is one of the biggest challenges in the implementation of the standards, but people need to accept that moderation will never be as rigorous as it is under standardized testing regimes. For reasons I’ll touch on later, this is by design.

A school’s national standards data is not useful information for a parent choosing which school to pack a kid off to. (Given zoning and the high fixed costs of moving, there is limited scope for school choice in the first place, so that might be a moot point.) Far better is knowledge of what a school does with its data.

If I were at the Ministry of Education (or—heaven forbid!—a parent), here’s what I’d be looking for.

I’d look for signs that schools are using the standards sensibly at the level of individual students. Especially for students not at standard, I’d want to see individualized learning plans, with achievable benchmarks/milestones. Ideally, these plans would be designed collaboratively by teacher, student, and caregiver. Give the student a sense of direction and ownership: here’s what we want you to be able to do, here’s our plan for getting you there, and here’s how you’ll be able to feel your own progress along the way.

Over time, I’d look for signs that schools are using standards, in conjunction with the learning plans, to do some value-added appraisal of teachers. I would incorporate this formally into professional development structures. I accept that there’s only so much a teacher can be reasonably expected to do for kids who turn up hungry, have caregivers with significant reading difficulties, or who switch between schools a lot. (These are, incidentally, things that are thought to correlate pretty tightly with decile.) Placing appropriate weight on factors like this is something that I think standards will be able to do over time, even if they’re fairly messy.

These things don’t really need tightly moderated standards. If comparison between schools is what you’re really wanting, then standardized tests would produce more useful data. But standardized testing was defeated politically on grounds I think were actually quite sensible.

Standardized tests are just not a good cultural fit with New Zealand. There’s a national egalitarian streak: we don’t like putting anyone higher or lower than anyone else, and we especially don’t like it when it’s kids we’re sorting. We don’t like to pressure children academically to the point of causing stress. We worry about labelling effects on kids who are at risk of falling behind: effects that are real, powerful, long-lasting, and can be very destructive. These traits of ours are admirable, and saying “little Timmy got 35 percent in his Year 2 reading test” is offensive to just about all of them.

Besides, the literature on standardized tests is a minefield. As an example, unless they’re made meaningful to students (ie have consequences), there’s little to suggest they help much with anything. But the point of the New Zealand variant on standards is that they’re snapshots of where students are at and tools for planning. They’re not meant to be consequential for students. This next point is important so I’m going to italicize it: most students probably don’t even know when their teacher is assessing them against the standards.

Where cross-school comparisons might come in useful is in identifying stand-out schools so that successful and/or innovative practices can be, where appropriate, replicated more widely. Obviously, what works in a school within one particular cultural, social, and economic context won’t necessarily work for a school that’s in a totally different one. You do your data analysis and your case studies, then you devise sensible categories and work within them.

So long as moderation is at least better than hopeless, in time, we’ll also learn quite a bit more than we already know about the impact of social and economic factors on academic attainment during the early years of school. This is important, because it’s these early years that are assumed to matter most. National standards data, for all its flaws, is or can be made rich enough to support meaningful research that will help us improve how we teach our children.

Depending on whether or not I can be bothered, I might write up some stuff on why I don’t like deciles as analytical tools. But not this week.


National Standards

Familiarize yourself with the standards themselves (reading and writing [pdf], mathematics [pdf]). Notice that assessment against the standards is done by overall teacher judgement and not entirely, or even principally, by standardized testing.

If you want to play with the shiny data set, Luis A Apiolaza has done you a favour by putting it into something sensible and R-friendly. I recommend reading all of the case studies on Stuff before diving in. Context matters.

My own prejudices. I am a policy analyst with nil expertise in education. I support publication of national standards data. I am agnostic as to whether national standards should have been introduced in the first place, but furious at those “public servants” who sought to obstruct implementation of a lawful Government policy: no integrity.

I’m mainly interested in what the standards data can and can’t tell us, and how they might be used to improve education outcomes. The lens I’m trying to look through is how would I tackle this if I were at the Ministry of Education?

Thank you 2Degrees!

I just had an awesome experience with 2Degrees Mobile. I botched a top-up procedure, and accidentally chewed through $10 of mobile broadband credit in about three to five minutes.

So I put my mobile broadband SIM into my phone and called the helpline. I was fully prepared to accept that I’d lost the credit: it was, after all, my ineptitude that saw it fly away. But they gave me $10 of credit straight back, and put me directly onto a good value data pack.

I’ve had a not-great day. This experience just made it a little better.

Concern trolling

An open letter to @Megan_Woods

Dear Dr Woods

I understand you’re pretty new in town, so let me explain how things work. When someone like you from the hinterlands first comes to Wellington all starry eyed and ambitious and desirous to change the world, more experienced caucus colleagues, if they have your best interests at heart, will give you some version of the talk. This is not intended to be a pleasant conversation.

Typically, someone who’s seen a dozen brighter stars flame out will gently explain your responsibilities over the coming three years (summary: work hard for your electorate constituents, and other than that, say nothing at all to anybody under any circumstances ever) and provide a brief introduction to the concept of a single-term MP. Judging from your tweet last night…

RT @Megan_Woods Hitler has a pretty clear manifesto that he campaigned and won on. Question: does this make what he did ok @NZNationalParty? #SaveOurAssets

First-term MP for Wigram, covering herself in glory.

…you either did not listen to the talk or your colleagues didn’t care enough about you to make you sit down and shut up through it. This was a poor start to your career in Parliament.

I have no desire to correct your trampling all over human misery for petty partisan ends by taking umbrage and trampling all over human misery for petty partisan ends. I am not easily offended, and in truth your tweet did not upset me personally.

But, for the love of all that is good and holy in this world, I urge you to stop saying stupid things, or, failing that, to stop saying anything at all.

In fewer than 140 characters, you conceded your own party’s central contention on the government’s proposed programme of partial SOE floats, and then godwinned yourself out of reasonable discourse. I believe I am being objective when I use the word stupid to describe this performance.

There are reasonable objections to the government’s intended course of action, and there are objections that are not reasonable.

Early criticism about forgoing future dividend streams was misguided. Nobody has provided any evidence to suggest that the government will not recoup, on average, the market’s weighted estimate of this stream’s present value at IPO. If you believe you have a better estimate of the true value of these dividends, then I suggest you participate in the IPO and in the sharemarket in the days and years following. If you’re right, you’ll get rich. Spend the money on whatever you think the Government should have been spending it on instead. Seriously: you’ll be rich enough to do this.

I guess it’s not your fault your superiors later opted for the silly line that “the Government has no mandate”. Putting aside that this policy simply could not have taken the electorate by surprise – it was announced nearly a year before the general election and your party made opposition to it a central campaign plank – the Government, along with its confidence and supply partners, can command a majority in the House of Representatives. That is what a mandate to govern looks like in Westminster democracies. To put this into terms relevant to your current position: it’s why you are an Opposition backbench MP and not a Government backbench MP.

I accept that Parliamentary majorities can and do pass manifestly unjust laws, and that sometimes, as you allude, these laws can be monstrous and evil. Putting aside that this is obviously not the case in this instance (partial floats are, at worst, unwise, and nobody is suggesting that mass murder is part of the policy package at issue here), you have managed to miss a crucial point. The government has never argued that the partial floats are a good idea because it has a mandate to govern. The government thinks that they’re a good idea anyway, and that its mandate merely weakens moralistic-sounding arguments opposing the peaceful policy. Not unreasonable.

In truth, mandate talk bores me. Most New Zealanders oppose this piece of government policy, but clearly a large portion of those opposing did not, taking all other matters into account when it mattered on 26 November 2011, care about it enough to alter their votes. Democracy seems to be working as advertised. From my point of view, it’s positively a good thing about this Government that it seems determined to teach voters the meaning of buyer’s remorse. It’s a valuable lesson.

Finally, since I’ve dismissed some of your party’s standard arguments against partial SOE floats and not even deigned to comment much about others (notably the outright racist invocation of the foreign-ownership bogey generally in the same breath as racist hectoring over the private Crafar farm sales), I think it’s only fair that I offer a suite of arguments against the policy that I think are more reasonable. Your mileage on these, like mine, may vary.

  1. The Government should consider more carefully alternative means of raising or releasing capital. Options include increasing tax, raising debt through bond issues, or reducing outlays in other areas of government spending. There are others. (I presume for now the option you prefer is part of a broader and coherent package for the management of the government’s books, but this isn’t something I’m prepared to take on faith for much longer.) In the present global economic environment, some options are more easily defensible than others, but none of the avenues I have mentioned are no-hopers in economic terms.
  2. The government shouldn’t be reducing its exposure to the energy industry. The Government has contended that its programme is an exercise in shifting the capital assets side of its balance sheet away from energy and towards “social assets” like schools, hospitals, and somewhat strangely, roads and fibre-optic cables. You may wish to argue that the state’s current level of exposure to energy is ideal. I’m not sure how you’d do this without the conversation drifting back towards the unenlightening dividend-stream argument, but you’re welcome to go ahead and try.
  3. The Government will misallocate the capital it releases. I totally agree. Roads, fibre-optic cables, schools, and hospitals are very nice, but it’s not at all clear to me that they’re necessarily the best use of government capital. I’d like to see more evidence, and a higher standard of business cases offered.
  4. A partial float will not do much to improve the firms’ standard of governance and profitability. This argument, made often and well by economist Paul Walker, fundamentally undermines the Government’s claims about the partial floats’ potential benefits to the New Zealand economy. As an added bonus for your team, it is more or less a slam dunk. Walker’s other objections to the Government’s policy are also strong.
  5. The state’s interests, shareholders’ interests, and the public interest may not be well aligned. Can we really trust governments, present and future, to let these companies fail if they’re mismanaged into loss land? Can we really trust them not to hobble potential competition from new and innovative market entrants? Can we really trust them not to saddle the companies with majority shareholder directives aimed not at increasing profits but at scratching some other political itch?

See how I did that? I raised, in an honest way, reasons I’m not wild about the government’s programme. My arguments didn’t resort to partisan hackery, and didn’t resort to wildly inappropriate historical analogies. It’s possible. You should try it one day. After your first term. Until then, you should probably just watch how the grown-ups in your caucus do it.

With less disrespect than the above would suggest,

Update: I wrote this last night and scheduled it. Between the time of writing and the time of publication, Dr Woods apologized gracefully. Bravo.

Banned in China

How Censorship in China Allows Government Criticism but Silences Collective Expression (pdf) is a mostly brilliant paper that persuasively demonstrates its title thesis. I thank Xavier Marquez for the G+ post that alerted me to it.

Internet censorship in mainland China has at least three layers: IP blocking (the Great Firewall), keyword interception, and human intervention.

The bluntest of the instruments is IP blocking: for example, Facebook, Twitter, WordPress, and BlogSpot are all generally blocked. Keyword interception is the most technically sophisticated layer: real-time packet filtering, particularly affecting search engines, is widely reported. Censorship by human intervention takes place on a large scale and seems mainly to affect websites hosted in mainland China, which are most easily controlled by authorities.

IP blocking on its own would be ineffective because readers can still access a large number of substitute websites. Chinese orthography limits the effectiveness of keyword interception: since manifold homophones and homographs exist, people who can type in Chinese dance around this form of censorship and even make sport of it. This forces authorities back to their last line of defence: blocking offending content by hand. The paper gives the strong impression that this massive logistical exercise is very efficient and very accurate: most hand-blocked content is censored within one day of appearing; this remains true even during bursts of social media activity on a given subject; and even at this speed, censorship is applied with greater precision than the categorization of content by theme generated by the ReadMe package, which the paper’s authors use to analyse censorship as it happens. (Gary King, a co-author of the paper, is also one of the fathers of ReadMe. That package is a monumental achievement in the automation of text analysis; this paper is just one impressive demonstration of its power.)
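To see why homophones blunt keyword interception, here is a toy sketch in Python. The blocklist and the is_blocked function are hypothetical and make no claim about how any real filter is built. The substitution shown, 河蟹 (héxiè, "river crab") for 和谐 (héxié, "harmonious", slang for being censored), is a widely cited example of the wordplay described above.

```python
# Hypothetical blocklist: a naive filter that matches exact character strings.
BLOCKED_KEYWORDS = {"和谐"}  # "harmonious"; 被和谐 is slang for "censored"


def is_blocked(text: str) -> bool:
    """Return True if any blocked keyword appears verbatim in the text."""
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)


if __name__ == "__main__":
    print(is_blocked("我的帖子被和谐了"))  # True: exact match ("my post was harmonized")
    # 河蟹 ("river crab") sounds almost the same but shares no characters,
    # so the naive filter lets it through.
    print(is_blocked("我的帖子被河蟹了"))  # False
```

Catching the substitute means adding it to the list, and then the next substitute, and the one after that; an exact-match filter is always a step behind, which is roughly why the burden falls back on human censors.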

The paper focuses on censorship by human intervention and asks the interesting question “what subject matter is most likely to be censored?”. Methods employed are technically sophisticated and intuitively appealing. Chinese social media sites are scraped and posts are grouped into topics by ReadMe. Eighty-five preselected topics with varying degrees of political sensitivity are singled out for particular attention. Posts are periodically rescraped to see if they have been censored. This approach has the potential to bring empirical evidence to bear on questions that have previously been soluble only by Kremlinology-reminiscent reading of tea leaves.
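The measurement step can be sketched in a few lines of Python, under simplifying assumptions: each post carries a single topic label, and "censored" just means the post was gone when rescraped. The paper's actual pipeline is more sophisticated (ReadMe, for instance, estimates the proportion of posts in each category rather than labelling posts one at a time). Post and censorship_rates are illustrative names, not anything from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    topic: str       # topic label (hypothetical: one label per post)
    censored: bool   # True if the post had disappeared when rescraped


def censorship_rates(posts):
    """Return the baseline censorship rate over all posts and a per-topic rate."""
    totals = defaultdict(int)
    removed = defaultdict(int)
    for post in posts:
        totals[post.topic] += 1
        removed[post.topic] += int(post.censored)
    baseline = sum(removed.values()) / sum(totals.values())
    by_topic = {topic: removed[topic] / totals[topic] for topic in totals}
    return baseline, by_topic


if __name__ == "__main__":
    # Made-up toy data, not figures from the paper.
    posts = ([Post("weather", False)] * 9 + [Post("weather", True)]
             + [Post("local protest", True)] * 3 + [Post("local protest", False)] * 7)
    baseline, by_topic = censorship_rates(posts)
    print(baseline)   # 0.2
    print(by_topic)   # {'weather': 0.1, 'local protest': 0.3}
```

Comparing each topic's rate against the baseline is the basic shape of the comparisons behind the propositions that follow; the numbers in the demo are invented.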

Persuasive evidence for the following propositions is offered.

  • Baseline censorship probability for social media posts is fairly high: in the region of ten percent.
  • Sensitive topics are more likely to be censored, but with just two exceptions (pornography and criticism of censors), not by a huge amount.
  • The censorship operation is a well-oiled machine.

Two hypotheses are tested.

  1. Material critical of the government is the principal target of censorship.
  2. Material with potential to generate localized collective action (such as protests) is the principal target.

Hypothesis (1), perhaps surprisingly, does not stand up to the empirical evidence presented. Harsh and even vitriolic criticism of the Chinese government does not get censored a great deal above baseline unless it also has (or is part of a spike of activity with) potential to generate localized collective action. Material with high potential to generate localized collective action, on the other hand, tends to get censored even if the posts themselves support the government’s positions. The paper persuasively demonstrates that hypothesis (2) is correct and hypothesis (1) is wrong.

Localized collective action and mass protests can be harbingers of regime change in nondemocratic jurisdictions (cf Eastern Europe circa 1990, and the “Arab Spring” last year). It therefore makes some sense that this is what Chinese authorities seek to prevent. There is also an ideological interpretation available that the paper does not fully explore: it is fairly orthodox communism to treat socialism itself (identified with the state as the avatar of socialism) as the only authentic collective action in society. Collective action not controlled by the state is false consciousness at best, and unhelpful factionalism or outright counterrevolution at worst.

The authors puzzle over why criticism of censors is at all times singled out for those censors’ scissors, and suggest this is inappropriately self-serving behaviour by the censorship bureaucracy. I do not see the mystery here, and submit that a policy subjecting this material to heightened scrutiny is consistent with authorities’ policy goal of reducing potential for localized collective action. The censorship programme is a line of defence against inappropriate collective action: if the programme is itself delegitimized by collective action, then floodgates may be opened to collective action on other matters.

I would be interested to see follow-up work that more closely addresses the importance of collective action being localized rather than geographically disparate. This is a really interesting point, and hopefully the data the authors plan to release will itself be able to answer questions left standing here if someone finds an appropriate fulcrum from which to apply leverage.

I would also like to see whether criticism of named government officials receives different treatment from censors than criticism of the government or its policies, general or particular.

Section 5 of the paper, which seeks to show that changes in censorship activity and spikes of material with high potential for localized collective action (these endpoints, while independently interesting, are convincingly identified) can be detected in real time, is not persuasive. This is why I only described the paper as “mostly” brilliant.

The authors look at three spikes of social media activity of interest, and detect changes in censor activity in the days immediately prior (ie in the time when the government but neither media nor public are aware of imminent risk of collective action). They fail to provide a genuine test, which, to be meaningfully applied in real time, would need to identify all changes in censor activity and separate out (without a priori or a posteriori knowledge of the material’s collective action potential) that which is of practical import and that which is statistical fluctuation. I make an analogy to medical screening: it is not enough to show that people who eventually got cancer had pre-cancer markers; to create a useful test, one needs to show that pre-cancer markers are usually followed by cancer. This would require refinement and expansion of the methods employed in the paper, which I suspect would take technical innovation on top of the paper at least as great as that within it.
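The screening analogy comes down to Bayes' rule. Showing that every spike was preceded by a change in censor activity establishes sensitivity; a usable real-time test also needs to know how often such changes are not followed by anything, which is what the positive predictive value captures. A short Python sketch, with an illustrative function name and entirely hypothetical numbers:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a flagged signal is followed by the real event (Bayes' rule).

    sensitivity: share of real events that were preceded by the signal
    specificity: share of non-events that did NOT produce the signal
    prevalence:  share of all observed periods that precede a real event
    """
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical numbers: the signal precedes every real spike (sensitivity 1.0),
    # fires falsely 5% of the time, and real spikes follow 1% of periods.
    ppv = positive_predictive_value(1.0, 0.95, 0.01)
    print(round(ppv, 2))  # 0.17
```

On these made-up numbers, a signal that never misses still leaves roughly five in six alarms false, because the event it is meant to predict is rare. Retrospective case studies of spikes that did happen cannot reveal that ratio.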

With the exception of some confusing labels on figures, the paper is highly polished and very accessible. It uses technology to bring a novel source of data to bear on questions that were previously not vulnerable to empirical attack. It answers the key question of what material is the principal focus of China’s censorship authorities. This in turn permits insight into what China’s authorities regard as threats to their power. For these reasons, the paper is interesting, important, and recommended.

Apocalyptically Dull

Day Four

The Ministry’s Chief Security Officer drafted Business Continuity Plan Gamma back in the days when the contagion that most occupied our colleagues over at Health was some multidrug-resistant strain of chlamydia those crazy kids in Feilding were squirting into each other.

That was the military fantasist CSO. He’d take two weeks off every duck season to prove his manhood by obliterating harmless creatures too stupid to know what gunpowder is, let alone appreciate that death is visiting them courtesy of a Cosmi Autoloader just like the one Mussolini shot their brethren with somewhere else in happier times. It’s not clear whose glory was meant to be reflecting on whom.

He got fired ages ago for negging a Graduate Policy Analyst, which goes some way to explaining why we have a BCP Alpha, a BCP Gamma, and a BCP Delta, but no BCP Beta. I guess these things balance out; Information Systems and Strategy insists on calling just about every project they do Something Beta.

I imagine he’s doing well now.

The plan itself was one of those curiosities departments accumulate in their electronic filing that never get deleted and only get read when someone has spent far too long with nothing to do but is aware that internet time is clocked and reported to line managers.

It’s the kind of document that can only have been written by a man who already hates his job. Fair enough. The CSO’s duties usually amount to reading a pile of papers each day and deciding which of two stamps, “IN CONFIDENCE” or “SENSITIVE”, should be applied to each. The only perk is being gatekeeper to all the old Cabinet papers and minutes, so that if anyone in the Ministry wants to know what the hell government policy actually is they can be made to jump through hoops, fill out a form that says leaking is bad, and, above all, wait. Nobody bothers: it’s easier to make up something plausible and say EGI decided it just before Christmas, and less likely to generate awful legislation.

Anyway, a Whole Of Government – we’re not allowed to say WOG – Business Continuity Plan is what the Department of the Prime Minister and Cabinet wanted, so dozens and scores of departmental business continuity plans is what the Department of the Prime Minister and Cabinet got, including our BCPs Alpha, Gamma, and Delta.

After Patient Zero bit a john’s dick and all hell broke loose in Hamilton on Saturday night, I knew that most of my working week would be meetings and briefings on Gamma.

It is, after all, the only BCP in all of Wellington that contemplates a zombie outbreak.

And it sucks.

To be continued…