This article was published 17/5/2013.
CALGARY -- There is much talk today of "big data." As our world goes digital, companies, governments, not-for-profits, and organizations of all sorts obtain vast quantities of data about, well, virtually everything.
But is the data any good? So far, the talk has been about quantity, but not quality. That may be because there is so little quality to speak of.
Take, for example, the first data releases from Statistics Canada's National Household Survey (NHS). The results, released earlier this month, have come under criticism because the voluntary NHS replaced the compulsory long-form census questionnaire. In effect, this replaced a random sample with a non-random sample. Non-random samples have their place, but drawing conclusions about the population isn't one of them.
As a result, no conclusions about the Canadian population can be drawn from the NHS. Since making these types of conclusions is the whole point of a census, the NHS data is worthless. (This is also true for any survey where participation is voluntary, including citizen, customer and employee satisfaction surveys.)
This is why, in resigning as the head of Statistics Canada, Munir Sheikh wrote in an open letter to the prime minister:
"I want to take this opportunity to comment on a technical statistical issue which has become the subject of media discussion... the question of whether a voluntary survey can become a substitute for a mandatory census... It cannot."
Later, Statistics Canada's high-profile chief economic analyst, Phil Cross, also resigned, citing the same concerns. Cross is currently research co-ordinator with the Macdonald-Laurier Institute.
That's two people resigning over a matter of principle. Refusing to compromise scientific integrity for personal or political gain is admirable. Arguably, that level of integrity is all too rare in the Canadian public service and is now totally absent at Statistics Canada.
The NHS replaces sound scientific sampling and data collection with meaningless motherhood pronouncements. Saying: "The Agency is aware of the risks and associated adverse effects on data quality and is currently adapting its data collection and other procedures to mitigate the impact of these risks" is not science; it's spin.
No amount of public relations nonsense from the Communications Office of Statistics Canada can produce a reliable statistical inference from a discretionary sample.
Nor can mailing more surveys. Statistics Canada's claim that "To promote data accuracy, this voluntary survey will be sent to a larger cross-section of households than the old long-form census," is not evidence of risk mitigation but of statistical incompetence. Sample size doesn't compensate for sampling bias. Quantity can't replace quality. Statistics Canada's mitigation strategy is nothing more than piling it higher and deeper in the hopes nobody will notice that it's all the same BS.
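The point that a bigger mailing cannot repair a biased sample can be seen in a small simulation. What follows is only a sketch under invented assumptions, not NHS data: the population is a made-up list of household incomes, and the response rates (80 per cent for households above the median, 40 per cent below) are hypothetical numbers chosen purely for illustration.

```python
import random
import statistics


def simulate(seed=0, population_size=200_000):
    """Compare a small random sample with a large voluntary one.

    All figures are hypothetical; nothing here is drawn from the NHS.
    """
    rng = random.Random(seed)

    # Made-up population of household incomes (lognormal is an assumption).
    population = [rng.lognormvariate(11, 0.5) for _ in range(population_size)]
    true_mean = statistics.fmean(population)
    median = statistics.median(population)

    # Small random sample in which every selected household answers.
    random_sample = rng.sample(population, 1_000)

    # Large voluntary survey: mailed to 100,000 households, but the chance
    # of answering depends on income (assumed 80% above the median, 40% below).
    mailed = rng.sample(population, 100_000)
    voluntary_sample = [
        income for income in mailed
        if rng.random() < (0.8 if income > median else 0.4)
    ]

    return {
        "true_mean": true_mean,
        "random_mean": statistics.fmean(random_sample),
        "voluntary_mean": statistics.fmean(voluntary_sample),
        "voluntary_size": len(voluntary_sample),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the voluntary survey collects tens of thousands of responses, yet its average lands farther from the true figure than the average from the 1,000-household random sample. Mailing still more surveys would only reproduce the same tilt with more decimal places.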
The news releases accompanying the initial release of the NHS results encourage this confusion between quantity and quality. For example, Statistics Canada claimed a high quality of results for the NHS at a national level, but cautioned that the numbers were less reliable for smaller population centres because of low response rates. This is the reason given for withholding the results of one quarter of Canadian municipalities.
But the truth is, the results at a national level are no more reliable than results for any one of the over 1,000 municipalities that had their results withheld. This is because reliability cannot be measured when the sample isn't random, and voluntary surveys aren't, by definition, random (as Sheikh makes clear). In publishing results for larger population areas, then, Statistics Canada is claiming reliability where none exists and perpetrating what amounts to a scientific fraud on the Canadian public.
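Why reliability "cannot be measured" can also be sketched in code. The usual margin-of-error formula assumes a random sample; applied to self-selected respondents it still produces a tidy interval, but that interval no longer has any claim to contain the truth. The example below is illustrative only: the 30 per cent true share and the 70 and 40 per cent response rates are assumptions, and `coverage` is a hypothetical helper, not anything Statistics Canada uses.

```python
import math
import random


def coverage(voluntary, trials=200, n_contacted=2_000, true_p=0.30, seed=1):
    """Fraction of simulated surveys whose 95% interval contains true_p.

    voluntary=False: every contacted household answers (compulsory).
    voluntary=True: households with the trait answer 70% of the time,
    the rest 40% of the time (both rates are assumptions).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        responses = []
        for _ in range(n_contacted):
            has_trait = rng.random() < true_p
            if voluntary:
                respond_prob = 0.7 if has_trait else 0.4
            else:
                respond_prob = 1.0
            if rng.random() < respond_prob:
                responses.append(has_trait)

        # Textbook 95% margin of error for a proportion.
        n = len(responses)
        p_hat = sum(responses) / n
        margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("compulsory:", coverage(voluntary=False))
    print("voluntary: ", coverage(voluntary=True))
```

With compulsory response, the stated intervals should capture the true share in roughly 95 surveys out of 100, as advertised. With the assumed voluntary response pattern, they almost never do, even though each interval looks just as precise on paper.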
Troy Media columnist Robert Gerst is a partner in charge of operational excellence and research & statistical methods at Converge Consulting Group Inc. He is author of The Performance Improvement Toolkit: The Guide to Knowledge-Based Improvement and numerous articles in peer-reviewed publications.