When the long-form Census cancellation story first broke in June 2010, an astute fellow (ahem) observed the move would negatively impact provincial governments, community groups and other organizations that previously relied on its data. He noted: “It will be a disaster. A lot of policy across Canada has been based on that long form.”
That the data for smaller geographic areas wouldn’t be reliable enough to publish was anticipated by most people with a basic grasp of stats. Unfortunately, that nowhere-near-exclusive group excluded then-Industry Minister Tony Clement – along with every member of his party, apparently.
The first hint came with the initial 2011 NHS release, when the census tract (CT) data wasn’t published along with the census subdivision (CSD) data, as it had been in 2006. The CT data was subsequently released. However, the reliability of both the CSD and CT data was misrepresented, as StatsCan failed to draw attention to the fact it had doubled the acceptable global non-response rate (from 25% to 50%) – effectively lowering the bar to allow for the data to be published at all.
Among the concerns about the data being too unreliable to release was the possibility that private firms would seek to profit off Canadians’ collective misfortune. Enter Environics Analytics.
To start, the company’s ad suggests the lack of an official release of the 2011 NHS DA-level data was unique, and that the data was not available from StatsCan – neither of which is accurate.
While StatsCan did officially release it in 2008, the agency never made the 2006 (20%) Census cumulative profile that includes DA-level data publicly available (see 94-581-X200602). Despite that same dataset being freely available through certain channels – we have a copy, in case readers are interested – StatsCan has advised those requesting it that it’s only available on a significant ‘cost-recovery’ basis.
While StatsCan did not ‘officially’ release a 2011 NHS cumulative profile that includes DA-level data, last summer university librarians advised the agency was again offering to sell the data to those requesting it on a significant ‘cost recovery’ basis. As if it wasn’t enough that the data was garbage, users were still being hit up for access to it. Like the 2006 (20%) Census, the 2011 NHS DA-level data is likewise freely available through certain channels – we have an ‘unofficial’ copy, though will not be sharing it with readers.
Not because there’s profit to be made off of it, but because it’s garbage data. As it did with the already released CSD and CT data, StatsCan’s doubling of the acceptable global non-response rate (GNR) masks how unreliable the 2011 NHS DA-level data is relative to prior long-form Censuses. An impressive 85% of DAs (47,746 / 56,204) met the significantly lowered data quality threshold (50% GNR). However, under the pre-2011 NHS standard (25% GNR), only 25% of DAs (14,110 / 56,204) would have been fit for release.
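For readers who want to check the arithmetic, here is a minimal sketch in Python. The three counts are the ones quoted above; the variable and function names are made up for illustration and aren't anything StatsCan publishes.

```python
# DA counts quoted in this post; names below are illustrative only.
TOTAL_DAS = 56_204          # all dissemination areas
PASSING_AT_50_GNR = 47_746  # DAs meeting the lowered 50% GNR threshold
PASSING_AT_25_GNR = 14_110  # DAs that would meet the earlier 25% GNR standard


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


share_50 = share(PASSING_AT_50_GNR, TOTAL_DAS)
share_25 = share(PASSING_AT_25_GNR, TOTAL_DAS)

# DAs that clear the 50% bar but would have failed the 25% one.
# Any DA passing at 25% also passes at 50%, so a simple difference works.
masked = PASSING_AT_50_GNR - PASSING_AT_25_GNR

print(f"Publishable at 50% GNR: {share_50:.1f}% of DAs")
print(f"Publishable at 25% GNR: {share_25:.1f}% of DAs")
print(f"Publishable only because the bar was lowered: {masked:,} DAs")
```

The last figure is the one to watch: it counts the DAs whose data is being published only because the threshold was doubled.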
That’s the same data Environics would be providing potential clients through its ‘CensusPlus’ – but with extra tabs using variables from previous short- and long-form Censuses to impute (guesstimate) the missing data. While that may fill in the gaps, it’s no assurance the data will be any more accurate. (And yes, it’s just as likely to make the already inaccurate data even more so – remember the GIGO rule.)
That may be good enough for some profitable businesses in terms of cost-effectiveness. But it would be tough for community organisations and charities to justify digging into their limited financial resources to pay for unreliable data. Yet without reliable data many of those organisations could face losing future funding from larger umbrella groups (Centraide / United Way, for example) and provincial social agencies for failing to provide reliable estimates of the impact they have on the clients and communities they serve.
It’s a lose-lose for just about everyone. Except perhaps a federal government with a low-information, ideologically-driven policy agenda. And shady agencies looking to profit off Canadians’ collective misfortune.