2011 NHS: Environics Analytics makes critics’ fears reality, and why you shouldn’t buy ‘CensusPlus’

Thanks to Prof. Murtaza Haider, whose HuffPost write-up drew our attention to this.

When the long-form Census cancellation story first broke in June 2010, an astute fellow (ahem) observed the move would negatively impact provincial governments, community groups and other organizations that previously relied on its data, noting: “It will be a disaster. A lot of policy across Canada has been based on that long form.”

That the data for smaller geographic areas wouldn’t be reliable enough to publish was anticipated by most people with a basic grasp of stats. Unfortunately, that nowhere-near-exclusive group didn’t include then-Industry Minister Tony Clement – or any member of his party, apparently.

The first hint came with the initial 2011 NHS release, when the census tract (CT) data wasn’t published along with the census subdivision (CSD) data, as it had been in 2006. The CT data was subsequently released. However, the reliability of both the CSD and CT data was misrepresented, as StatsCan failed to draw attention to the fact it had doubled the acceptable global non-response rate (from 25 to 50 percent), effectively lowering the bar the data had to clear to be deemed fit for publication.

Among the concerns raised about the data being unreliable was the possibility private firms would seek to profit off Canadians’ collective misfortune. Enter Environics Analytics.

To start, the company’s ad suggests the lack of an official 2011 NHS DA-level data release was unique, and that the data was not available from StatsCan – neither of which is accurate.

While the 2006 long-form (20 percent) Census cumulative profile with DA-level data was officially released in 2008, StatsCan didn’t initially make it publicly available (see 94-581-X200602). Despite that same aggregated dataset being freely available through certain channels, StatsCan advised those inquiring that the data was only available on a significant ‘cost-recovery’ basis. We have a copy, available gratis, in case readers are interested.

While StatsCan did not ‘officially’ release a 2011 NHS cumulative profile with DA-level data, university librarians advised last summer that the agency was again offering to sell the data to those requesting it on a significant ‘cost-recovery’ basis. As if it wasn’t enough that the data was garbage, users were still being hit up for money to access it. As with the 2006 long-form Census, the 2011 NHS cumulative profile aggregated data is freely available through certain channels. We have an ‘unofficial’ copy of that file as well, which will not be made available to readers.

Not because there’s profit to be made withholding it, but because it’s garbage data. As was the case with the CSD and CT data, StatsCan’s doubling its acceptable global non-response rate masked how unreliable the 2011 NHS DA-level data was relative to prior long-form Censuses.

An impressive 85 percent of DAs (47,746 / 56,204) met the significantly lowered data quality threshold (50 percent GNR). However, with the pre-2011 NHS standard (25 percent GNR) only 25 percent of DAs (14,110 / 56,204) would have been fit for release.
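For readers who want to check the arithmetic, here is a minimal sketch using only the three counts quoted above. The variable and function names are ours, purely for illustration, and it assumes (as follows from the thresholds) that every DA meeting the 25 percent GNR standard also meets the 50 percent one.

```python
# Counts quoted in the post; names are illustrative, not StatsCan's.
TOTAL_DAS = 56_204    # all dissemination areas
MET_50_GNR = 47_746   # DAs meeting the relaxed 50 percent GNR threshold
MET_25_GNR = 14_110   # DAs that would meet the pre-2011 25 percent threshold


def share(count, total=TOTAL_DAS):
    """Return count as a percentage of total."""
    return 100 * count / total


# DAs published only because the threshold was doubled
# (assumes the 25 percent group is a subset of the 50 percent group).
relaxed_only = MET_50_GNR - MET_25_GNR

print(f"Met 50% GNR threshold: {share(MET_50_GNR):.1f}% of DAs")
print(f"Met 25% GNR threshold: {share(MET_25_GNR):.1f}% of DAs")
print(f"Released only under the relaxed bar: {relaxed_only} DAs "
      f"({share(relaxed_only):.1f}%)")
```

Put differently, roughly six in ten DAs cleared the bar only because it was lowered.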

That’s the same data Environics Analytics would be providing potential clients through its ‘CensusPlus’, albeit with extra tabs using variables from previous short and long form censuses to impute (i.e., guesstimate) up to half the missing data. While that may fill in the gaps, it’s no assurance the data will be any more accurate. And yes, that means it’s just as likely to make the already inaccurate data even more so – remember the GIGO rule.

That may be ‘good enough’ for some profitable businesses in terms of marketing cost-effectiveness. But it would be tough for community organisations, charities and the like to justify digging into their limited financial resources to pay for unreliable data. Yet without reliable data many of those organisations could face losing future funding from provincial social agencies and larger umbrella groups (e.g., Centraide / United Way) for failing to adequately account for their relevance to and impact on the communities they serve.

It’s a lose-lose for just about everyone. Except perhaps a federal government with a low-information, ideologically-driven policy agenda. And agencies looking to profit off Canadians’ collective misfortune.
