A friend recently shared an interesting perspective on the perils of criticising the 2011 National Household Survey (NHS) before the data is even released. The thinking went something like this:
The current federal government had been signalling its intention to eliminate the long-form Census from the moment it took office in 2006. Unfortunately (for that government; fortunately for Canada), the process was too far along to stop, given it took office February 6 and Census day was May 16, 2006. The same government went on to cancel the 2011 long-form Census, replacing it with the voluntary NHS. Inevitably, data quality would deteriorate significantly, likely to the point of being completely useless in many areas. This deterioration would subsequently give the government justification to cancel the long-form Census/NHS altogether: it would be deemed too costly to maintain for the lousy-quality data it produced. Not so long ago this would have seemed a somewhat far-fetched conspiracy theory; given the current state of Canadian federal politics, it’s a pretty interesting one.
Given this theory, any criticism of the NHS data before it even comes out could be used to build a narrative justifying the cancellation of the long-form survey altogether, ignoring critics’ true intent. This thinking, of course, is premised on the idea that the 2011 NHS data would be released much the same way the 2006 Census 2B data was. Given that both long-form surveys took place in the same month (May) of their respective years, the delay in release dates between the two (2006 and 2011) should clearly indicate it’s not business as usual. But more on that shortly.
Municipal amalgamation in larger metropolitan areas across Canada over the last 10-15 years has resulted in the dissolution of geographic boundaries that once defined smaller, well-established communities. While whether, and how much, economic efficiency resulted from amalgamation is certainly a topic of debate, the fact is that dissolving those boundaries did not change reality in those communities. Community organisations continued to tend to the specific needs of their residents. In some cases, provincial health and social service organisations were re-organised or expanded in an effort to better meet community needs. Some communities, in collaboration with their respective newly-amalgamated municipalities, caught on to the idea of using more micro-level geographic Census data (dissemination area – DA, census tract – CT) to recreate the dissolved boundaries of their communities. In this way they could continue to discern the specific needs of their residents, much as they had prior to the formal dissolution of their communities’ boundaries.
At least that was the case prior to the long-form Census cancellation. As reported by numerous media outlets in the summer of 2010, the cancellation of the long-form Census and its replacement by the NHS would result in less reliable data. What did not garner as much attention back then was the practical impact the less reliable data would have. Statscan faces a rather unenviable task with respect to the 2011 NHS lower-level geographic data dissemination: release unreliable data and risk the reputation of the agency, or withhold it and risk the ire of community and social organisations across the country. It’s a lose-lose proposition – intentionally so, if you accept the proposed theory. Given the context, it should come as no surprise that the regular release schedule was pushed back.
In anticipation, some community organisations have inquired as to whether they will be able to update their community profiles with the 2011 NHS data. Not surprisingly, the formal response from Statscan:
The National Household Survey product line is currently under development, so we are unable to provide a response at this time regarding availability of data at small levels of geography.
i.e. We’re not sure if data for lower-level geographies will be released at all. Slightly different in wording, but materially different in meaning, from the information provided in the ’Important findings’ section of this Census report (updated July 2012):
It is unknown at this point what the impact of the non-response will be on the quality of the NHS data, particularly for low geographic areas and small populations, and to what extent this quality will meet users’ needs.
The long-form Census had a ~95% response rate and a 20% sampling rate. The NHS had an estimated ~50% response rate (Statscan has since claimed 2/3 of households responded, though questions have been raised about the standard of what it deemed an acceptable response under the NHS) and a 30% sampling rate. The mandatory nature of the Census ensured that each household receiving a questionnaire had almost the same high likelihood of returning it, and that nearly the entire 20% sample would be captured. The voluntary NHS, with its expected response rate, would effectively capture a 15% sample of whoever-felt-like-answering. Much like an online poll; imagine policy decisions based on data that unreliable.
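The arithmetic behind those figures is simple: the effective sample is the sampling rate multiplied by the response rate. A quick sketch, using only the figures cited above:

```python
# Effective sample fraction = sampling rate x response rate,
# using the rates cited in the text above.

def effective_sample(sampling_rate, response_rate):
    """Share of all households whose answers are actually captured."""
    return sampling_rate * response_rate

print(f"2006 long-form Census:  {effective_sample(0.20, 0.95):.0%}")   # ~19%
print(f"2011 NHS, ~50% response: {effective_sample(0.30, 0.50):.0%}")  # 15%
print(f"2011 NHS, claimed 2/3:   {effective_sample(0.30, 2 / 3):.0%}") # 20%
```

Even taking Statscan’s more generous 2/3 response figure at face value, the nominal coverage only matches 2006 – and it comes from self-selected respondents rather than a near-complete compulsory sample, which is the crux of the reliability problem.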
There are a couple of adjustments that could be made, like reweighting the distribution of the NHS responses to match the age/gender/language distributions from the short form for a given geographic area. The better electoral polls do this, and we know how accurate those have been in recent federal and provincial elections. Such adjustments only go so far, since certain subgroups of the population (e.g. immigrants) are not evenly distributed across those three demographic traits, nor are they evenly distributed geographically at lower levels within a community (e.g. ethnic enclaves). What Statscan will invariably end up doing is using the 2006 Census as a base and estimating the 2011 NHS distributions from it along with other, more up-to-date surveys and administrative data. Which defeats the whole purpose of the exercise, since the 2011 Census was supposed to be the new benchmark.
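The reweighting described above is essentially post-stratification. A toy sketch with purely hypothetical numbers (not Statscan’s actual method or figures) shows both how it works and where it breaks down:

```python
# Toy post-stratification sketch (hypothetical numbers): scale each
# respondent group so the weighted sample matches a known benchmark,
# e.g. the short-form age distribution for a geographic area.

# Benchmark population shares, known from the mandatory short form.
benchmark = {"15-34": 0.30, "35-54": 0.40, "55+": 0.30}

# Shares of voluntary-survey respondents in each group; assume
# older households over-respond and younger ones under-respond.
respondents = {"15-34": 0.20, "35-54": 0.40, "55+": 0.40}

# Post-stratification weight per group: benchmark share / respondent share.
weights = {g: benchmark[g] / respondents[g] for g in benchmark}

for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
# Under-represented young adults get weights above 1; over-represented
# older respondents get weights below 1. The catch: this assumes that
# within each group, respondents resemble non-respondents -- exactly
# what fails for unevenly distributed subgroups such as ethnic enclaves.
```

The weights fix the marginal age distribution, but any trait not used for weighting (immigration status, income, language at the DA level) stays biased toward whoever felt like answering.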
In any case, the process is likely not going well if, only six months prior to the first scheduled release, Statscan is not sure whether it will release 2011 NHS data at lower-level geographies at all. The demographics from the short form (e.g. the sharp rise in young adults staying at or returning home) suggest a significant deterioration of socio-economic well-being since 2006. Unfortunately, a full and accurate measure of the impact will likely be difficult, if not completely impossible, to capture at the community level.