Smart people distrust Statistics Canada privacy: 2016 census report

Longitudinal Labour Force File (HRDC):
- T1 Income Tax Returns and T4-S and T4-F forms
- Child Tax Benefits
- Immigration and Visitors files (1993 or earlier)
- Provincial and municipal welfare files
- National Training Program
- Canadian Job Strategy
- National Employment Services
- Employment Insurance Administrative
- Record of Employment
- Social Insurance Master file

Social Data Linkage Environment (Statistics Canada):
- T1 Personal Master Files
- Canadian Child Tax Benefits files
- Longitudinal Immigration Database
- Indian Registry
- Vital Statistics – birth and death databases
- Sample portion of Census of Population (1991 onward)
- National Household Survey (2011 onward)
- National Longitudinal Survey of Children and Youth
- Longitudinal Survey of Immigrants to Canada
- Survey of Labour and Income Dynamics
- Youth in Transition Survey
- National Population Health Survey
- T1 Family File
- Clinical administrative databases (1992 onward)
- Canadian Cancer Registry
- Canadian Community Health Survey (all cycles)
- Canadian Health Measures Survey (all cycles)
(with qualifier, “files include but are not limited to”)
Source(s): Annual Report to Parliament 1999-2000, The Privacy Commissioner of Canada; Approved record linkages – 2014 submissions, Statistics Canada.


As mentioned recently, Statistics Canada released its 2016 Census Program Content Test report on April 1st of this year, just one month before it began census letter mailings. As already discussed, the 2016 census was the first where Statscan neither asked respondents about their income nor for consent to obtain the information from their Canada Revenue Agency (CRA) tax records. Instead, it proceeded to link Canadians’ census and CRA tax records without their consent.

One would suspect more than a few Canadians who took the time to read the brief, and conspicuously vague, note on their census form announcing the change may have had concerns. Statscan has claimed no such concerns were brought to its attention. However, a careful reading of the referenced report casts doubt on that claim. And it was smart people who were most concerned with changes to the 2016 census, according to the same Statscan report.

The 2016 census test report, along with the related Privacy Impact Assessment submitted to the Office of the Privacy Commissioner, conflated two issues, and in so doing effectively obfuscated one of them. Statscan actually tested two different changes with respect to how it collected income information during the 2016 census tests.

Social Insurance Number collection

The first change discussed in the 2016 census test report is social insurance number (SIN) collection. The reason given for testing this change was to make it easier to match census information to administrative data, like CRA tax records, presumably improving response quality.

Two interesting points from the report:

The SIN question had been tested during qualitative interviews in January 2014, and focus groups were also organized at the same time to gather input regarding this question. The results showed mixed opinions…

Highly educated persons or professionals were reluctant to share this information.

The results, how ‘mixed’ they were and the share of ‘highly-educated persons or professionals’ who balked at the question are conspicuously absent from the report.

Nevertheless, it should be apparent why such individuals would not be inclined to respond to the question. They would understand the need to protect their SIN, and the possible violation of privacy that could result from sharing it with unauthorised parties. They probably would have asked questions. After receiving Statscan’s response and doing their own research, they would have likely concluded Statscan wasn’t authorised to collect their SIN.

It’s worth pointing out that such individuals’ concerns likely included the federal government using their SIN to create a “citizen profile in all but name”. That was the term used in the Office of the Privacy Commissioner 1999-2000 annual report to describe a database created back then by Human Resources Development Canada (HRDC) that linked numerous administrative databases to create detailed profiles of Canadian citizens. The report criticized the creation of said database.

As the attached chart illustrates, the Longitudinal Labour Force File created by HRDC was far less expansive than the Social Data Linkage Environment created by Statscan.

CRA tax file linkage without consent

The second change discussed in the 2016 census test report is the linkage of census and CRA tax records without informed consent.

Starting with the 2006 cycle, the agency introduced the “informed consent” question…. By responding “yes,” respondents authorized the agency to impute their income data from their tax returns. This proposal reduced the response burden while increasing the accuracy of the amounts reported. The informed consent question was also included in the 2011 National Household Survey.

Starting in 2016… respondents will no longer need to give their permission to use their tax returns; they will simply be informed about the use of these data in the message from the Chief Statistician… This is referred to as “information communicated to the respondent.”

The report goes on to dismiss any concerns over privacy in a curious manner.

During collection for the Content Test, Statistics Canada did not receive any negative feedback from respondents through the Census Help Line, during non-response follow-up or through any other channels about the fact that Statistics Canada was to obtain income information from personal income tax and benefit records.

Astute readers may notice that when referring to the SIN question, Statscan mentions the feedback it received during qualitative interviews and focus group testing, conducted prior to live questionnaire testing. However, when referring to the removal of informed consent to access respondents’ CRA tax records, Statscan only refers to feedback it received during and after questionnaire collection; in doing so, it could ignore the nearly 1 in 5 households that didn’t bother responding at all, effectively providing ‘negative feedback’ with their refusal.

The only ‘response quality’ test mentioned in the report is whether those who chose to respond to the SIN question actually provided a valid number. Of those chosen to self-respond (electronic and paper combined), 9.7 percent refused to respond to the SIN question, 6.9 percent gave a ‘soft refusal’ by claiming not to have one and 1.2 percent provided an invalid number. Statscan deemed response quality excellent because barely 1 percent gave an invalid response, ignoring the fact that nearly 1 in 5 (16.6 percent, counting refusals and soft refusals together) effectively refused to respond to the SIN question.

It’s not a stretch to assume that individuals who did not wish to provide their SIN to Statscan likely didn’t wish to do so because of the potential for (mis)use of the unique identifier to link their census responses to other administrative data. For the agency to assume these same individuals would not take issue with it proceeding to make such linkages anyway without their informed consent is ethically questionable at best.

Statscan likely didn’t bother explicitly asking during the 2016 census test whether individuals had any issues with the agency linking to their administrative files – including, but obviously not limited to, their CRA tax records – because it knew how they would respond.

As mentioned, nearly 1 in 5 balked at providing their SIN during the 2016 census test, so it’s not a stretch to assume the same proportion would balk at Statscan doing an end-run around it. The same proportion, 1 in 5, also refused to provide Statscan consent to link to (what they thought at the time would only be) their CRA tax records during the 2006 census. More than 1 in 4 refused to provide consent during the 2011 National Household Survey (NHS).

Convenience and accuracy

Those who took the time to review the notice from the Chief Statistician on their 2016 census form would have read:

To reduce the burden on Canadians and to improve the quality of the data, Statistics Canada will not ask questions on income but rather use information already available from the Canada Revenue Agency.

Did SIN collection improve census and CRA record linkage? No, not according to the 2016 census test report.

The results showed the SIN did not improve the record linkage.

Did doing away with informed consent improve the accuracy of the income data retrieved? Curiously, income data quality wasn’t even tested.

Income data were not part of the 2014 Content Test… Income data collected during the test were not analyzed. However, Statistics Canada continues to assess the impact that the changes planned for 2016 could have on the quality of the estimates produced and released.

That said, there’s a 2010 Statscan paper that compared 2005 income data from both the 2006 census, which included income questions to fill out as well as a request for informed consent, and the T1 Family File, which was derived from CRA tax records. Adjusting for income concepts, the census in fact slightly over-reported total income by 1.9%, contradicting assertions that Canadians under-report their income on the census. The paper also found T1FF data demonstrated a coverage problem for young people age 15 to 19.

Although the paper didn’t explicitly discuss it, many young people age 15 to 19 who work either don’t receive a T4 or don’t file a T1 tax return, so having income questions to fill out is actually quite useful. That small detail would also partly account for the over-reporting of income in the census compared to the T1FF.

So it would appear Statscan’s justifications for SIN collection and eschewing informed consent stand on pretty shaky ground if the objective was to “improve the quality of the data”.

Eschewing respondent consent to access personal records supposedly “to reduce the burden on Canadians” is ridiculous on the face of it. The consent question would have added one extra tick box to what ended up being either a 10- or 36-page survey, depending on the census form a household received in 2016.

The real reason for bypassing respondent consent, if it wasn’t already obvious from the table above, is cryptically spelled out in the same notice from the Chief Statistician:

The information that you provide may be used by Statistics Canada for other statistical and research purposes or may be combined with other survey or administrative data sources.

Pyrrhic victory

The current Privacy Commissioner has so far failed to address a privacy issue his predecessors took rather seriously, effectively justifying concerns over his 2014 appointment.

Nevertheless, that should be little consolation to Statscan. While the agency makes a point of offering reassurances about data security and confidentiality to counteract its perceived encroachment on Canadians’ privacy, research has found that “confidentiality assurances are not always perceived as reassuring, and do not necessarily increase the public’s willingness to respond.”

There’s a trade-off between privacy and co-operation. And co-operation is necessary to ensure response quality, especially on the census. While answers to certain questions like age, gender, employment and income can be easily (although not necessarily more accurately) derived from administrative records, that’s not the case for some of the questions asked as part of the 2016 census, and most of the questions asked on the 2016 NHS.

But what happens when co-operation isn’t an option, as in the case of a mandatory census? It’s impossible to tell what might happen to response quality.

Some of the top comments submitted to a recent CBC news item suggest what may happen. One such comment read: “If I am forced to provide my personal information to the government, I will do my best to provide the most misleading and inaccurate information that I possibly can.” Similarly, a census protest page advises 2016 census refuseniks to provide “minimum co-operation” with the census.

If non-response presents a known unknown, coerced response presents an arguably greater problem – the unknown unknown.

Besides compromising census response quality, the ill-will that stems from what many Canadians perceive as an encroachment on their privacy carries forward; it likely explains generally declining response rates on Statscan’s voluntary surveys.

For example, the response rate to the 2014 Survey of Household Spending (SHS) diary – used to calculate the Consumer Price Index, one of the three key economic indicators produced by the agency – was only 44 percent.

The SHS itself has an interesting history that Statscan apparently hasn’t learned from. The voluntary SHS replaced the Survey of Family Expenditure (FAMEX) in 1997, after over-reach by the agency in an effort to restore mandatory status to FAMEX resulted in numerous complaints being filed, according to the Office of the Privacy Commissioner 1997-1998 annual report.

Statscan would be well served by learning from its past mistakes. It should reconsider its decision to eschew informed consent. The evident disadvantages of discounting Canadians’ privacy concerns outweigh the limited, potential advantages of doing so. How the agency chooses to proceed will impact all of the household survey data that Canadian policymakers rely on, not just the census.


