IRCC Datasets: What they say about government priorities

While preparing a presentation on how immigration, settlement, citizenship and multiculturalism worked together to facilitate integration, I accessed a broad range of the IRCC operational datasets on the government’s Open Data website. Intrigued by what was available and what was not, I reviewed all 227 unique datasets.

IRCC has, to its credit, invested considerable resources in these datasets for both internal and external use, having the fifth largest number of datasets on Open Data (excluding Statistics Canada). Moreover, these datasets are among the most widely used: 11 of the top 25 government datasets downloaded are from IRCC (April 2017).

IRCC demonstrated considerable flexibility and agility in the creation of datasets with respect to the recent wave of Syrian refugees, and the introduction of monthly operational statistics for key programs.

Part of my motivation was to assess the long-standing weaknesses in citizenship datasets, which reflect the relatively lower priority of the citizenship program, and to make recommendations for improvements.

Not surprisingly, the datasets reflect IRCC’s overall management emphasis on immigration as well as stakeholder demand: permanent and temporary resident datasets are 93.5 percent of the total. The datasets include:

  • Permanent residents (immigrants: economic, family and refugee classes): 110 datasets or 47.6 percent of the total.
  • Temporary residents (Temporary Foreign Worker Program: includes agricultural workers, live-in caregivers and others; International Mobility Program: includes those admitted under international services agreements like NAFTA, those under “Canadian Interests,” primarily under youth work exchange program and spousal employment; international students): 106 datasets or 45.9 percent.
  • Citizenship and passport: Six datasets or 2.6 percent.
  • Settlement services: Nine datasets or 3.9 percent.
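The shares above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the counts are taken from the list, and the function name `shares` is my own. Note that the four counts sum to 231 rather than 227, and the percentages match that larger denominator; one possible explanation (an assumption on my part) is that a few datasets are counted under more than one category.

```python
# Dataset counts by category, as given in the list above.
counts = {
    "Permanent residents": 110,
    "Temporary residents": 106,
    "Citizenship and passport": 6,
    "Settlement services": 9,
}


def shares(category_counts):
    """Return each category's share of the summed counts, in percent (one decimal)."""
    total = sum(category_counts.values())
    return {name: round(100 * n / total, 1) for name, n in category_counts.items()}


result = shares(counts)
for name, pct in result.items():
    print(f"{name}: {pct} percent")

# Combined share of permanent and temporary resident datasets.
combined = round(100 * (110 + 106) / sum(counts.values()), 1)
print(f"Permanent and temporary combined: {combined} percent")
```

Run against the counts above, this reproduces the 47.6, 45.9, 2.6 and 3.9 percent figures and the 93.5 percent combined share.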

IRCC datasets can be divided into four categories: ongoing and published on a regular basis (80.5 percent), archived or historical datasets (16.5 percent) and specialized datasets pertaining to international students (2.2 percent).

The majority are updated annually (54.1 percent), followed by the recent introduction of monthly reports (21.6 percent), quarterly (9.1 percent) and other (15.2 percent). Monthly and quarterly reports focus on operational data: the number of applications, approvals, approval rate and inventory.

Permanent and Temporary Residents

The comprehensive datasets for permanent and temporary residents include information regarding program and category, country of origin (whether processing source area, country of citizenship or country of birth), gender and age. Table 1 summarizes this information with most datasets having several variables (e.g., gender and age).

Given the shared federal-provincial jurisdiction for immigration, and the increased and active role of the provinces in selection (i.e., the Provincial Nominee Program), it is no surprise that the majority of permanent resident datasets are broken down by province (52.7 percent), with 31.8 percent at the national level. To assist the planning and programmes of municipalities and service provider organizations for settlement services (integration), ten percent are at the Census Metropolitan Area (CMA) level, with a further 4.5 percent at the Census Division (CD) level.

In terms of immigration class, over one-quarter are for refugees (27.3 percent), 21.8 percent for economic class, and 1.8 percent for family class, with 49.1 percent pertaining to all classes.

For temporary residents, who cannot access settlement services, the majority (60.4 percent) are at the national level, 34 percent at the provincial level, and 4.7 percent at the CMA level.

By program, datasets for international students form 23.6 percent, IMP and TFWP 21.8 percent each, and other programs ten percent, with 22.7 percent pertaining to all programs.

By and large, these datasets are coherent and consistent, with any variation reflecting program needs and a balance between the overall picture and greater detail (e.g., top 10 for refugees, top 20 for IMP or top 50 for students or all (various) countries of birth or citizenship).

However, as shown in Table 2, the main difference concerns age data, with permanent residents focused more on younger immigrants, compared to temporary residents with a relatively greater focus on older workers. The difference in age cohorts between all permanent residents and those admitted under Express Entry likely has a policy justification. However, it is hard to understand the policy rationale for settlement services using the temporary resident breakdown given that only permanent residents can access these services. IRCC may wish to review whether there is a need for greater consistency and coherence regarding the age cohorts.

Citizenship, Passport and Settlement Services

There are only six datasets for citizenship (including one for passport) and nine for settlement services (three general, six for refugees). There are several reasons for this:

  • Citizenship has always been a secondary priority for IRCC at both the political and official levels. The program is under-funded and under-managed, as seen in the large and repeated fluctuations in the number of applications and new citizens, in sharp contrast to the number of new permanent residents which is more tightly managed to deliver on the annual levels plan (Chart 1);
  • The provinces have no role in citizenship and thus no data demands. Immigration stakeholders have limited interest in citizenship as they focus on immigration and refugee issues;
  • Passport is a new program for IRCC (it was previously with Global Affairs Canada), with similarly low interest among outside stakeholders beyond basic operational data; and,
  • Service provider organizations (SPOs) and others that are interested in settlement services data have a wide range of useful permanent resident data that assists them in planning and operations. IRCC has responded to the needs of SPOs by providing general refugee settlement datasets as well as specific ones for Syrian refugees.

Table 3 lists these datasets:

Moreover, these are more limited than other datasets. Annual permanent and temporary resident datasets provide ten-year data, adequate to assess trends and changes. In contrast, annual citizenship data covers only five years, settlement services data only two years, and passport processing data is not even presented on a full-year basis, making it impossible to assess trends and the impact of policy and program changes. Citizenship datasets are even more limited, with no gender and age breakdowns.

They are also updated less frequently than other datasets. Monthly datasets for permanent and temporary residents and settlement services include April 2017 at the time of writing (14 June); citizenship only until February 2017, and passport until December 2016.

Concluding observations

As noted, IRCC has invested considerable resources in developing and maintaining these datasets. In doing so, it has naturally enough reinforced its main focus on immigration statistics, responding to overall stakeholder interests, with minimal attention to citizenship.

The datasets appear to have grown organically as program changes created needs for new datasets. There appears to be potential to review the number and type to see if some datasets are no longer needed or are duplicative (e.g., the introduction of monthly datasets may make quarterly ones unnecessary; are csv versions needed in addition to xls?).

Another area for improvement with respect to provincial datasets is to ensure that these all include national totals by program, as there is currently some inconsistency (e.g., Transition from Temporary Resident to Permanent Resident Status – Quarterly IRCC Updates tables versus the “Facts and Figures” series for both permanent and temporary residents).

Other areas for improvement at the Open Data level, particularly those that are likely within IRCC control, include:

  • Order the dataset groupings alphabetically as it currently appears random, with related sets not grouped together;
  • Review grouping titles for clarity, particularly “Quarterly Updates” as the vast majority of datasets listed are a mix of annual and quarterly data;
  • Review all dataset titles for consistency (e.g., temporary resident facts and figures are numbered, permanent residents are not; set a standard sequence: program/category then geography, then specific variables such as gender, age, education etc.; inclusion or not of ‘Canada’ in title; indicate specific immigration class if appropriate);
  • Advocate with other departments for a wider field for dataset descriptions (appears to be only 37 characters) to make these more readable and shorten the wasted space of the titles for other fields (type, format, language, links);
  • Advocate for more than 10 dataset groupings per web page to minimize clicks.

My particular focus, however, is with respect to citizenship.

The lack of attention to citizenship, seen operationally in the wide swings in applications and new citizens, requires greater management focus and attention. While IRCC has been very helpful in the provision of special runs, more comprehensive citizenship datasets on Open Data are needed. IRCC should ensure a minimum degree of consistency with permanent and temporary resident datasets that would help flag operational and policy concerns. For citizenship, passport, and settlement services, these would include:

  • 10-year time series data for citizenship and settlement services;
  • 1947-2016 long-term citizenship data (new citizens);
  • gender breakdown for citizenship (not just for adoptions), passport and settlement services (not just for refugees);
  • age breakdown for citizenship and passport, using the permanent resident age groups; and,
  • monthly citizenship applications by country of birth, not just monthly number of new citizens.

Should resources permit, a number of additional citizenship datasets should be considered to provide a more comprehensive understanding of how well the program is working with respect to integration and reinforcing the immigrant-to-citizen transition:

  • Annual data on the number and percentage of immigrants who have taken up citizenship within six years of landing in order to assess the recent naturalization rate, not the overall one that IRCC cites in its performance reports and elsewhere. While a target of 70 percent naturalization within six years of landing is proposed, more analysis might suggest a different target. Having this data collected and reported would inform the establishment of a meaningful performance standard; and,
  • Annual breakdown by immigration class of new citizens and approval rates by gender to assess the impact on each class of citizenship policies.
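To make the first of these suggestions concrete, the recent naturalization rate could be computed per landing cohort along the following lines. This is a minimal sketch under stated assumptions: the record layout, the `Immigrant` class, the function name and the sample rows are all hypothetical placeholders for illustration, not IRCC’s data model or real figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Immigrant:
    """Hypothetical record: year of landing and year citizenship was granted."""
    landing_year: int
    citizenship_year: Optional[int]  # None if not (yet) a citizen


def recent_naturalization_rate(records, cohort_year, window_years=6):
    """Percent of the cohort landed in cohort_year who became citizens
    within window_years of landing. Returns None for an empty cohort."""
    cohort = [r for r in records if r.landing_year == cohort_year]
    if not cohort:
        return None
    naturalized = sum(
        1
        for r in cohort
        if r.citizenship_year is not None
        and r.citizenship_year - r.landing_year <= window_years
    )
    return 100 * naturalized / len(cohort)


# Invented placeholder rows, for illustration only.
sample = [
    Immigrant(2010, 2014),  # within six years
    Immigrant(2010, 2016),  # exactly six years
    Immigrant(2010, 2018),  # outside the window
    Immigrant(2010, None),  # not naturalized
    Immigrant(2011, 2015),  # different cohort
]

rate = recent_naturalization_rate(sample, 2010)
print(f"2010 cohort: {rate} percent naturalized within six years")
```

Reported annually by cohort, a figure of this kind could be compared directly against a target such as the 70 percent proposed above.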

Given the importance of immigration, settlement, citizenship and multiculturalism to the integration of newcomers and their children, good and comprehensive data is central to evidence-based policy making. IRCC has again commendably invested in immigration data but should address the above-mentioned gaps in citizenship data to strengthen the management and oversight of the citizenship program.

Rudny and McKinney: Making government information more accessible

Valid points and practical suggestions made by Bernard Rudny of Powered by Data, a project of Tides Canada, and James McKinney.

My experience is mixed with respect to data requests.

Some departments (CIC/IRCC) have established procedures and protocols to access data, and have been very forthcoming in my requests (apart from the Comms folks who refused to provide polling data in spreadsheet form!).

TBS was similarly forthcoming with respect to diversity among ADMs, but PCO was unable (or unwilling) to provide the public information on the more than 1,300 GiC appointments in spreadsheet form (like any database, this should be easily exportable):

ATI is simultaneously an invaluable and cumbersome system. Any record that is requested must be manually reviewed, regardless of how innocuous it may be, which makes the process slow and inefficient.

Consider a common example: you request a spreadsheet from a federal department. Its contents are neither confidential nor controversial. Under the present system, that spreadsheet will be printed, reviewed, scanned, then mailed to you as a PDF file on a CD-ROM. The whole process takes weeks, months or even years. By the time it’s complete, any functionality the spreadsheet had as a digital document — like being able to search for text, or add up the numbers in a column — is gone. Instead, you’re dealing with a low-grade image of something that was once useful data that could be searched and sorted.

This is a 20th-century approach to information. It treats every “record” like a paper document. That’s appropriate in some cases, but in the era of the Internet and databases, it’s out of step with the times. The alternative is to release information pro-actively — not just in response to requests — and to use formats that preserve the value of digital data. True openness is about eliminating barriers to access and going out of your way to publish open data.

To be fair, there has been some good news on that front: the Treasury Board Secretariat has done a laudable job of creating an open data program and the 2014 Directive on Open Government included a commitment to being open by default.

If the federal government is going to become more open, it needs to be transparent about the progress it is making

So where do we go from here? How can the government build trust and make progress on this issue? The first step is to inventory all the information of value the federal government holds. Canada has already committed to creating that inventory under the open government directive, but it’s not required to happen before 2020. Speeding up the process is essential.

As we write this, more than 245,000 datasets are available through the government’s open data portal. That’s an impressive number, but it raises a question: how many are still closed? The number is likely in the millions, but ultimately unknown. Without it, there’s no meaningful way to measure the progress being made.

Moreover, some federal departments have been better about releasing information than others. Of the open datasets mentioned above, about 236,000 come from Natural Resources Canada. Aboriginal Affairs and Northern Development, meanwhile, has released two datasets, Public Safety has released one. Inventories would help assess which departments need more help to open up their information.

Completing this inventory of information sooner rather than later would provide other benefits. Once the inventory is available, stakeholders — including researchers, communities, non-profit organizations and businesses — can provide informed input on what data to release first. That allows government departments to prioritize the opening of information that will enable positive social and economic impacts.

No one expects “open by default” to be implemented overnight. There are many steps to take — from reforming the ATI system to dealing with Crown copyright — and the road is long. Some information will also need to be kept within government for reasons of privacy and security.

Source: Rudny & McKinney: Making government information more accessible | National Post

Tony Clement concern about electronic information access queried – Politics – CBC News

Further to earlier news reports, further confirmation of a Minister not having thought things through, not to mention mixed messaging on the Open Data initiative:

Treasury Board President Tony Clement’s dire warning about why the government can’t release certain electronic data under access to information requests seems to have left his senior staff mystified, newly disclosed documents show.

In an interview late last year, Clement said that some database requests under the Access to Information Act can’t be released in their original electronic format because the numbers could be manipulated and “create havoc.”

At the time, Clement was responding to complaints that requests for electronic data often produced records in paper form that couldn’t be scrutinized by a computer for patterns.

“That’s the balancing act that we have to have, that certain files, you don’t want the ability to create havoc by making it changeable online,” he told The Canadian Press in an interview.

But emails from Clement’s senior staff show the statement left them puzzled about why their minister would make the claim.

“It’s a headscratcher for me. Any idea what the minister is referring to?” wrote one staffer after checking the morning headlines on Dec. 23.

“It’s a speculative thing, no actual occurrence to date … I can’t think of what has not been released due to this perspective,” wrote another — Patrick McDermott, senior manager for open government systems at the Treasury Board secretariat. “What prompts this comment now is a mystery to me.”

For several years, Clement has been touting the Harper government’s proactive online posting of federal databases for free downloading, partly to encourage businesses to mine the data for profit. Canadian corporations trail their counterparts around the world in capitalizing on so-called “big data.”

The Open Data Portal now offers more than 240,000 free datasets, the vast majority from Natural Resources Canada, apparently without any concern that someone might use them to spread “falsehoods.”

At the same time as pushing this data, though, federal departments have come under fire for failing to deliver individual, non-published datasets requested under the Access to Information Act in their original format, often recreating them in censored paper versions.

Requesters asking for datasets under the Access to Information Act are sometimes given paper versions instead, making it impossible to use computers to sort data.

Departments have offered different explanations for delivering in paper format, but Clement’s comment was the first time a government official claimed the paper copies were designed to foil any statistical mischief.

“I’m a bit surprised that the [minister] would raise this — everyone in the OG (Open Government) community … is aware of the risk that data/info may be misused/applied/quoted etc. .. but that’s just the nature of the beast,” McDermott wrote.

“The trick is to rebut the ‘falsification,’ not speculatively prevent it from happening in the first place.”

In substance, completely silly and just making it hard for those of us who need and use government data on a regular or occasional basis.

Tony Clement concern about electronic information access queried – Politics – CBC News.

Audit slams feds’ ‘Open Data’ performance

Unfortunate, as paper (and PDFs) needlessly complicate data analysis.

CIC publishes many operational stats in electronic format, making them easy to analyse. Responses to more formal ATIP requests come as either paper or PDFs, inserting a tedious conversion step.

Have a few new ATIP requests with the provinces (for data) and will see what comes back (have requested electronic format):

Newspapers Canada directly tested federal, provincial and municipal transparency laws with almost 400 formal requests for information last October and November, the 10th annual audit carried out by the organization.

This year’s version added 172 requests for electronic data sets, requiring the information to be provided in a format that can be digested and manipulated by computer.

Most government bodies fell short, many insisting on providing the data requested on paper, or providing it in the electronic equivalent of a photo — impossible to process in a spreadsheet or database program.

Among the worst performers were some departments of the federal government, which has been promoting its Open Data agenda as evidence of transparency, including the proactive posting of some 200,000 data sets online.

The audit found that Prime Minister Stephen Harper’s own department, the Privy Council Office, refused to release any information in electronic format, insisting on paper printouts.

Audit slams feds’ ‘Open Data’ performance | National Newswatch.