From facial recognition, to predictive technologies, big data policing is rife with technical, ethical and political landmines

Good long read and overview of the major issues:

In mid-2019, MuckRock, an investigative journalism/tech non-profit, and Open the Government (OTG), a non-partisan advocacy group, began submitting freedom of information requests to law enforcement agencies across the United States. The goal: to smoke out details about the use of an app rumoured to offer unprecedented facial recognition capabilities to anyone with a smartphone.

Co-founded by Michael Morisy, a former Boston Globe editor, MuckRock specializes in FOIs and its site has grown into a publicly accessible repository of government documents obtained under access to information laws.

As responses trickled in, it became clear that the MuckRock/OTG team had made a discovery about a tech company called Clearview AI. Based on documents obtained from Atlanta, OTG researcher Freddy Martinez began filing more requests, and discovered that as many as 200 police departments across the U.S. were using Clearview’s app, which compares images taken by smartphone cameras to a sprawling database of 3 billion open-source photographs of faces linked to various forms of personal information (e.g., Facebook profiles). It was, in effect, a point-click-and-identify system that radically transformed the work of police officers.

The documents soon found their way to a New York Times reporter named Kashmir Hill, who, in January 2020, published a deeply investigated feature about Clearview, a tiny and secretive start-up with backing from Peter Thiel, the Silicon Valley billionaire behind PayPal and Palantir Technologies. Among the story’s revelations, Hill disclosed that tech giants like Google and Apple were well aware that such an app could be developed using artificial intelligence algorithms feeding off the vast storehouse of facial images uploaded to social media platforms and other publicly accessible databases. But they had opted against designing such a disruptive and easily disseminated surveillance tool.

The Times story set off what could best be described as an international chain reaction, with widespread media coverage about the use of Clearview’s app, followed by a wave of announcements from various governments and police agencies about how Clearview’s app would be banned. The reaction played out against a backdrop of news reports about China’s nearly ubiquitous facial recognition-based surveillance networks.

Canada was not exempt. To Surveil and Predict, a detailed examination of “algorithmic policing” published this past fall by the University of Toronto’s Citizen Lab, noted that officers with law enforcement agencies in Calgary, Edmonton and across Greater Toronto had tested Clearview’s app, sometimes without the knowledge of their superiors. Investigative reporting by the Toronto Star and Buzzfeed News found numerous examples of municipal law enforcement agencies, including the Toronto Police Service, using the app in crime investigations. The RCMP denied using Clearview even after it had entered into a contract with the company — a detail exposed by Vancouver’s The Tyee.

With federal and provincial privacy commissioners ordering investigations, Clearview and the RCMP subsequently severed ties, although Citizen Lab noted that many other tech companies still sell facial recognition systems in Canada. “I think it is very questionable whether [Clearview] would conform with Canadian law,” Michael McEvoy, British Columbia’s privacy commissioner, told the Star in February.

There was fallout elsewhere. Four U.S. cities banned police use of facial recognition outright, the Citizen Lab report noted. The European Union in February proposed a ban on facial recognition in public spaces but later hedged. A U.K. court in April ruled that police facial recognition systems were “unlawful,” marking a significant reversal in surveillance-minded Britain. And the European Data Protection Board, an EU agency, informed Commission members in June that Clearview’s technology violates pan-European law enforcement policies. As Rutgers University law professor and smart city scholar Ellen Goodman notes, “There’s been a huge blowback” against the use of data-intensive policing technologies.

There’s nothing new about surveillance or police investigative practices that draw on highly diverse forms of electronic information, from wire taps to bank records and images captured by private security cameras. Yet during the past decade or so, dramatic advances in big data analytics, biometrics and AI, stoked by venture capital and law enforcement agencies eager to invest in new technology, have given rise to a fast-growing data policing industry. As the Clearview story showed, regulation and democratic oversight have lagged far behind the technology.

U.S. startups like PredPol and HunchLab, now owned by ShotSpotter, have designed so-called “predictive policing” algorithms that use law enforcement records and other geographical data (e.g. locations of schools) to make statistical guesses about the times and locations of future property crimes. Palantir’s law-enforcement service aggregates and then mines huge data sets consisting of emails, court documents, evidence repositories, gang member databases, automated licence plate readers, social media, etc., to find correlations or patterns that police can use to investigate suspects.

Yet as the Clearview fallout indicated, big data policing is rife with technical, ethical and political landmines, according to Andrew Ferguson, a University of the District of Columbia law professor. As he explains in his 2017 book, The Rise of Big Data Policing, analysts have identified an impressive list: biased, incomplete or inaccurate data, opaque technology, erroneous predictions, lack of governance, public suspicions about surveillance and over-policing, conflicts over access to proprietary algorithms, unauthorized use of data and the muddied incentives of private firms selling law enforcement software.

At least one major study found that some police officers were highly skeptical of predictive policing algorithms. Other critics point out that by deploying smart city sensors or other data-enabled systems, like transit smart cards, local governments may be inadvertently providing the police with new intelligence sources. Metrolinx, for example, has released Presto card user information to police while London’s Metropolitan Police has made thousands of requests for Oyster card data to track criminals, according to The Guardian. “Any time you have a microphone, camera or a live-feed, these [become] surveillance devices with the simple addition of a court order,” says New York civil rights lawyer Albert Cahn, executive director of the Surveillance Technology Oversight Project (STOP).

The authors of the Citizen Lab study, lawyers Kate Robertson, Cynthia Khoo and Yolanda Song, argue that Canadian governments need to impose a moratorium on the deployment of algorithmic policing technology until the public policy and legal frameworks can catch up.

Data policing was born in New York City in the early 1990s when then-police Commissioner William Bratton launched “Compstat,” a computer system that compiled up-to-date crime information then visualized the findings in heat maps. These allowed unit commanders to deploy officers to neighbourhoods most likely to be experiencing crime problems.

Originally conceived as a management tool that would push a demoralized police force to make better use of limited resources, Compstat is credited by some as contributing to the marked reduction in crime rates in the Big Apple, although many other big cities experienced similar drops through the 1990s and early 2000s.

The 9/11 terrorist attacks sparked enormous investments in security technology. The past two decades have seen the emergence of a multi-billion-dollar industry dedicated to civilian security technology, everything from large-scale deployments of CCTVs and cybersecurity to the development of highly sensitive biometric devices — fingerprint readers, iris scanners, etc. — designed to bulk up the security around factories, infrastructure and government buildings.

Predictive policing and facial recognition technologies evolved on parallel tracks, both relying on increasingly sophisticated analytics techniques, artificial intelligence algorithms and ever deeper pools of digital data.

The core idea is that the algorithms — essentially formulas, such as decision trees, that generate predictions — are “trained” on large tranches of data so they become increasingly accurate, for example, at anticipating the likely locations of future property crimes or matching a face captured in a digital image from a CCTV to one in a large database of headshots. Some algorithms are designed to use a set of rules with variables (akin to following a recipe). Others, known as machine learning, are programmed to learn on their own (trial and error).
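To make the “training” step concrete, here is a minimal sketch in Python, using entirely hypothetical features and labels, of the supervised-learning loop described above: fit a decision tree on historical records, hold some records back, and check how well the model predicts them.

```python
# Minimal sketch of the training loop described above (scikit-learn).
# The features, labels and data are hypothetical placeholders, not any vendor's real inputs.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical historical records: [hour_of_day, distance_to_school_m, prior_incidents_30d]
X = rng.random((1000, 3)) * [24, 2000, 10]
y = (X[:, 2] > 5).astype(int)  # stand-in label: "incident reported in this grid cell"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)  # "trained" on past data
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```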

The risk lies in the quality of the data used to train the algorithms — what was dubbed the “garbage-in-garbage-out” problem in a study by the Georgetown Law Center on Privacy and Technology. If there are hidden biases in the training data — e.g., it contains mostly Caucasian faces — the algorithm may misread Asian or Black faces and generate “false positives,” a well-documented shortcoming if the application involves identifying a suspect in a crime.
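The “false positive” concern can be audited directly once a system’s predictions and the ground truth are available. A minimal sketch, with hypothetical column names and toy data, of how one might compare false-positive rates across demographic groups:

```python
# Sketch of a simple disparity check: false-positive rate per group.
# The column names ("group", "predicted_match", "true_match") and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "group":           ["A", "A", "A", "B", "B", "B", "B"],
    "predicted_match": [1,   0,   1,   1,   1,   0,   1],
    "true_match":      [1,   0,   0,   0,   0,   0,   1],
})

def false_positive_rate(g: pd.DataFrame) -> float:
    negatives = g[g["true_match"] == 0]                # people who are not the person sought
    if negatives.empty:
        return float("nan")
    return (negatives["predicted_match"] == 1).mean()  # share wrongly flagged

print(df.groupby("group").apply(false_positive_rate))
```

A large gap between groups is exactly the kind of evidence cited in the studies of misidentified Asian and Black faces.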

Similarly, if a poor or racialized area is subject to over-policing, there will likely be more crime reports, meaning the data from that neighbourhood is likely to reveal higher-than-average rates of certain types of criminal activity, a pattern that would then justify still more over-policing and racial profiling. Crimes that go under-reported, by contrast, barely register in the data and so have little influence on these algorithms.

Other predictive and AI-based law enforcement technologies, such as “social network analysis” — which maps an individual’s web of personal relationships, gleaned, for example, from social media platforms or from cross-referenced lists of gang members — promised to generate predictions that individuals known to police were at risk of becoming embroiled in violent crimes.
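Mechanically, this kind of social network analysis builds a graph of who is connected to whom and then scores each person by their proximity to people already flagged. A minimal sketch with invented names and relationships, assuming the networkx library:

```python
# Toy social-network-analysis sketch: score people by graph distance to flagged nodes.
# All names, relationships and the "flagged" list are invented for illustration.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("dave", "erin"), ("bob", "frank"),
])

flagged = {"dave"}  # e.g., names appearing on some watch list

for person in sorted(G.nodes):
    hops = min(nx.shortest_path_length(G, person, f) for f in flagged)
    print(f"{person}: {hops} hop(s) from a flagged individual")
```

The hazard, as the critiques later in the piece make clear, is that proximity in such a graph is association, not evidence.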

This type of sleuthing seemed to hold out some promise. In one study, criminologists at Cardiff University found that “disorder-related” posts on Twitter reflected crime incidents in metropolitan London — a finding that suggests how big data can help map and anticipate criminal activity. In practice, however, such surveillance tactics can prove explosive. This happened in 2016, when U.S. civil liberties groups revealed documents showing that Geofeedia, a location-based data company, had contracts with numerous police departments to provide analytics based on posts to Twitter, Facebook, Instagram and other platforms. Among the individuals targeted by the company’s data: protesters and activists. Chastened, the social media firms rapidly blocked Geofeedia’s access.

In 2013, the Chicago Police Department began experimenting with predictive models that assigned risk scores for individuals based on criminal records or their connections to people involved in violent crime. By 2019, the CPD had assigned risk scores to almost 400,000 people, and claimed to be using the information to surveil and target “at-risk” individuals (including potential victims) or connect them to social services, according to a January 2020 report by Chicago’s inspector general.

These tools can draw incorrect or biased inferences in the same way that overreliance on police checks in racialized neighbourhoods results in what could be described as guilt by address. The Citizen Lab study noted that the Ontario Human Rights Commission identified social network analysis as a potential cause of racial profiling. In the case of the CPD’s predictive risk model, the system was discontinued in 2020 after media reports and internal investigations showed that people were added to the list based solely on arrest records, meaning they might not even have been charged, much less convicted of a crime.

Early applications of facial recognition software included passport security systems or searches of mug shot databases. But in 2011, the Insurance Corporation of B.C. offered Vancouver police the use of facial recognition software to match photos of Stanley Cup rioters with driver’s licence images — a move that prompted a stern warning from the province’s privacy commissioner. In 2019, the Washington Post revealed that FBI and Immigration and Customs Enforcement (ICE) investigators regarded state databases of digitized driver’s licences as a “gold mine for facial recognition photos” which had been scanned without consent.

In 2013, Canada’s federal privacy commissioner released a report on police use of facial recognition that anticipated the issues raised by the Clearview app earlier in 2020. “[S]trict controls and increased transparency are needed to ensure that the use of facial recognition conforms with our privacy laws and our common sense of what is socially acceptable.” (Canada’s data privacy laws are only now being considered for an update.)

The technology, meanwhile, continues to gallop ahead. New York civil rights lawyer Albert Cahn points to the emergence of “gait recognition” systems, which use visual analysis to identify individuals by their walk; these systems are reportedly in use in China. “You’re trying to teach machines how to identify people who walk with the same gait,” he says. “Of course, a lot of this is completely untested.”

The predictive policing story evolved somewhat differently. The methodology grew out of analysis commissioned by the Los Angeles Police Department in the early 2010s. Two data scientists, Jeff Brantingham and George Mohler, used mathematical modelling to forecast copycat crimes based on data about the location and frequency of previous burglaries in three L.A. neighbourhoods. They published their results and soon set up PredPol to commercialize the technology. Media attention soon followed, as news stories played up the seemingly miraculous power of a Minority Report-like system that could do a decent job anticipating incidents of property crime.

Operationally, police forces used PredPol’s system by dividing precincts into 150-square-metre “cells” that police officers were instructed to patrol more intensively during periods when PredPol’s algorithm forecast criminal activity. In the post-2009 credit crisis period, the technology seemed to promise that cash-strapped American municipalities would get more bang for their policing buck.
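The grid mechanics are easy to picture: bucket recent incidents into fixed-size cells and rank the cells. A minimal sketch of that binning step, with made-up coordinates and a 150-metre cell size (this is not PredPol’s actual model, which also weights recency and near-repeat effects):

```python
# Sketch of binning incident coordinates into 150 m grid cells and ranking the cells.
# Coordinates are hypothetical eastings/northings in metres; not PredPol's algorithm.
from collections import Counter

CELL_SIZE_M = 150
incidents = [(1012.0, 355.0), (1100.5, 410.2), (1090.0, 399.9), (250.0, 2100.0)]

def cell_of(x: float, y: float) -> tuple[int, int]:
    return (int(x // CELL_SIZE_M), int(y // CELL_SIZE_M))

counts = Counter(cell_of(x, y) for x, y in incidents)
for cell, n in counts.most_common(3):  # the cells officers would be told to patrol more often
    print(f"cell {cell}: {n} recent incident(s)")
```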

Other firms, from startups to multinationals like IBM, entered the market with innovations, for example, incorporating other types of data, such as socio-economic data or geographical features, from parks and picnic tables to schools and bars, that may be correlated with elevated incidence of certain types of crime. The reported crime data is routinely updated so the algorithm remains current.

Police departments across the U.S. and Europe have invested in various predictive policing tools, as have several in Canada, including Vancouver, Edmonton and Saskatoon. Whether they have made a difference is an open question. Like several other studies, a 2017 review by analysts with the Institute for International Research on Criminal Policy at Ghent University in Belgium found inconclusive results: some places showed improvements over more conventional policing, while in other cities the use of predictive algorithms reduced policing costs but made little measurable difference in outcomes.

Revealingly, the city where predictive policing really took hold, Los Angeles, has rolled back police use of these techniques. Last spring, the LAPD tore up its contract with PredPol in the wake of mounting community and legal pressure from the Stop LAPD Spying Coalition, which found that individuals who posed no real threat, mostly Black or Latino, were ending up on police watch lists because of flaws in the way the system assigned risk scores.

“Algorithms have no place in policing,” Coalition founder Hamid Khan said in an interview this summer with MIT Technology Review. “I think it’s crucial that we understand that there are lives at stake. This language of location-based policing is by itself a proxy for racism. They’re not there to police potholes and trees. They are there to police people in the location. So location gets criminalized, people get criminalized, and it’s only a few seconds away before the gun comes out and somebody gets shot and killed.” (Similar advocacy campaigns, including draft legislation governing surveillance technology and gang databases, have been proposed in New York City.)

There has been one other interesting consequence: police resistance. B.C.-born sociologist Sarah Brayne, an assistant professor at the University of Texas (Austin), spent two-and-a-half years embedded with the LAPD, exploring the reaction of law enforcement officials to algorithmic policing techniques by conducting ride-alongs as well as interviews with dozens of veteran cops and data analysts. In results published last year, Brayne and collaborator Angèle Christin observed “strong processes of resistance fuelled by fear of professional devaluation and threats of performance tracking.”

Before shifts, officers were told which grids to drive through, when and how frequently, and the locations of their vehicles were tracked by on-board GPS devices to ensure compliance. But Brayne found that some would turn off the tracking device, which they regarded with suspicion. Others just didn’t buy what the technology was selling. “Patrol officers frequently asserted that they did not need an algorithm to tell them where crime occurs,” she noted.

In an interview, Brayne said that police departments increasingly see predictive technology as part of the tool kit, despite questions about effectiveness or other concerns, like racial profiling. “Once a particular technology is created,” she observed, “there’s a tendency to use it.” But Brayne added one other prediction, which has to do with the future of algorithmic policing in the post-George Floyd era — “an intersection,” as she says, “between squeezed budgets and this movement around defunding the police.”

The widening use of big data policing and digital surveillance poses, according to Citizen Lab’s analysis as well as critiques from U.S. and U.K. legal scholars, a range of civil rights questions, from privacy and freedom from discrimination to due process. Yet governments have been slow to acknowledge these consequences. Big Brother Watch, a British civil liberties group, notes that in the U.K., the national government’s stance has been that police decisions about the deployment of facial recognition systems are “operational.”

At the core of the debate is a basic public policy principle: transparency. Do individuals have the tools to understand and debate the workings of a suite of technologies that can have tremendous influence over their lives and freedoms? It’s what Andrew Ferguson and others refer to as the “black box” problem. The algorithms, designed by software engineers, rely on certain assumptions, methodologies and variables, none of which are visible, much less legible to anyone without advanced technical know-how. Many, moreover, are proprietary because they are sold to local governments by private companies. The upshot is that these kinds of algorithms have not been regulated by governments despite their use by public agencies.

New York City Council moved to tackle this question in May 2018 by establishing an “automated decision systems” task force to examine how municipal agencies and departments use AI and machine learning algorithms. The task force was to devise procedures for identifying hidden biases and to disclose how the algorithms generate choices so the public can assess their impact. The group included officials from the administration of Mayor Bill de Blasio, tech experts and civil liberties advocates. It held public meetings throughout 2019 and released a report that November. NYC was, by most accounts, the first city to have tackled this question, and the initiative was, initially, well received.

Going in, Cahn, the New York City civil rights lawyer, saw the task force as “a unique opportunity to examine how AI was operating in city government.” But he describes the outcome as “disheartening.” “There was an unwillingness to challenge the NYPD on its use of (automated decision systems).” Some other participants agreed, describing the effort as a waste.

If institutional obstacles thwarted an effort in a government the size of the City of New York, what does better and more effective oversight look like? A couple of answers have emerged.

In his book on big data policing, Andrew Ferguson writes that local governments should start at first principles, and urges police forces and civilian oversight bodies to address five fundamental questions, ideally in a public forum:

  • Can you identify the risks that your big data technology is trying to address?
  • Can you defend the inputs into the system (accuracy of data, soundness of methodology)?
  • Can you defend the outputs of the system (how they will impact policing practice and community relationships)?
  • Can you test the technology (offering accountability and some measure of transparency)?
  • Is police use of the technology respectful of the autonomy of the people it will impact?

These “foundational” questions, he writes, “must be satisfactorily answered before green-lighting any purchase or adopting a big data policing strategy.”

In addition to calling for a moratorium and a judicial inquiry into the uses of predictive policing and facial recognition systems, the authors of the Citizen Lab report made several other recommendations, including: the need for full transparency; provincial policies governing the procurement of such systems; limits on the use of ADS in public spaces; and the establishment of oversight bodies that include members of historically marginalized or victimized groups.

The federal government, meanwhile, has made advances in this arena, which University of Ottawa law professor and privacy expert Teresa Scassa describes as “really interesting.”

In 2019, the Treasury Board Secretariat issued the “Directive on Automated Decision-Making,” which came into effect in April 2020. It requires federal departments and agencies, except those involved in national security, to conduct “algorithmic impact assessments” (AIA) to evaluate unintended bias before procuring or approving the use of technologies that rely on AI or machine learning. The policy requires the government to publish AIAs, release software code developed internally and continually monitor the performance of these systems. In the case of proprietary algorithms developed by private suppliers, federal officials have extensive rights to access and test the software.
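The directive’s questionnaire rolls answers up into an impact level that determines the safeguards required. The sketch below is illustrative only: the questions, weights and thresholds are invented placeholders, not the Treasury Board’s actual AIA scoring.

```python
# Illustrative-only sketch of turning questionnaire answers into an impact level.
# Questions, weights and thresholds are invented placeholders, not the actual
# Algorithmic Impact Assessment published by the Treasury Board Secretariat.
ANSWERS = {
    "decisions_affect_liberty":   4,  # 0 = not at all ... 4 = severe and irreversible
    "uses_personal_biometrics":   3,
    "system_is_fully_automated":  2,
    "training_data_is_public":    1,
}

raw_score = sum(ANSWERS.values())
max_score = 4 * len(ANSWERS)

def impact_level(score: int, maximum: int) -> str:
    share = score / maximum
    if share < 0.25:
        return "Level I (little to no impact)"
    if share < 0.50:
        return "Level II (moderate impact)"
    if share < 0.75:
        return "Level III (high impact: human-in-the-loop required)"
    return "Level IV (very high impact)"

print(impact_level(raw_score, max_score))  # -> Level III in this toy example
```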

In a forthcoming paper, Scassa points out that the directive includes due process rules and looks for evidence of whether systemic bias has become embedded in these technologies, which can happen if the algorithms are trained on skewed data. She also observes that not all algorithm-driven systems generate life-altering decisions, e.g., chatbots that are now commonly used in online application processes. But where they are deployed in “high impact” contexts such as policing, e.g., with algorithms that aim to identify individuals caught on surveillance videos, the policy requires “a human in the loop.”

The directive, says Scassa, “is getting interest elsewhere,” including the U.S. Ellen Goodman, at Rutgers, is hopeful this approach will gain traction with the Biden administration. In Canada, where provincial governments oversee law enforcement, Ottawa’s low-key but seemingly thorough regulation points to a way for citizens to shine a flashlight into the black box that is big data policing.

Source: From facial recognition, to predictive technologies, big data policing is rife with technical, ethical and political landmines

America’s census looks out of date in the age of big data

Similar issues with the Canadian census, no doubt, and the debate over StatsCan accessing bank financial data is an illustration of the potential as well as the privacy and political roadblocks (see globalnews.ca, “Statistics Canada hits pause on plan to obtain banking …”):

A dog-sled or a snowmobile is the surest way to reach Toksook Bay in rural Alaska, where Steven Dillingham, the director of America’s census bureau, will arrive to count the first people in the country’s decennial population survey on January 21st. The task should not take long—there were only 590 villagers at the last count, in 2010—but it marks the beginning of a colossal undertaking. Everyone living in America will be asked about their age, sex, ethnicity and residence over the coming months (and some will be asked much more besides).

This census has already proved unusually incendiary. An attempt by President Donald Trump to include a question on citizenship, which might have discouraged undocumented immigrants from responding, was thwarted by the Supreme Court. His administration has also been accused in two lawsuits of underfunding the census, thus increasing the likelihood that minorities and vulnerable people, such as the homeless, will be miscounted.

To truly modernize social programs, it will take big data and analytics

I may be biased given some of the obstacles I faced when working on “citizen-centred service” in the early days of Service Canada, where even conceptual work regarding integrating disparate programs and services from a pathfinder perspective faced resistance.

The complexity and coordination required, the organizational and even program stovepipes, and the sheer difficulty in developing and implementing such a change agenda make me a sceptic. After all, the government wasn’t even able to integrate pay services for its own employees with Phoenix, and has had less visible problems and challenges with Shared Services Canada.

But of course, better use of big data, and better capacity to analyse the data, offer considerable potential to assess the effectiveness of current programs, identify gaps and improve outcomes:

Finding better ways of wiring for e-government is important and necessary. Nobody would disagree with the need for better computing, electronic communications and information management.

However, digital improvements will bring about only modest gains if they are applied to programs that, at core, are based on pre-computer technologies, as is the case with most of today’s social and health programs. Transformative changes in program objectives and designs based on big data and micro-analytic tools must be brought into the picture. We need to create a trove of “what works” data that will lead to individually tailored social programming.

Most of today’s social programs were designed decades ago and, reflecting the limits of the technology of the day, provide eligible individuals with standard services, products or income supports that are designed to address specific problems.

For example, an unemployed person might be assigned to a training course with a fixed duration and curriculum. A low-income senior will be provided with a top-up pension in an amount that is predetermined based on the individual’s annual income in the previous year. Someone diagnosed with a particular disease will receive a prescription for specific drugs. These benefits are provided by a variety of independent programs funded by different orders of government — and are often delivered by staff in the social work, education or health disciplines. It is difficult to coordinate or even communicate across these programs, which is why they are often referred to as program silos. Individuals who receive benefits are seen as recipients, clients, patients or students, not as citizens or partners.

It’s a reasonably efficient system that works reasonably well for most people most of the time. On balance, the results are positive, near the average of other OECD countries.

However, the system is seriously showing its age.

The underlying weakness shows up most starkly in the way the system deals with the most vulnerable. People who are most in need of health and social services or income support often face multiple obstacles in life. They might lack several types of skills, have inadequate housing and poor jobs, have differing degrees of family support and financial assets, and face a variety of health, disability and addiction issues. People with multiple needs can face an almost impenetrable array of separate programs, each with different terms and conditions and offering solutions that are partial at best. Even with the help of experts and case managers, it is often impossible to create sensible combined packages of benefits to meet individual needs.

For decades, service providers and groups representing the vulnerable have pointed out the problem of trying to shoehorn people into this complex system of fractured supports and benefits.

And for many years, policy documents relating to education, health and social policy have called for a more holistic approach, with benefits directly tailored to the diverse needs of individuals. These have been referred to as student-centred, citizen-centric and, more recently, individually driven approaches. In health, related aspirations are often referred to as precision medicine or personalized medicine, where medical treatments, practices or products are tailored to the individual patient.

However, the called-for changes have not occurred. Experiments, demonstrations and other initiatives that have attempted to cut across the boundaries of the program silos have proven difficult to sustain and have typically had little impact on the design of mainstream programming.

There are good reasons for the lack of success in moving outside traditional silos:

  • Traditional programming makes it relatively easy to provide ongoing funding, to ensure ministerial accountability and to provide the high professional standards that can ensure, for example, the quality of health and educational interventions.
  • No organization has a mandate to develop interventions that cross these traditional program boundaries.
  • The empirical data needed to assess the effectiveness of tailor-made, holistic interventions are underdeveloped and are certainly not yet strong enough to create the needed accountability arrangements. Strong accountability regimes — the monitoring and evaluation activities that ensure that money is spent effectively, transparently and in line with intended objectives — are essential if reform is to be sustained.

But the solution is on the near horizon: big data and predictive analytics. They offer the opportunity for all citizens to become real partners in the design and implementation of the social and health programs that affect their lives. They can provide transformative gains, particularly for people who are most vulnerable.

This technology is in use in other applications and can be applied to social policy. I discussed it in an IRPP essay I wrote in 2015, The Enabling Society. At its core, individuals would have access to information at the very time they need to make big social and health decisions, allowing them to make well-informed choices about which combination of training, social services, housing, income supports and health interventions is likely to work best for them. This information would be calculated from large data sets that record the experience of people who have been in similar circumstances and had similar aspirations in the past (a rough sketch of this matching idea appears after the list below). This technology produces information that allows all dimensions of the system to work in harmony:

  • The “what works” information would also be available to case workers, teachers, health professionals and other front-line staff so they can become partners in helping individuals put together flexible packages of interventions that are most likely to meet an individual’s particular needs and aspirations, including benefits provided by programs originating in different disciplines and orders of government.
  • The same information would provide the designers and administrators of the many independent traditional programs with the tools to make improvements steadily and automatically over time based on feedback loops that routinely describe which features of the program are working best and for whom — and at what cost.
  • The same information would also support rigorous accountability regimes both for existing program silos and for the flexible arrangements that provide individually tailored packages of interventions.
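A rough sketch of the matching idea referred to above, using entirely synthetic records: find past clients whose profiles resemble the current individual’s and report which intervention worked for them.

```python
# Sketch of "what works" matching: look up outcomes for past clients most similar
# to the current individual. All records, features and interventions are synthetic.
import numpy as np

# columns: age, years_unemployed, prior_training (0/1)
past_profiles = np.array([[24, 1, 0], [52, 3, 1], [27, 2, 0], [49, 4, 1]], dtype=float)
past_interventions = ["short course", "wage subsidy", "short course", "wage subsidy"]
past_outcomes = [1, 0, 1, 1]  # 1 = employed twelve months later

current = np.array([26, 2, 0], dtype=float)

# rank past clients by similarity (Euclidean distance on standardized features)
scale = past_profiles.std(axis=0)
dist = np.linalg.norm((past_profiles - current) / scale, axis=1)
for i in dist.argsort()[:2]:
    print(f"similar past client -> {past_interventions[i]}, outcome = {past_outcomes[i]}")
```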

Such a system would result in huge gains on multiple fronts: in individual and social well-being, in effectiveness, in reduced cost, in the openness and accountability of public programs and in the ability of different orders of government to work together more harmoniously and in a way that treats citizens as main partners in shaping and delivering social programs.

A radically different approach along these lines, one that so dramatically changes the relationship between government and citizen, obviously cannot be attained overnight.

We should start small, in areas where mechanisms already exist to allow cooperation across jurisdictional and program borders and where the needed “what works” information is already well developed. There are a number of possible starting points.

For example, Employment and Social Development Canada could work with one or more provinces in introducing “what works” information into the daily operation of training and other employment programs on an experimental or demonstration basis under the authority of existing labour market agreements, which provide federal funding to support provincial and territorial employability initiatives. These agreements already allow considerable flexibility in the funding and development of innovative employment programs. As well, the needed “what works” data have already been developed and are already routinely used in the evaluation of these projects.

Once their practicality and effectiveness have been clearly demonstrated, these small initiatives could extend naturally and gradually to other areas and would, eventually, become the normal way we do business.

At the same time, the government of Canada should undertake a large-scale exercise to develop big data from administrative sources, such as anonymized information from tax files, employment insurance records and provincial training and social assistance files. It should also develop the associated analytic tools that will allow us to use these rich data to better understand individual behaviour and the kinds of social interventions that are likely to work best at the level of particular individuals.

Such fundamental but gradual changes in the purpose and design of programs need to go hand in hand with the deep reforms in digital processes that have been discussed in this Policy Options series, including a stronger capacity throughout all parts of government in the use of computers, electronic communications and information management. Process reforms could, in some cases, increase efficiency and improve service delivery and customer satisfaction. They could also provide somewhat better information about what programs and benefits are available and allow greater access to administrative data that have been collected. However, if such reforms are made in isolation, divorced from deep “what works” changes in program goals and designs, they risk creating expectations for change that cannot be met.

Those in the e-government community are not directly responsible for changing basic social policy directions or reforming the structure of social programs, but they can nevertheless play a pivotal role in the development and use of big data and predictive analytics. This might at minimum involve active support for reforms along the lines laid out in a paper by the Experts Panel on Income Security of the Council on Aging of Ottawa that describes the kind of micro-level data and microanalytic tools that are needed. Such support, along with process reform, could go a long way in finally enabling the real transition to the digital world.

Source: To truly modernize social programs, it will take big data and analytics

Identifying radical content online: Ryan Scrivens

I only wish we could use some of these analytical tools to better understand overall integration and the role that social networks play in either increasing integration or allowing individuals and groups to remain within their own community or group.

Violent extremists and those who subscribe to radical beliefs have left their digital footprints online since the inception of the World Wide Web. Notable examples include Anders Breivik, the Norwegian far-right terrorist convicted of killing 77 people in 2011, who was a registered member of a white supremacy web forum and had ties to a far-right social media site; Dylann Roof, the 21-year-old who murdered nine Black parishioners in Charleston, South Carolina, in 2015, and who allegedly posted messages on a white power website; and Aaron Driver, the Canadian suspected of planning a terrorist attack in 2016, who showed outright support for the so-called Islamic State on several social media platforms.

It should come as little surprise that, in an increasingly digital world, identifying signs of extremism online sits at the top of the priority list for counter-extremist agencies. Within this context, researchers have argued that successfully identifying radical content online, on a large scale, is the first step in reacting to it. Yet in the last 10 years alone, it is estimated that the number of individuals with access to the Internet has increased threefold, from over 1 billion users in 2005 to more than 3.8 billion as of 2018. With all of these new users, more information has been generated, leading to a flood of data.

It is becoming increasingly difficult, nearly impossible really, to manually search for violent extremists, potentially violent extremists or even users who post radical content online because the Internet contains an overwhelming amount of information. These new conditions have necessitated the creation of guided data filtering methods, which may replace the laborious manual methods that traditionally have been used to identify relevant information.

Governments in Canada and around the globe have engaged researchers to develop advanced information technologies, machine-learning algorithms and risk-assessment tools to identify and counter extremism through the collection and analysis of big data available online. Whether this work involves finding radical users of interest, measuring digital pathways of radicalization or detecting virtual indicators that may prevent future terrorist attacks, the urgent need to pinpoint radical content online is one of the most significant policy challenges faced by law enforcement agencies and security officials worldwide.

We have been part of this growing field of research at the International CyberCrime Research Centre, hosted at Simon Fraser University’s School of Criminology. Our work has ranged from identifying radical authors in online discussion forums to understanding terrorist organizations’ online recruitment efforts on various online platforms. These experiences have provided us with insights we can offer regarding the policy implications of conducting large-scale data analyses of extremist content online.

First, there is much that practitioners and policy-makers can learn about extremist movements by studying their online activities. Online discussion forums of the radical right or social media accounts of radical Islamists, for example, are rich with information about how members of a particular movement communicate, how they construct their radical identities, and who they are targeting — discussions, behaviours and actions that can spill over into the offline realm. Exploring the dark corners of the Internet can be helpful in understanding or perhaps even predicting trends in activity or behaviour before they happen in the offline world. If, for example, analysts can track an author’s online activity or identify an online trend that is becoming more radical over time, analysts may be in a better position to assist law enforcement officials and the intelligence community. At the same time, it is important to note that online behaviour often does not translate into offline behaviour; authorities must proceed with caution to ascertain the specific nature of an instance of online activity and the potential threat it poses.

Second, practitioners and policy-makers can gain valuable information about extremist movements by utilizing computational tools to study radical online activities. Our research suggests that it is possible to identify radical topics, authors or even behaviours in online spaces that contain an overwhelming amount of information. Signs of extremism can be found by drawing upon keyword-retrieval software that identifies and counts a specific set of words, or sentiment analysis programs that classify and categorize opinions in a piece of text. Large-scale, semi-automated analyses can provide practitioners and policy-makers with a macro-level understanding of extremist movements online, ranging from their radical ideology to their actual activities. This understanding, in turn, can assist in the development of counter-narratives or deradicalization and disengagement programs to counter violent extremism.
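The keyword-retrieval step described here is, at its simplest, counting hits from a watch-list of terms across a corpus of posts. A minimal sketch with neutral placeholder terms (a real lexicon would be curated and validated by subject-matter experts, and sentiment analysis would add a second classification pass):

```python
# Sketch of keyword retrieval over a corpus of posts, using neutral placeholder terms.
# A real watch-list would be built by domain experts; "termA"/"termB"/"termC" are stand-ins.
import re
from collections import Counter

WATCH_TERMS = {"terma", "termb", "termc"}  # stored lower-case

posts = [
    "nothing of interest here",
    "termA appears once in this post",
    "termA and termB both appear, so this post scores higher",
]

def score(post: str) -> int:
    tokens = Counter(re.findall(r"[a-z0-9']+", post.lower()))
    return sum(tokens[t] for t in WATCH_TERMS)

for post in posts:
    print(score(post), "|", post)
```

As the authors stress, counts like these are signals for human review, not determinations.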

We must caution practitioners and policy-makers that our work suggests there is no simple typology or behaviour that best describes radical online activity or what constitutes radical content online. Instead, extremism comes in many shapes and sizes and varies with the online platform: some radical platforms, for example, promote blatant forms of extremism while other platforms encourage their subscribers to tone down the rhetoric and present their extremist views in a subtler manner. Nonetheless, a useful starting point in identifying signs of extremism online is to go directly to the source: identifying topics of discussion that are indeed radical at the core — with language that describes the “enemies” of the extreme right, for example, such as derogatory terms about Jews, Blacks, Muslims or LGBTQ communities.

Lastly, in order to gain a broader understanding of online extremism or to improve the means by which researchers and practitioners “search for a needle in a haystack,” social scientists and computer scientists should collaborate with one another. Historically, large-scale data analyses have been conducted by computer scientists and technical experts, which can be problematic in the field of terrorism and extremism research. These experts tend to take a high-level methodological perspective, measuring levels of — or propensity toward — radicalization or ways of identifying violent extremists or predicting the next terrorist attack. But searching for radical material online without a fundamental understanding of the radicalization process or how extremists and terrorists use the Internet can be counterproductive. Social scientists, on the other hand, may be well-versed in terrorism and extremism research, but most tend to be ill-equipped to manage big data — from collecting to formatting to archiving large volumes of information. Bridging the computer science and social science approaches to build on the strengths of each discipline offers perhaps the best chance to construct a useful framework for assisting authorities in addressing the threat of violent extremism as it evolves in the online milieu.

via Identifying radical content online

We all thought having more data was better. We were wrong. – Recode

An interesting set of arguments against the use of big data in all circumstances, and for the value of small, focussed data sets:

For years, the mantra in the world of business software and enterprise IT has been “data is the new gold.” The idea was that companies of nearly every shape and size, across every industry imaginable, were essentially sitting on top of buried treasure that was just waiting to be tapped into. All they needed to do was to dig into the correct vein of their business data trove and they would be able to unleash valuable insights that could unlock hidden business opportunities, new sources of revenue, better efficiencies and much more.

Big software companies like IBM, Oracle, SAP and many more all touted these visions of data grandeur, and turned the concept of big data analytics, or just Big Data, into everyday business nomenclature.

Even now, analytics is also playing an important role in the Internet of Things, on both the commercial and industrial side, as well as on the consumer side. On the industrial side, companies are working to mine various datastreams for insights into how to improve their processes, while consumer-focused analytics show up in things like health and fitness data linked to wearables, and will soon be a part of assisted and autonomous driving systems in our cars.

Of course, the everyday reality of these grand ideas hasn’t always lived up to the hype. While there certainly have been many great success stories of companies reducing their costs or figuring out new business models, there are probably an equal (though unreported) number of companies that tried to find the gold in their data — and spent a lot of money doing so — but came up relatively empty.

The truth is, analytics is hard, and there’s no guarantee that analyzing huge chunks of data is going to translate into meaningful insights. Challenges may arise from applying the wrong tools to a given job, not analyzing the right data, or not even really knowing exactly what to look for in the first place. Regardless, it’s becoming clear to many organizations that a decade or more into the “big data” revolution, not everyone is hitting it rich.

Part of the problem is that some of the efforts are simply too big — at several different levels. Sometimes the goals are too grandiose, sometimes the datasets are too large, and sometimes the valuable insights are buried beneath a mound of numbers or other data that just really isn’t that useful. Implicit in the phrase “big data,” as well as the concept of data as gold, is that more is better. But in the case of analytics, there is a legitimate question worth considering: Is more data really better?

In the world of IoT, for example, many organizations are realizing that doing what I call “little data analytics” is actually much more useful. Instead of trying to mine through large datasets, these organizations are focusing their efforts on a simple stream of sensor-based data or other straightforward data collection work. For the untold number of situations across a range of industries where these kinds of efforts haven’t been done before, the results can be surprisingly useful. In some instances, these projects create nothing more than a single insight into a given process for which companies can quickly adjust — a “one and done” type of effort — but ongoing monitoring of these processes can ensure that the adjustments continue to run efficiently.
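“Little data analytics” of the sort described here often amounts to a few lines of arithmetic over a single sensor stream: a rolling average and a threshold. A minimal sketch with made-up readings:

```python
# Sketch of "little data" monitoring: a rolling average plus a threshold alert
# over a single sensor stream. The readings and threshold are made up.
from collections import deque

THRESHOLD = 75.0  # e.g., degrees Celsius on one piece of equipment
WINDOW = 5

readings = [68.1, 69.0, 70.2, 74.8, 77.5, 79.9, 80.3, 71.0]
window = deque(maxlen=WINDOW)

for t, value in enumerate(readings):
    window.append(value)
    rolling = sum(window) / len(window)
    if rolling > THRESHOLD:
        print(f"t={t}: rolling average {rolling:.1f} exceeds {THRESHOLD} -> flag for adjustment")
```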

Of course, it’s easy to understand why nobody really wants to talk about little data. It’s not exactly a sexy, attention-grabbing topic, and working with it requires much less sophisticated tools — think Excel spreadsheet (or the equivalent) on a PC, for example. The analytical insights from these “little data” efforts are also likely to be relatively simple. However, that doesn’t mean they are less practical and valuable to an organization. In fact, building up a collection of these little data analytics could prove to be exactly what many organizations need. Plus, they’re the kind of results that can help justify the expenses necessary for companies to start investing in IoT efforts.

To be fair, not all applications are really suited for little data analytics. Monitoring the real-time performance of a jet engine or even a moving car involves a staggering amount of data that’s going to continue to require the most advanced computing and big data analytics tools available.

But to get more real-world traction for IoT-based efforts, companies may want to change their approach to data analytics efforts and start thinking small.

Source: We all thought having more data was better. We were wrong. – Recode

How the parties collect your personal info — and why Trudeau doesn’t seem to mind: Delacourt

Great piece by Delacourt:

Numbers are definitely in fashion in the new Liberal government at the moment — and not just because the budget is landing next week.

A first-ever session on “behavioural economics” for public servants was filled to capacity last week, according to a Hill Times report. “Combining economics with behavioural psychology,” said PCO spokesperson Raymond Rivet, “this new tool can help governments make services more client-focused, increase uptake of programs, and improve regulatory compliance.”

Better government through behavioural economics — the idea was popularized by the 2009 book Nudge and almost immediately adopted through the establishment of a “nudge unit” by the British government in 2010. Justin Trudeau’s government is already borrowing the concept of “deliverology” from the Brits, so the ‘nudge’ was never going to be far behind. President Barack Obama, Trudeau’s new best friend, also has taken steps to introduce nudge theory to the U.S. government in recent years.

But the real motivation for data-based governance in the Trudeau government may have come from a source much closer to home — the recent election, specifically the Liberals’ extensive use of big data to win 184 seats last fall. Make no mistake: Trudeau’s Liberals may have won the election by promising intangibles like ‘hope’ and ‘change’, but they sealed the deal with a sophisticated data campaign and ground war.

So now that the Liberals have seen how mastery of the numbers can help win elections, we probably shouldn’t be too surprised that they see those same skills as useful for governing as well. Big-data politics is here to stay.

What’s missing from that equation, however — at least on the political side — is privacy protection. Late last week, while everyone’s attention was fixated on Washington, federal Privacy Commissioner Daniel Therrien reminded a Commons committee that all the political parties are amassing data on voters without any laws to guard citizens’ privacy.

“While the Privacy Act is probably not the best instrument to do this, Parliament should also consider regulating the collection, use and disclosure of personal information by political parties,” Therrien told the Commons committee on access to information and privacy.

A little more than a year ago, it seemed that a new Liberal government could be expected to agree with the privacy commissioner.

Recall last year’s conference on “digital governance” in Ottawa; on stage for one panel discussion were key strategists for the three main parties — Tim Powers for the Conservatives, Brad Lavigne for the New Democrats and Gerald Butts for the Liberals. Mr. Butts is, of course, now Trudeau’s principal secretary.

Fielding questions from the audience, the three were asked whether political databases should be subject to Canadian privacy laws. Powers and Lavigne demurred; only Butts seemed to be saying ‘yes’.

Here’s his lengthy quote, which appeared a few weeks later in an iPolitics column by Chris Waddell:

“Let’s not kid ourselves, political parties are public institutions of a sort. They are granted within national or sub-national legislation special status on a whole variety of fronts, whether they be the charitable deduction, the exemption from access to information — all those sorts of things,” Butts told the conference.

“We have created a whole body of law … or maybe we haven’t. Maybe we have just created a hole in our two bodies of law that allow political parties to exist out there in the ether. I think that is increasingly a problem and it is difficult for me to envision a future where it exists for much longer.”

That was a year ago. And unless I missed it, there’s nothing in any of Trudeau’s mandate letters to ministers about new privacy laws for political parties. And without giving away too much about the new chapters of my soon-to-be-re-released book on political marketing, I didn’t get the impression during our recent interview that Prime Minister Trudeau was greatly troubled by the collision between privacy protection and political databases.

It seems odd to me that citizens can get (often appropriately) worked up about “intrusive” government measures, whether it’s the census or the C-51 anti-terrorism law, and yet be mostly indifferent to what the chief electoral officer has called the “Wild West” of political data collection.

Even Conservatives who resented the gun registry didn’t seem to mind that their own party was keeping track of gun owners in its database, so that it could send them specially targeted fundraising messages from time to time. That’s just behavioural economics, applied to the political arena.

So far, British Columbia is the only province to take steps to put political databases in line with privacy protection. The provincial chief of elections in B.C., Keith Archer, notified political parties that they would not get access to the voters’ list — the raw material of any political database — if they failed to comply with privacy laws.

That step could — and should — be implemented in Ottawa, too. We’re in the era of big-data politics and behavioural-insight governance, and Canadians are entitled to some accountability about the data the governing party is collecting and using on them.

Not so long ago, one of Trudeau’s most senior advisers agreed with that idea. Maybe all it takes is a little nudge.

Source: How the parties collect your personal info — and why Trudeau doesn’t seem to mind – iPolitics

Social Assistance Receipt Among Refugee Claimants in Canada: Evidence from Linked Administrative Data Files

A good illustration of the benefits of linking administrative and economic data for evidence-based policy-making. The analysis is a bit dry, but it essentially shows that the number accessing social assistance declines with time yet remains above the Canadian average:

Focusing on the middle estimate [which excluded non-linked files], the receipt of SA in year t+1 among the 2005-to-2010 claimant cohorts generally ranged between 80% and 90% across family types, with rates highest among lone mothers and couples with more than two children. Similarly, the incidence of SA receipt generally ranged from about 80% to 90% across families in which the oldest member was between 19 to 24 and 55 to 64 years of age. Across provinces, the incidence of SA receipt in year t+1 was generally highest in Quebec, at over 85%, and lowest in Alberta, at under 60%.

SA receipt varied considerably across country of citizenship. Refugee claimants from countries such as Afghanistan, Colombia, the Democratic Republic of Congo, Eritrea, and Somalia all had relatively high SA rates (close to or above 90%) throughout most of the study period, while rates were lower among refugee claimants from Bangladesh, Haiti, India, and Jamaica (generally below 80%).

The rates of SA receipt tended to decline sharply in the years following the start of the refugee claim. Between years t+1 and t+2, rates fell by about 20 percentage points among most claimant cohorts, declining a further 15 percentage points between t+2 and t+3, and 10 percentage points between t+3 and t+4. By t+4, between 25% and 40% of refugee claimants received SA. However, it is important to recall that these figures pertain to the diminishing group of refugee claimants whose claims remained open up to that year. These figures are also well above the Canadian average of about 8%.

Among refugee claimant families that received SA in year t+1, the average total family income typically ranged from about $19,000 to $22,000, with SA benefits accounting for $8,000 to $11,000—or about 40% to 48%—of that total.

In aggregate terms, SA income paid to all recipients in Canada totaled $10 billion to $13 billion in most years. Given their relatively small size as a group, the dollar amount of SA paid to refugee claimant families amounted to between 1.9% and 4.4% of that total, depending on the year and on the treatment of unlinked cases.
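A rough sketch of the linkage-and-tabulation that sits behind figures like these, with hypothetical file layouts and an anonymized key (the actual Statistics Canada linkage environment and variable names differ):

```python
# Sketch of linking two administrative files on an anonymized key and computing the
# incidence of social assistance (SA) receipt by claim-year cohort in year t+1.
# File layouts, keys and values are hypothetical.
import pandas as pd

claims = pd.DataFrame({
    "anon_id":    [1, 2, 3, 4, 5],
    "claim_year": [2005, 2005, 2006, 2006, 2006],
})

tax_records = pd.DataFrame({  # simplified stand-in for a tax-based family file
    "anon_id":     [1, 2, 3, 4, 5],
    "year":        [2006, 2006, 2007, 2007, 2007],
    "sa_benefits": [9500, 0, 8200, 11000, 0],
})

linked = claims.merge(tax_records, on="anon_id", how="inner")
t_plus_1 = linked[linked["year"] == linked["claim_year"] + 1]
incidence = (t_plus_1["sa_benefits"] > 0).groupby(t_plus_1["claim_year"]).mean()
print(incidence)  # share of each cohort receiving SA in year t+1
```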

Source: Social Assistance Receipt Among Refugee Claimants in Canada: Evidence from Linked Administrative Data Files

Turning to Big, Big Data to See What Ails the World

Good examples of how big data can help identify the more important issues and the consequent shift in focus from death to disability:

The disconnect between what we think causes the most suffering and what actually does persists today. It is partly a function of success. Diarrhea, pneumonia and childbirth deaths have greatly declined, and deaths from malaria and AIDS have fallen, although far less dramatically. (The charts here show the stunning improvement in health around the world. And here are similar charts tracking progress in hunger, poverty and violence — a big picture that’s an important counterpoint to the constant barrage of negative world news.) This success is partly due to changes made because of the first Global Burden reports.

The downside is that longer lives mean people are living long enough to develop diabetes and Alzheimer’s. “What decline we’re seeing from communicable diseases, we’re seeing a compensatory increase from diabetes,” Murray said. And neurological diseases such as Alzheimer’s now account for twice as many years lived with disability as cardiovascular and circulatory diseases together, Smith writes.
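The “years lived with disability” comparison rests on the standard burden-of-disease arithmetic: disability-adjusted life years (DALYs) add years of life lost to early death and years lived with disability, the latter weighted by severity. A minimal sketch with invented numbers:

```python
# Sketch of the DALY arithmetic: DALY = YLL + YLD, where
#   YLL = deaths * standard life expectancy remaining at the age of death
#   YLD = prevalent cases * disability weight (0 = perfect health, 1 = death)
# All numbers below are invented for illustration.
conditions = {
    #               (deaths, life_exp_remaining, prevalent_cases, disability_weight)
    "condition_A": (1_000, 30.0,   5_000, 0.10),
    "condition_B": (  100, 15.0, 200_000, 0.35),
}

for name, (deaths, life_exp, cases, weight) in conditions.items():
    yll = deaths * life_exp
    yld = cases * weight
    print(f"{name}: YLL={yll:,.0f}  YLD={yld:,.0f}  DALY={yll + yld:,.0f}")
```

A condition with few deaths but a huge number of prevalent, disabling cases can outweigh a deadlier one, which is how depression ends up ahead of tuberculosis in these rankings.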

This is not simply because people are living longer. It’s also a function of worsening diet everywhere, as poor societies adopt the processed foods found in rich ones.

The most surprising information, though, came not in measuring deaths, but disability. “Major depression caused more total health loss in 2010 than tuberculosis,” Smith writes. Neck pain caused more health loss than any kind of cancer, and osteoarthritis caused more than natural disasters. For other findings that may surprise you, see the quiz.

The report is a giant compilation of “who knew?”

Based on this information, countries and international organizations have been able to change how they spend their health resources, and some ambitious countries have done their own national Burden of Disease studies.

Iran, writes Smith, found that traffic injury was its leading preventable cause of health loss in 2003, and put money into building new roads and retraining police. It also targeted two other big problems its study found: suicide and heart disease.

Australia, responding to the high impact of depression, began offering cost-free short-term depression therapy.

Mexico was one of the countries making the most use of Global Burden of Disease data, after Julio Frenk became health minister in 2000. Frenk had been Murray’s boss at the W.H.O., and a participant in Murray’s work. He found that Mexico’s health system was targeting the communicable diseases that predominated in 1950, not what currently ailed Mexicans. In response, Frenk established universal health insurance (before that, 50 million were uninsured) and set coverage according to the burden of disease.

The program covered emergency care for car accidents, treatment of mental illness, cataracts, and breast and cervical cancer — all of which had been uncovered, even for people with insurance. “You want to cover those interactions that give you the highest gain,” he said.

Murray and company have now branched out beyond diagnosis to measuring treatment: How many people really have access to programs like anti-malaria bed nets or contraception? How much is being spent and what does it buy? Where are the most useful points of intervention? Meanwhile, data from the Global Burden reports is seeping further into health policy decisions around the world — data that saves suffering and money and lives.

via Turning to Big, Big Data to See What Ails the World – NYTimes.com.

Research based on social media data can contain hidden biases that ‘misrepresent real world,’ critics say

Good article on some of the limits in using social media for research, as compared to IRL (In Real Life):

One is ensuring a representative sample, a problem that is sometimes, but not always, solved by ever greater numbers. Another is that few studies try to “disentangle the human from the platform,” to distinguish the user’s motives from what the media are enabling and encouraging him to do.

Another is that data can be distorted by processes not designed primarily for research. Google, for example, stores only the search terms used after auto-completion, not the text the user actually typed. Another is simply that many social media are largely populated by non-human robots, which mimic the behaviour of real people.

Even the cultural preference in academia for “positive results” can conceal the prevalence of null findings, the authors write.

“The biases and issues highlighted above will not affect all research in the same way,” the authors write. “[But] they share in common the need for increased awareness of what is actually being analyzed when working with social media data.”

Research based on social media data can contain hidden biases that ‘misrepresent real world,’ critics say

9 Ugly Lessons About Sex From Big Data | TIME

Interesting example of big data and some reminders that we are not yet living in a post-racial society:

5. According to Rudder’s research, Asian men are the least desirable racial group to women… On OkCupid, users can rate each other on a 1 to 5 scale. While Asian women are more likely to give Asian men higher ratings, women of other races—black, Latina, white—give Asian men a rating between 1 and 2 stars less than what they usually rate men. Black and Latin men face similar discrimination from women of different respective races, while white men’s ratings remain mostly high among women of all races.

6. …And black women are the least desirable racial group to men. Pretty much the same story. Asian, Latin and white men tend to give black women 1 to 1.5 stars less, while black men’s ratings of black women are more consistent with their ratings of all races of women. But women who are Asian and Latina receive higher ratings from all men—in some cases, even more so than white women.

8. Your Facebook Likes can reveal your gender, race, sexuality and political views. A group of UK researchers found that based on someone’s Facebook Likes alone, they can tell if a user is gay or straight with 88% accuracy; lesbian or straight, 75%; white or black, 95%; man or woman, 93%; Democrat or Republican, 85%.
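The underlying method in studies like the one cited is straightforward: represent each user as a binary vector of Likes and fit a standard classifier against a known attribute (the accuracies quoted are essentially how well such a model separates the two groups). A minimal sketch with synthetic data, assuming scikit-learn; the published work used a larger pipeline with dimensionality reduction:

```python
# Sketch of predicting a binary attribute from a users-by-Likes matrix.
# The data is synthetic; the cited study used millions of real Likes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n_users, n_likes = 2000, 300

X = (rng.random((n_users, n_likes)) < 0.05).astype(int)  # sparse 0/1 Like matrix
signal = X[:, :10].sum(axis=1)                           # a handful of Likes carry the signal
y = (signal + rng.normal(0, 1, n_users) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```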

9 Ugly Lessons About Sex From Big Data | TIME.