IOM-Microsoft Collaboration Enables Release of Largest Public Dataset to Bolster Fight Against Human Trafficking
Geneva/New York – The International Organization for Migration (IOM) today released a new synthetic dataset on human trafficking, made possible by innovative technology developed in partnership with Microsoft Research. This dataset represents the largest collection of primary human trafficking case data ever made available to the public, while enabling strong privacy guarantees that preserve the anonymity and safety of victims and survivors.
The downloadable Global Synthetic Dataset has been released through the Counter Trafficking Data Collaborative (CTDC) – the first global data portal on human trafficking – and represents data from over 156,000 victims and survivors of trafficking across 189 countries and territories (where victims were first identified and supported).
It provides first-hand, critical information on the socio-demographic profile of victims, types of exploitation, and the trafficking process, including means of control used on victims – all of which is vital to better assisting survivors and prosecuting perpetrators. The new technology has enabled CTDC to share more data and support more effective research while protecting privacy and civil liberties. Access to additional attributes of victim case records will enable stakeholders to develop a more comprehensive understanding of this crime and the needs of survivors.
“Making data on human trafficking widely available to stakeholders in a safe manner is crucial to develop evidence-based responses,” said Harry Cook, Programme Coordinator at IOM’s Migration Protection and Assistance Division. “Administrative data on identified cases of human trafficking represent one of the main sources of data available but such information is highly sensitive. IOM has been delighted to work with Microsoft Research over the past two years to make progress on the critical challenge of sharing such data for analysis while protecting the safety and privacy of victims.”
Microsoft Research has worked with IOM to develop a new algorithm to derive “synthetic data” from CTDC’s sensitive victim case data. Rather than systematically redacting cases, which results in a substantial amount of data being suppressed, the algorithm generates a synthetic dataset that accurately preserves the statistical properties and relationships in the original data.
However, the records of the synthetic dataset no longer correspond to actual individuals; each is constructed entirely from common attribute combinations. This means that none of the attribute combinations in the synthetic dataset can be linked to distinctive individuals (or even small groups of distinctive individuals) in the sensitive dataset, or in the world at large. Thanks to the new algorithm, representative data on all of CTDC’s victim-of-trafficking cases are now available as a downloadable data file.
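The idea of building records only from common attribute combinations can be illustrated with a short Python sketch. This is not the algorithm developed by Microsoft Research, only a minimal check written under stated assumptions: records are plain dictionaries, a combination counts as "common" when it appears in at least `k` sensitive records, and combinations are examined up to `max_len` attributes. The function names, the threshold `k`, and the toy column names and values are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_combinations(records, max_len):
    """Count how many records contain each combination of up to max_len attribute values."""
    counts = Counter()
    for record in records:
        items = sorted(record.items())  # stable order so equal combinations match
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return counts


def rare_combinations(synthetic, sensitive, k, max_len):
    """Return combinations present in the synthetic records that occur
    in fewer than k records of the sensitive data (hypothetical threshold)."""
    sensitive_counts = count_combinations(sensitive, max_len)
    synthetic_counts = count_combinations(synthetic, max_len)
    return {combo for combo in synthetic_counts if sensitive_counts[combo] < k}


# Made-up toy records for illustration only; not drawn from any real dataset.
sensitive = [
    {"gender": "F", "exploitation": "labour"},
    {"gender": "F", "exploitation": "labour"},
    {"gender": "F", "exploitation": "labour"},
    {"gender": "M", "exploitation": "labour"},
]
safe_synthetic = [{"gender": "F", "exploitation": "labour"}]
risky_synthetic = [{"gender": "M", "exploitation": "labour"}]

safe_result = rare_combinations(safe_synthetic, sensitive, k=2, max_len=2)
risky_result = rare_combinations(risky_synthetic, sensitive, k=2, max_len=2)
```

In this toy case `safe_result` is empty, because every combination in the synthetic record appears at least twice in the sensitive data, while `risky_result` flags the combinations involving the single male record. A synthetic dataset of the kind described above would be expected to pass such a check with no flagged combinations.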
“Creating a simple process for privacy-preserving data sharing has the potential to coordinate and amplify the efforts of anti-trafficking organizations around the world,” said Darren Edge, Director of Societal Resilience at Microsoft Research and project lead.
“We are grateful to IOM for our deep partnership in developing a new approach to data sharing that is grounded in the needs of the anti-trafficking community. By protecting the privacy and safety of victims with synthetic data, and empowering policymakers to view, explore, and make sense of data through rich interactive dashboards, we are showing one of the many ways in which research and technology can support the global fight against human trafficking.”
IOM and Microsoft Research began working together in July 2019 as part of the accelerator programme of the Tech Against Trafficking coalition.
The new privacy-preserving synthetic data solution, developed at Microsoft Research in the Python programming language, is also being made freely available via GitHub. IOM aims to share the new technique with counter-trafficking organizations worldwide as part of a wider programme to improve the production of data and evidence on human trafficking. This includes establishing new international standards and guidance to support governments in producing high-quality administrative data, in partnership with the UN Office on Drugs and Crime, and a package of data standards and information management tools for frontline counter-trafficking agencies.
By making this information openly and safely available, IOM and Microsoft hope to ensure the voices of victims and survivors are heard and protected while empowering governments and other stakeholders to take progressive action to end this crime.
The new synthetic data and related resources can be accessed through the CTDC portal.
CTDC is the first global data portal on human trafficking, combining victim case datasets from multiple counter-trafficking organizations.
For more information, please contact:
Harry Cook, Programme Coordinator at IOM’s Migration Protection and Assistance Division, Tel: +45 297 925 05, Email: firstname.lastname@example.org
This initiative is supported by the United States Department of State (DOS) Bureau of Population, Refugees, and Migration (PRM); United States Department of Labor (DOL); the Global Fund to End Modern Slavery (GFEMS) under a cooperative agreement with the U.S. Department of State; and the IOM Development Fund (IDF). The contents are the responsibility of the authors and do not necessarily reflect the views of DOS, DOL, GFEMS, or IDF.