Data Science for Human Rights
In essence, data science is a bag of tools at the intersection of mathematics, statistics, and computer science, used to harness large amounts of data for better-informed decision making. The same tools that businesses use to generate more profit can also be used for more effective international development programs, humanitarian relief missions, and human rights protection endeavours.
Computer vision (how algorithms ‘see’ or classify images) for international development is one such example. Image classification is the process of taking a picture as input and outputting a class (like 'cat') or a probability that the picture belongs to a particular class ('there is a 90% chance that this is a cat'). Rather than pictures of our favourite pets, we may instead feed the algorithm satellite images of a particular region of the world and output key development information about that region. A study presented by Pandey, Agarwal and Krishnan at the 30th AAAI Conference on Innovative Applications of Artificial Intelligence did exactly that. Using multi-task convolutional neural networks (imagine layers of neurons perceiving an image in our brain), they were able to identify roof material, source of lighting and source of drinking water as indicators of economic progress in Uttar Pradesh, the most populous state of India, with 96.9% accuracy (Pandey, Agarwal, and Krishnan, 2018).
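To make the idea of "outputting a probability" concrete, here is a minimal sketch of the final step of an image classifier: turning raw class scores into probabilities with a softmax. It is not the multi-task network from the study. The class labels and the scores are made up for illustration, and in a real model the scores would come from the convolutional layers.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Hypothetical classes and made-up scores for a single satellite image tile.
LABELS = ["concrete roof", "metal roof", "thatch roof"]
label, prob = classify([2.0, 0.5, -1.0], LABELS)
print(f"{label}: {prob:.0%}")
```

The classifier reports the label with the highest probability, which is how a statement such as 'there is a 90% chance that this is a cat' is produced.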
The implication is that governments might no longer need to rely completely on costly, inefficient and error-prone ground-level survey data to track the economic progress of different regions when allocating resources. Organisations can also use publicly available satellite imagery to efficiently monitor and evaluate the effectiveness of development programs. The same technique can also be used to identify geographical features of a region for environmental conservation. For example, as part of the 2019 Women in Data Science Conference, a global Datathon was hosted to challenge data scientists to detect oil palm plantations in high-resolution satellite imagery. My team (LSEarth) achieved an accuracy rate of 94% with a simple convolutional neural network on our first go (the winning team scored a staggering 99.957%)! Highly accurate automated mapping of unlawful plantations and deforestation activities can serve as a powerful monitoring tool for governments and conservation groups to track and prevent violations of environmental policy. With some tweaking, the same set of algorithms can be used to forecast floods, detect plant disease, predict wildfires, prevent overfishing... As climate change continues to pose serious risks to the fundamental right to life, these tools allow us to use unconventional sources of data to hold governments accountable and protect those most vulnerable to climate disasters.
Social Network Analysis and Natural Language Processing
Social network analysis is the study of structures represented by nodes (persons, organisations, villages, countries, etc.) and edges (interactions or relationships). It provides powerful tools to quantify and visualise the abstract dynamics of communities and the patterns of flows between sets of entities, such as the movement of people and goods between countries. For example, using data extracted from blog postings on the internet, the organisation Epodunk was able to map the spread of displaced survivors from New Orleans after the destructive Hurricane Katrina in 2005. Beyond its descriptive power, network analysis might also be used for early prevention of human rights abuses. A 2006 USAID-backed review of research on human trafficking in Cambodia suggests that the social networks of victims, employers, and recruiters can help uncover underground human trafficking networks and curb involuntary migration flows (Derks, Henke, and Ly, 2006).
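The node-and-edge idea can be sketched in a few lines of Python. The example below is only an illustration: the place names and the numbers of people are invented, and the two measures shown (weighted inflow and degree) are among the simplest of the many metrics used in network analysis.

```python
from collections import Counter, defaultdict

# Hypothetical movement records: (origin, destination, people moved).
flows = [
    ("Coastal City", "Town A", 120),
    ("Coastal City", "Town B", 80),
    ("Coastal City", "Town C", 40),
    ("Town B", "Town A", 10),
]

def build_graph(edges):
    """Store the network as an adjacency map: node -> {neighbour: weight}."""
    graph = defaultdict(dict)
    for origin, destination, weight in edges:
        graph[origin][destination] = weight
        graph.setdefault(destination, {})  # keep nodes with no outgoing edges
    return graph

def inflow(graph):
    """Total weight arriving at each node (weighted in-degree)."""
    totals = Counter()
    for neighbours in graph.values():
        for destination, weight in neighbours.items():
            totals[destination] += weight
    return totals

def degree(graph):
    """Number of distinct nodes each node is linked to, in either direction."""
    links = defaultdict(set)
    for origin, neighbours in graph.items():
        for destination in neighbours:
            links[origin].add(destination)
            links[destination].add(origin)
    return {node: len(links[node]) for node in graph}

graph = build_graph(flows)
for place, people in inflow(graph).most_common():
    print(f"{place}: {people} arrivals, linked to {degree(graph)[place]} places")
```

Ranking nodes by inflow shows where displaced people concentrate, while degree highlights the places that connect many others.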
Natural language processing allows us to make use of another unconventional source of data: human-generated texts, such as news articles, political speeches, and tweets. For example, research by Professor Barberá from the Department of Methodology at the LSE uses natural language processing techniques, such as sentiment analysis and topic models, to analyse the ideological leanings (left to right) of politicians and the spread of fake news on Twitter. The same techniques also allow researchers to detect deceptive reviews, hate speech, and violent verbal abuse on social media at the linguistic level and provide evidence for prevention and law enforcement.
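As a rough illustration of sentiment analysis, the sketch below scores a text by averaging the scores of the words it shares with a small word list. The lexicon is hypothetical and far too small for real use, and this is not the method used in the research mentioned above. Production systems rely on much larger lexicons or trained models.

```python
import re

# Tiny hypothetical lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1, "great": 1, "safe": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text):
    """Average lexicon score of the matched words; 0.0 when nothing matches."""
    scores = [LEXICON[word] for word in tokenize(text) if word in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

for post in ["A great and safe place", "I hate this terrible service"]:
    print(post, "->", sentiment(post))
```

A score near +1 suggests positive language and a score near -1 negative language. Word lists like this miss negation and sarcasm, which is one reason trained models are preferred in practice.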
In addition to powerful algorithms, data science as a framework can also bring social impact. By framework, I mean the process of deciding what information is important, how to collect the key data, how to store it, how to update it, how to analyse it, and how to transform it into presentable and useful information (see visualisation on the right).
For grassroots non-profits, grants and donations are key to their missions and survival. To attract donors, a non-profit needs to be able to prove the impact of its work, and proving impact requires data. Through my personal project with the OneSky foundation, a non-profit that brings nurturing, responsive care to orphanages in China, I have seen first-hand how knowledge of database systems, statistical analysis, and data visualisation could transform a social good organisation. Previously, the organisation collected data about childhood development from more than 50 sites and aggregated it into Excel sheets manually every quarter. This process cost the organisation a significant amount of time and human resources, yet at the same time produced errors and breaches of privacy. A database system would allow officers on the ground to input and update data in real time and organise it in a format ready for analysis. With better analysis and representation of data, OneSky and other social good organisations could not only boost fundraising efforts but also measure and improve the effectiveness of their programs, optimise resource allocation, and more, just like any for-profit business.
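The move from manual spreadsheets to a database can be sketched with Python's built-in sqlite3 module. The table name, columns, site names and scores below are all hypothetical and do not describe OneSky's actual system. The point is only that records entered once can be corrected and summarised without manual copying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("""
    CREATE TABLE assessments (
        site       TEXT NOT NULL,
        quarter    TEXT NOT NULL,
        child_code TEXT NOT NULL,  -- anonymised identifier, never a name
        score      REAL NOT NULL,
        UNIQUE (site, quarter, child_code)
    )
""")

# Made-up records, as field officers might enter them.
rows = [
    ("Site A", "2019-Q1", "c001", 62.0),
    ("Site A", "2019-Q1", "c002", 70.0),
    ("Site B", "2019-Q1", "c003", 55.0),
]
conn.executemany("INSERT INTO assessments VALUES (?, ?, ?, ?)", rows)

# Correcting a record replaces the old row instead of creating a duplicate,
# because of the UNIQUE constraint above.
conn.execute(
    "INSERT OR REPLACE INTO assessments VALUES (?, ?, ?, ?)",
    ("Site B", "2019-Q1", "c003", 58.0),
)

# The quarterly summary is a single query rather than a manual merge of sheets.
summary = conn.execute(
    "SELECT site, COUNT(*), AVG(score) FROM assessments "
    "GROUP BY site ORDER BY site"
).fetchall()
for site, children, average in summary:
    print(site, children, average)
```

Storing an anonymised code instead of a child's name is one simple way to reduce the privacy risks mentioned above.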
For humanitarian relief missions, information is required in every stage of the management cycle. To ensure sufficient resources for allocation, we need to know how many people to feed, how many people need medical attention, how many children need urgent care, etc. To secure future support, we might also want to predict how many more people will be in need of humanitarian assistance in the future when a conflict or climate disaster draws close. A well-designed data pipeline that allows real-time inputs is crucial for the success of humanitarian emergency relief.
In the same vein, data is crucial for human rights protection. By aggregating different data sources and navigating database systems, the Human Rights Data Analysis Group (HRDAG) is able to identify unique records of killings, human rights abuses, disappearances or torture of individuals in conflict-affected countries like Syria and Chad. Reports backed up by careful data analysis have enabled the HRDAG to uncover human rights abuse cases that would otherwise be hidden. The data that they are able to extract from various sophisticated data systems has been core evidence for the international community to hold human rights abusers accountable. For example, a 2009 Human Rights Data Analysis Group report found that former Liberian president Charles Taylor, who was tried and found guilty in The Hague for war crimes and crimes against humanity in Sierra Leone’s civil war, also led the Liberian rebel group responsible for the largest number of violations during Liberia’s 24 years of civil unrest (Cibelli et al., 2009).
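Identifying unique records across several sources is a problem known as record linkage or de-duplication. The sketch below is a deliberately simplified illustration and not a description of HRDAG's methodology: it treats two records as the same person only when the normalised name, date and place match exactly, whereas real projects typically need probabilistic matching and human review. All records shown are fictional.

```python
import unicodedata

def normalise(text):
    """Lower-case, strip accents and extra spaces so spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())

def unique_records(records):
    """Group records that share the same normalised name, date and place."""
    groups = {}
    for rec in records:
        key = (normalise(rec["name"]), rec["date"], normalise(rec["place"]))
        groups.setdefault(key, []).append(rec["source"])
    return groups

# Fictional records from two hypothetical source lists.
records = [
    {"name": "José  Pérez", "date": "2012-03-01", "place": "Town A", "source": "list 1"},
    {"name": "jose perez", "date": "2012-03-01", "place": "town a", "source": "list 2"},
    {"name": "Ana Lopez", "date": "2012-03-04", "place": "Town B", "source": "list 1"},
]

groups = unique_records(records)
print(len(records), "reports describe", len(groups), "unique individuals")
for key, sources in groups.items():
    print(key, "reported in", sources)
```

Counting groups rather than raw reports avoids double-counting a person who appears in more than one source.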
Data science is changing the face of human rights and plays an essential role in their protection. Early detection, early understanding, and early decision and analysis are essential to prevent or mitigate human rights violations, crises and disasters. Data science provides us with the technological capacity and tools to gather, analyse, and portray data vital to the protection and enforcement of human rights.
Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber? Psychological Science, 26(10), 1531–1542. https://doi.org/10.1177/0956797615594620
Cibelli, K., Hoover, A., & Krüger, J. (2009). Descriptive Statistics From Statements to the Liberian Truth and Reconciliation Commission. HRDAG. https://hrdag.org/wp-content/uploads/2013/02/Benetech-TRC-descriptives-final.pdf
Derks, A., Henke, R., & Ly, V. (2006). Review of a Decade of Research On Trafficking in Persons, Cambodia. The Asia Foundation and USAID. https://asiafoundation.org/resources/pdfs/CBTIPreview.pdf
Epodunk. Flow of Katrina survivors. Retrieved from http://www.epodunk.com/top10/diaspora/
Marazzi, A. (2016). The data pipeline architecture. Medium. Retrieved from https://medium.com/the-data-experience/building-a-data-pipeline-from-scratch-32b712cfb1db
Nazrul, S.S. (2018). Towards Data Science. Retrieved from https://towardsdatascience.com/data-science-interview-guide-4ee9f5dc778
Pandey, S. M., Agarwal, T., & Krishnan, N. C. (2018). Multi-Task Deep Learning for Predicting Poverty from Satellite Images. The Thirtieth AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-18), 7793–7798. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16441/16388
Tata, V. (2017). Simple Image Classification using Convolutional Neural Network--Deep Learning in python. Medium. Retrieved from https://becominghuman.ai/building-an-image-classifier-using-deep-learning-in-python-totally-from-a-beginners-perspective-be8dbaf22dd8
Volkova et al. (2019). Overview of social media analytics capabilities developed. Retrieved from https://www.cs.jhu.edu/~svitlana/