
When Data Science Stands on the Side of Justice: The Story of Paola Villarreal

Mexican data scientist Paola Villarreal, recognized by MIT, spoke with TecScience during the 2025 Monterrey International Book Fair, where she discussed the power of data analysis to drive social change.
Villarreal has been honored with the MIT Innovators Under 35 LATAM award and the BBC’s 100 Inspiring Women list; she was also selected as a fellow at Harvard University’s Berkman Klein Center for Internet & Society. (Photo: Ricardo Treviño)

“When the Court handed down the ruling, they cited the analysis and mentioned my name. When I heard it, the lawyers turned around because they rarely name anyone. They said that thanks to that analysis, to that collaboration, they had been convinced they needed to throw out 95% of the convictions.”

That’s how Mexican data scientist Paola Villarreal recalls the moment in 2017 when the Massachusetts Supreme Judicial Court decided to overturn more than 22,000 drug-related convictions of Black and Latino individuals. She shared the story with TecScience just hours before presenting her book Artificial Intelligence: The New Electronic Brain at the 2025 Monterrey International Book Fair.

Through the Data for Justice project—using public records from the Massachusetts judicial system—this self-taught programmer revealed that many of the convictions were based on tampered evidence and cases of racial discrimination. She achieved this social impact through data science, the discipline that turns vast amounts of information into knowledge to help understand or solve complex problems.

The tool she developed for the project, the Augmented Narrative Toolkit (ANT), along with her analysis, earned Villarreal recognition from the Massachusetts Institute of Technology (MIT). In 2018, MIT Technology Review named her to its Innovators Under 35 LATAM list in the Visionary category. A year later, the BBC included her in its 100 Women list of inspiring and influential women from around the world.

Data Science as a Tool for Social Justice

In her book, Paola Villarreal presents artificial intelligence (AI) as one of the key tools in data science—from its origins and current applications, to the challenges it poses in terms of data privacy and ethics, and its potential uses in the society of the future.

For Villarreal, the most important aspect has always been the social and justice-oriented impact she’s achieved through data science. She shares that, over the years, more and more people regained their freedom. But Data for Justice wasn’t the only case in which she saw how analyzing information, and presenting it in a way that’s easy to understand, could drive social change.

In Mexico, in 2019, while serving as Data Science Project Coordinator at the National Council of Science and Technology (now Conahcyt), she designed a tool called the National Information Ecosystem for the Search of Missing Persons (ENIPD, by its Spanish acronym). The tool was created to help researchers more effectively collect, generate, and analyze data related to missing persons.

“Social issues, human rights, and justice are where data science has the most potential—and where it’s most urgently needed. It helps us try to explain phenomena that, due to their complexity, would be impossible to analyze without the tools it offers,” she says. “In crises like that of missing persons, it can be essential in mitigating the damage, seeking justice for victims and society, and preventing it from happening again. Using data science to access our rights is the best use we can give—not just to data, but to technology as a whole.”

To build a project with strong social impact, she explains, there must first be multidisciplinary collaboration—not just with data scientists, but also activists, lawyers, journalists, researchers, and experts from civil society organizations. That’s what allows teams to understand the problem’s context, define objectives, and plan for long-term sustainability.

It’s also crucial to have a strong social purpose with a clear goal. In Data for Justice—a project supported by a Mozilla Foundation fellowship—Villarreal partnered with the American Civil Liberties Union (ACLU), an organization with over a century of experience defending civil rights against discrimination and unjust policies.

“My goal was crystal clear: to seek justice for those thousands of people. That gave me the drive to accomplish what the state attorneys didn’t want me to do, which was to analyze the databases of convictions, charges, arrests, and names, so that I could connect not just numbers or codes, but actual people.”

Data Quality for Social Impact

In her book Artificial Intelligence: The New Electronic Brain, Villarreal emphasizes that data is as fundamental to AI as atoms are to matter. In that sense, having high-quality information is essential for training models and algorithms.

The same principle applies to any data science project, but obtaining relevant information poses major challenges. For instance, data often isn’t directly available, and public records requests are needed to access it.

According to the programmer, location data is among the most sensitive types of data. With it, it’s possible to track individuals over time and uncover patterns in their habits—something that proved useful during the COVID-19 pandemic to study whether there was communication or economic activity between municipalities, which in turn informed public policy decisions.
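To make the idea of uncovering movement patterns over time more concrete, here is a minimal Python sketch of how location pings could be turned into counts of trips between municipalities. The article does not describe the methods used during the pandemic; the input format (device ID, timestamp, municipality), the function name `municipality_flows`, and the sample pings are all hypothetical.

```python
from collections import Counter
from itertools import groupby


def municipality_flows(pings):
    """Count trips between municipalities from (device_id, timestamp, municipality) pings.

    Two consecutive pings from the same device in different municipalities
    count as one trip from the first municipality to the second.
    """
    flows = Counter()
    # Sort so each device's pings are contiguous and in time order
    ordered = sorted(pings, key=lambda ping: (ping[0], ping[1]))
    for _device, device_pings in groupby(ordered, key=lambda ping: ping[0]):
        places = [municipality for _, _, municipality in device_pings]
        # Compare each ping with the next one from the same device
        for origin, destination in zip(places, places[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows


# Illustrative, made-up pings: (device_id, timestamp, municipality)
pings = [
    ("d1", 1, "Monterrey"),
    ("d1", 2, "San Pedro Garza García"),
    ("d2", 1, "Guadalupe"),
    ("d1", 3, "Monterrey"),
    ("d2", 5, "Monterrey"),
]
print(municipality_flows(pings))
```

Only the aggregated counts, not the individual trajectories, would be shared with policymakers; this is also why such data is so sensitive, since the per-device sequence built inside the function is exactly what allows someone to be tracked.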

Other crucial data include poverty indicators, education levels, access to healthcare, and various statistics that, when combined, help provide a fuller understanding of the social context.

One of the biggest challenges—if not the most important one—according to Villarreal, is data normalization. This involves cleaning and standardizing the data so that different databases can be connected and interpreted together to create better matches and more representative samples.

That includes cases where names are written in multiple ways across different sources (with or without accents, with an “s” or a “z”), or where different identifiers are used for the same geographic areas, such as postal codes and AGEBs (Áreas Geoestadísticas Básicas, the basic geostatistical units defined by Mexico’s statistics institute, INEGI, which in cities are made up of groups of blocks). It’s also essential to translate analytical results into terms that decision-makers can understand, whether they’re judges, lawmakers, or members of the press.
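As an illustration of the name normalization described above, here is a minimal Python sketch that builds a comparison key for each name and then links records from two sources. It is not Villarreal’s code, which the article does not describe; the `id` and `name` fields, the function names, and the sample records are hypothetical, and folding every “z” into “s” is a deliberately crude rule that will also merge some genuinely different names.

```python
import unicodedata


def normalize_name(name):
    """Build a comparison key: accents removed, lowercase, single spaces, 'z' folded into 's'."""
    # NFKD splits accented letters into base letter + combining mark (é -> e + ´)
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-insensitive, with runs of whitespace collapsed to one space
    key = " ".join(without_accents.casefold().split())
    # Crude, hypothetical rule: treat "z" and "s" as the same letter
    return key.replace("z", "s")


def link_records(source_a, source_b):
    """Return (id_a, id_b) pairs whose normalized names match exactly."""
    index = {}
    for record in source_b:
        index.setdefault(normalize_name(record["name"]), []).append(record["id"])
    return [
        (record["id"], other_id)
        for record in source_a
        for other_id in index.get(normalize_name(record["name"]), [])
    ]


# Made-up records from two hypothetical sources
court = [{"id": "A1", "name": "José González"}, {"id": "A2", "name": "Ana Ruiz"}]
arrests = [{"id": "B7", "name": "JOSE  GONSALEZ"}]
print(link_records(court, arrests))  # [('A1', 'B7')]
```

Production record-linkage pipelines typically add fuzzy string comparison and extra fields such as dates of birth to limit false matches; a lookup table mapping one identifier to another would play a similar role for reconciling postal codes with AGEBs.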

Applying Data to Decision-Making

Still, in some cases, even when the information is well-presented and simulation models exist within universities or research centers, “they aren’t connected or in communication with decision-makers. That’s another problem—we need the data and the knowledge being generated to be tied to public policy and to have a real impact on people’s lives.”

In Boston, Villarreal was able to identify racial bias in the justice system, and she believes that a similar project could reveal important patterns in regions like Latin America.

“In Latin America, social class is closely linked to skin color and poverty. It’s the poor who have the least access to justice, education, housing, urban infrastructure, and transportation. Even the environment is affected by discrimination—displaced communities are often forced to live in areas where climate change puts them at greater risk. And when it comes to urban planning, the design of our cities reflects many historical biases.”

Today’s socially impactful projects can also draw on a growing array of technological tools that have evolved significantly over the years. For instance, Villarreal didn’t use AI as we know it today in Data for Justice. But now, its capabilities allow for the automated analysis of thousands upon thousands of documents, the identification of names, institutions, and other categories, and the development of models that can interpret and explain large datasets more clearly.

“AI opens up new ways to conduct science and research. With it, we’re going to encounter new problems, new challenges, and new opportunities. I’m very hopeful that we’ll continue doing real science with the help of AI—and that it will be of higher quality, more abundant, and more meaningful. That way, we can better understand every aspect of human life, but also the natural world, astronomy, and everything else we need to make sense of.”



Author: Ricardo Treviño