By Liliana Millán*
Today, talking about data science and artificial intelligence seems like talking about magic. So, let me use this space to clarify what role data science plays in public policy and adjust your expectations. To do this, I’ll use a specific example with a real problem that allows us to identify the different elements that enable data science to be used in public policies.
Data science in public policy
Our example takes place in the city of Syracuse, New York, where the water department has been implementing different solutions for 40 years to preemptively identify which water mains are likely to break and cause a water leak.
Why is this important? Why is it a problem? Reacting only after a main breaks forces Syracuse to spend its limited resources on fixing existing leaks, which sometimes number as many as 200 breaks a year.
When fixing the breaks, the water that passes through those mains must be turned off, which has implications for the people who live in that area, as well as for schools, hospitals, and businesses.
In an ideal world, all mains would have preventive maintenance, but that requires a lot of money and human resources, both of which are scarce and impossible to increase.
The proposed data science solution is to use all available information on streets, properties, mains, and water characteristics to predict which city blocks are most likely to suffer a main break in the next three years without increasing the resources available to the city of Syracuse.
The solution performs better than the approaches Syracuse currently uses: when limited to the top 1% of blocks, it achieves 62% efficiency and 7% coverage. What does this mean?
There are 5,263 blocks in Syracuse, and with current resources the city can maintain only 52 of them over three years, which is about 1% of all blocks.
The data science solution is optimized to identify 52 blocks that it considers should be maintained because it’s highly likely they will experience a main break in the next three years.
The model has an efficiency level of 62%; that is, of the 52 blocks predicted to have mains that will break in the next three years, the model is correct on 32 of them. On the other hand, the model has 7% coverage; that is, the 32 model hits correspond to 7% of all the main breaks that occur in those three years.
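The arithmetic behind these figures can be checked with a short sketch. The inputs come from the article; the variable names are illustrative, and the last value is only implied by the two rounded percentages, not reported in the text.

```python
# Figures quoted in the article; variable names are illustrative.
total_blocks = 5263
budget_fraction = 0.01   # the city can maintain about 1% of all blocks
efficiency = 0.62        # share of flagged blocks that actually break
coverage = 0.07          # share of all breaks that the flagged blocks capture

# Whole blocks only, so round down.
flagged_blocks = int(total_blocks * budget_fraction)
hits = int(flagged_blocks * efficiency)

# Assumption: derived from the rounded percentages above, so it is
# only a rough estimate and does not appear in the article.
implied_total_breaks = round(hits / coverage)

print(flagged_blocks, hits, implied_total_breaks)
```

Efficiency here corresponds to what is usually called precision on the top 1% of ranked blocks, and coverage corresponds to recall on that same group.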
It may seem that the model's performance is low. However, the relevant comparison is with the methods the city of Syracuse currently uses to prioritize the maintenance of 52 blocks, which reach an efficiency of at most 48%. The data science solution therefore outperforms what is currently in place and makes better use of the existing limited resources.
How data science helps us
It’s important to highlight four points of the data science solution:
1) It’s possible to predict far enough in advance to apply a strategy or policy. In this case, the prediction looks three years ahead, which gives the city of Syracuse time to coordinate with the road department on preventive maintenance, since the pavement must be broken up and the road repaved once the maintenance is done.
2) It’s possible to quantify the performance of the current solution to compare it with the proposed machine-learning solution.
3) It’s possible to optimize the solution based on the resource constraints.
4) Finally, it’s possible to give decision-makers a range of options showing how the solution would perform if the available resources were decreased or increased.
This last point is very useful for decision-makers since it allows them to unambiguously identify how much performance increases if resources are increased and how much it decreases if fewer resources are available, both of which are feasible scenarios in public policy.
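To make this last point concrete, here is a minimal sketch of how efficiency and coverage could be tabulated for different maintenance budgets. It is not the method from the Syracuse project: the function name `precision_recall_at_k`, the risk scores, and the break outcomes are all hypothetical and invented for illustration.

```python
def precision_recall_at_k(scores, broke, k):
    """Rank blocks by risk score and evaluate the k highest-ranked ones.

    Returns (efficiency, coverage), i.e. precision and recall at k.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of blocks")
    # Indices of blocks sorted from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(broke[i] for i in ranked[:k])
    total_breaks = sum(broke)
    precision = hits / k
    recall = hits / total_breaks if total_breaks else 0.0
    return precision, recall


# Toy data, invented for illustration only (not Syracuse data).
scores = [0.91, 0.85, 0.40, 0.77, 0.10, 0.66, 0.30, 0.05, 0.58, 0.20]
broke = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]  # 1 = the block had a main break

# Each k is a possible budget: how many blocks the city can maintain.
for k in (1, 2, 3, 5):
    p, r = precision_recall_at_k(scores, broke, k)
    print(f"budget={k} blocks  efficiency={p:.2f}  coverage={r:.2f}")
```

A table like this lets a decision-maker see the trade-off directly: a larger budget usually captures more of the breaks, while the share of maintained blocks that truly needed it may fall.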
You can read more about this use case in "Using Machine Learning to Assess the Risk of and Prevent Water Main Breaks," published at KDD 2018 (August 19-23, 2018, London, U.K.), and watch the video explaining the solution provided by DSSG, which the School of Government and Public Transformation’s Data Science Center is a part of.
*Liliana Millán, Director of the Artificial Intelligence Initiative at Tec de Monterrey’s School of Government and Public Transformation