Diving into data at the AI@Sustainability Hackathon

Experiment 4 Data Powered Positive Deviance
P3 Prototype

July 6, 2020
Karim Soubai

In our earlier posts, we explored the potential of the Data Powered Positive Deviance (DPPD) initiative in various contexts, one of which is the identification of Positive Deviants during the COVID-19 pandemic.
As a reminder, Data Powered Positive Deviance operates under the assumption that in every society there are communities that, under the same conditions, perform significantly better than others. Almost a year ago, the GIZ Data Lab began to passionately investigate such phenomena throughout the world. The team's desire to proactively combat the COVID-19 pandemic resulted in the creation of a preliminary machine learning model during the “WirVsVirus Hackathon”. The hope was that this model would “act much like a 'crystal ball' in identifying the 'rules' connecting structural factors to infection rate.”
Having had this excellent experience, we were very happy to join the AI@Sustainability Hackathon powered by the Siemens AI Lab (June 24-27) and to collaborate once again with brilliant data experts from all over the world, further investigating how DPPD can identify Positive Deviants in Germany. Identifying them can help other, comparable communities tackle the outbreak and flatten the curve. Creating and further developing such a model is especially relevant now, as many scientists expect a second wave of the corona pandemic.

The Siemens AI@Sustainability Hackathon

For 72 hours, the 50 most promising of over 250 applicants had the chance to build rigorous machine learning solutions for profound sustainability questions, ranging from clean water and mobility to health. Though prizes for technological excellence, sustainability and overachievement were awarded, the whole event focused on collaboration rather than competition. Each challenge was therefore approached by a group of 10 hackers, who stayed connected throughout the event and supported each other in finding the best possible solution. Upholding a cooperative dynamic during these three intensive days was particularly demanding because the whole event was held virtually, a task that the team handled very well.

The Challenge

The challenge set before the hackers was rigorous and innovative, and the data offered many possibilities for exploration. For the first time, we used time-dynamic mobility data supplied by Teralytics, whom we want to thank for the collaboration, covering mobility within and across German districts. Furthermore, two of our very experienced affiliated data scientists, Joschua Driesen and Fynn Withake, prepared this rich data set and found additional open source weather data, which was incorporated into the prediction model.
The challenge comprised creating and training a machine learning model that uses static structural data, such as demographics or socio-economic status, as well as time-dynamic weather and mobility data, to predict the COVID-19 infection spread in over 400 German districts as precisely as possible. In a further step, those districts that had significantly lower infection rates than the model predicted were considered possible Positive Deviants.
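To make the last step concrete, here is a minimal sketch of how districts with significantly lower rates than predicted could be flagged. It is an illustration only, not the team's actual procedure: the function name `flag_positive_deviants`, the z-score cut-off and all numbers are assumptions made up for this example.

```python
from statistics import mean, stdev


def flag_positive_deviants(actual, predicted, z_threshold=1.5):
    """Return districts whose actual rate is far below the predicted rate.

    actual, predicted: dicts mapping district name -> infection rate.
    z_threshold: hypothetical cut-off, in standard deviations of the residuals.
    """
    # Residual = actual minus predicted; negative means fewer infections than expected.
    residuals = {d: actual[d] - predicted[d] for d in actual}
    mu = mean(residuals.values())
    sigma = stdev(residuals.values())
    if sigma == 0:
        # All districts deviate by the same amount, so none stands out.
        return []
    # Standardise so that the cut-off does not depend on the unit of the rate.
    z_scores = {d: (r - mu) / sigma for d, r in residuals.items()}
    return sorted(d for d, z in z_scores.items() if z < -z_threshold)


# Made-up numbers for five fictional districts (cases per 100,000 inhabitants).
predicted = {"A": 50.0, "B": 60.0, "C": 55.0, "D": 70.0, "E": 65.0}
actual = {"A": 52.0, "B": 58.0, "C": 56.0, "D": 40.0, "E": 66.0}

candidates = flag_positive_deviants(actual, predicted, z_threshold=1.0)
print(candidates)
```

In this toy data only district "D" lies far enough below its prediction to be flagged. What counts as "significantly lower" in practice is a modelling decision; the standardised residual used here is just one simple option.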

You haven't heard? The team solving the GIZ Data Lab challenge won the prize for the most sustainable solution. But how did they do it?

To approach this challenge, two sub-teams emerged that aimed at creating a static model and a time-dynamic model, respectively. Various models, ranging from linear regression and random forests to neural networks, were used and compared using rigorous quality metrics.
Using the two most precise static models, the team identified individual districts whose actual outcomes were significantly better than our predictions would suggest. Incorporating the time-dynamic weather and mobility data allowed the team to predict the infection rate in a given district even more accurately. Furthermore, weather and mobility were found to influence the infection spread mostly after three to four days. In the graphs below, the green line shows the predicted infection rate in the respective district, whereas the red line represents the actual rate. These two graphs let us identify time periods in which districts cope with the pandemic better than expected (i.e. when the green line is above the red line). In the two examples shown, the communities displayed positive deviant behavior for almost the whole of May. Temporal consistency in the positive deviance of certain districts might be a further indicator that these are indeed positive deviants and not merely the product of arbitrary, short-term effects.
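As an illustration of what such a model comparison can look like, the sketch below ranks several sets of predictions by common regression metrics. The post does not say which metrics the teams used, so RMSE, MAE and R² are assumptions here, as are the model names and all numbers.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large misses more strongly."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Return (name, metrics) pairs sorted by RMSE, best model first."""
    scores = {
        name: {"rmse": rmse(y_true, p), "mae": mae(y_true, p), "r2": r_squared(y_true, p)}
        for name, p in predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1]["rmse"])


# Made-up held-out infection rates and predictions from three hypothetical models.
y_true = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "linear_regression": [14.0, 18.0, 33.0, 35.0],
    "random_forest": [11.0, 19.0, 31.0, 39.0],
    "neural_network": [12.0, 22.0, 28.0, 42.0],
}

ranking = rank_models(y_true, predictions)
for name, metrics in ranking:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The ordering in this toy example says nothing about which model family performed best at the hackathon; it only shows the mechanics of comparing candidates on the same held-out data.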
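One simple way to look for a delay like the three to four days mentioned above is to correlate a driver series with the infection series at different lags. The post does not describe how the team arrived at its finding, so the following is only a sketch under that assumption, with hypothetical function names and synthetic data constructed to have a three-day delay.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(driver, response, max_lag=7):
    """Return the lag (in days) with the strongest correlation, plus all scores."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair driver[t] with response[t + lag].
        x = driver[: len(driver) - lag]
        y = response[lag:]
        scores[lag] = abs(pearson(x, y))
    return max(scores, key=scores.get), scores


# Synthetic series: infections copy mobility three days later (by construction).
mobility = [5, 7, 6, 9, 12, 8, 4, 6, 10, 11, 7, 5, 8, 9, 6, 4, 7, 12, 10, 6]
infections = [0, 0, 0] + [2 * m for m in mobility[:-3]]

lag, scores = best_lag(mobility, infections)
print(lag)
```

Real infection data is far noisier than this synthetic series, so in practice the peak would be less sharp and could spread over neighbouring lags.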

[Figure: predicted (green) and actual (red) infection rates over time in two example districts]
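The idea of temporal consistency can be expressed in a few lines: count on how many days the actual rate stays below the prediction, and how long the longest unbroken run is. This is a minimal sketch with a hypothetical helper and made-up daily values, not the evaluation the team actually ran.

```python
def deviance_consistency(predicted, actual):
    """Return (share of days below prediction, longest consecutive run of such days)."""
    below = [a < p for p, a in zip(predicted, actual)]
    longest = current = 0
    for flag in below:
        # Extend the current run on a "better than predicted" day, else reset it.
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return sum(below) / len(below), longest


# Made-up daily rates for one fictional district over ten days.
predicted = [8.0, 8.5, 9.0, 9.2, 9.5, 9.1, 8.8, 8.4, 8.0, 7.5]
actual = [8.2, 8.0, 8.1, 8.3, 8.6, 8.9, 8.0, 8.5, 7.2, 6.9]

share, streak = deviance_consistency(predicted, actual)
print(share, streak)
```

A district that is below its prediction on most days, and for long uninterrupted stretches, is a more convincing candidate than one that dips below only briefly.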

The findings from the trained models point to two noteworthy aspects. First, they show that our behavior influences the intensity of the COVID-19 pandemic and that we are not doomed to react passively to the spread of the infection. Second, a trained model that can predict a district's infection spread based on weather and mobility will allow us to monitor traffic within Germany (especially in the summer tourism season) in real time.

What is next?

The corona pandemic is still very much present, and everyone is trying to find the best way to limit new infections while sustaining as many processes in society as possible.

The models outlined above enable us to identify which comparable districts performed better. These new possibilities make the GIZ Data Lab team even more motivated to develop this prototype further so that it can be widely used.

We will consolidate and refine the models in order to validate the results and consistently identify new Positive Deviants. In a second step, we hope to assess qualitatively what the identified districts did significantly better, so that comparable districts can apply the lessons learned. Understanding the behavioral sources of positively deviating outcomes is the core benefit of this approach and could make a difference in this pandemic.

One of the jury members of the Siemens AI Hackathon stated:

“You really have the responsibility to make what the team has created work, and to bring the impact that we all hope that this solution can bring.”  

The whole GIZ Data Lab team is excited to achieve as much as possible in this regard, not only in Germany but, if possible, in other contexts around the world.