Identifying Safe(r) Public Spaces for Women in Mexico City
May 17, 2021
Alejandra Cervantes, Gabriela Ríos, Itzel Soto
In every community, there are individuals or groups with uncommon behaviors who, while having access to similar resources, find better solutions to challenges than their peers. Finding these so-called positive deviants and promoting their solutions is referred to as the Positive Deviance approach. Building on this, the Data Powered Positive Deviance (DPPD) initiative combines traditional and non-traditional data to identify and understand positive deviance in new ways. As part of this initiative, the GIZ Data Lab, GIZ Mexico Project on 2030 Agenda, UNDP Mexico Accelerator Lab, and the University of Manchester Centre for Digital Development are running a pilot in Mexico City to find positively deviant public spaces that are safer for women. This blog post provides a step-by-step presentation of the process undertaken to identify and validate those potential positive deviants.
This article was first published on the Medium blog of the Data Powered Positive Deviance (DPPD) initiative.
The Context: Violence Against Women, A Pressing Social and Security Challenge in Mexico
In Mexico, two-thirds of all girls and women above the age of 15 have reported experiencing at least one incident of violence in their lifetime. On average, 11 women are murdered on a daily basis. Between 2000 and 2019, most of the victims were women between the ages of 20 and 24. In 2019, more than 50 percent of female homicides occurred in public spaces.
Violence against women at the community level, which is perpetrated by an individual or a collective unknown to the victim, occurs in streets, parks, and, to a lesser extent, on buses, minibuses, or subways. The attacks that occur on the street are mainly sexual (66.8%) and include catcalling, bullying, stalking, sexual abuse, rape, and attempted rape. At the national level, 34% of women have experienced some form of sexual violence in public spaces in their lifetime. However, 78% of women and girls over 15 years old do not report these incidents. Mexico City is among the states with the highest rates of community violence against women in the country (61.1%).
This problem is relevant because it limits women’s freedom of movement and restricts their right to the city, which is stipulated in Mexico City’s Constitution (article 12). Additionally, it limits women’s access to work and education opportunities, access to essential services, their participation in cultural and leisure activities, and their full participation in public life.
It is in this context that we started a pilot to identify areas in Mexico City where women are safer — with a particular focus on public spaces — and explain why by using the Data Powered Positive Deviance method.
The Approach: Piloting the Data Powered Positive Deviance Method
One of the main challenges today is developing unconventional solutions to the increasingly complex problems of development. The Data Powered Positive Deviance (DPPD) method, by focusing on outliers or positive deviants, proposes to discover unusual practices and strategies that can successfully solve a problem. The application of this approach represents a starting point in the search for solutions for such a pressing issue — security for women — as it leverages the benefits of conducting quantitative and qualitative analyses.
We established a step-by-step method for discovering public spaces with better performance in terms of security for women, starting with mapping the relevant data sources, carrying out a homogeneous grouping, and defining our performance measure to identify positive deviants. Next comes the fieldwork to collect and analyze the factors underlying positive deviance. While the quantitative analysis focuses on identifying positive deviants and a few possible conditions that explain their performance (e.g., number of security totems, police stations), the qualitative analysis will focus on understanding the factors in the field that contribute to this outperformance.
In the upcoming subsections, we further develop the steps we took to implement the Data Powered Positive Deviance method (Figure 1) and share the key learnings we gathered along the way. The pilot is still in progress; however, the first steps have already yielded valuable lessons.
Conceptualization, Data Mapping, and Data Access
Violence against women is a complex problem because it is sustained by a set of deeply rooted structural relationships of inequality between women and men. To better conceptualize these relationships, we needed to understand what violence against women is, how it works socially, and what data can be used to explore the issue.
One of our first steps was to conduct a series of interviews with experts, including academics, activists, urban planners, and public officials with relevant experience on violence against women. This exercise provided us with initial guidance on the factors that make public spaces safer for women: urban infrastructure, security infrastructure, people, usage of space, and mobility. Furthermore, we identified some initial datasets for the analysis.
The next step was to extensively map public and non-public datasets on the aforementioned categories. We initially identified 67 possible datasets that could shed light on the problem, regardless of whether they were private or public or if their characteristics would make them suitable for the analysis. In other words, our initial mapping was a wish list of datasets that included urban infrastructure, population, commuting patterns, socioeconomic index, security, and justice. We tapped mainly into two sources: the Open Data Portal of Mexico City and the National Institute of Statistics and Geography (INEGI). We also mapped non-public datasets owned by public and private entities, such as mobile data, 911 reports, and usage of panic buttons.
The next step was to turn the data wish list into an actionable list. We identified the most important data from several datasets based on their relevance, level of aggregation, and when they were last updated. It was important to get information as granular as possible because we want to use geographic units of analysis that are small and precise enough to easily uncover the underlying factors behind positively deviant public spaces. We selected open data related to urban infrastructure (e.g., subway stations, bus stations), land usage, security infrastructure (e.g., location of panic buttons, cameras), census data, and marginalization indexes for our analysis. We also selected the Attorney General’s Office dataset which contains updated information on crime victims in investigation files in Mexico City.
Unit of Analysis
We started by defining the unit for our analysis. We chose AGEBs, the basic geostatistical areas used in Mexico, which are made up of blocks delimited by streets, avenues, or walkways (Figure 2). This unit is granular enough to yield a sufficient number of observations (there are 2,431 AGEBs in Mexico City) while covering a geographic area that remains manageable when we conduct fieldwork and focus on public spaces. Additionally, most of the official statistics we use are available at the AGEB level.
Mexico City is an unequal city. The Municipal Human Development Report illustrates this: the Human Development Index (HDI) of the city’s highest-ranked borough is very similar to Switzerland’s, whereas that of the lowest-ranked borough is closer to Colombia’s. Even within the same borough there are stark contrasts: in some areas, it is not uncommon to find high-income gated communities right next to low-income neighborhoods.
When applying the DPPD method, one of the goals is to identify positive deviant practices that can be transferred to people, communities, or spaces with similar conditions. By doing so, the chances of these practices succeeding are higher. For example, transferring the same practices that work for AGEBs with an HDI similar to Switzerland’s to AGEBs with an HDI similar to Colombia’s might not be the best way to tackle insecurity in a context-aware manner. Simply put, we want to avoid comparing apples and oranges.
The process of grouping units of analysis with similar characteristics is called homogeneous grouping. What variables should we use to create homogeneous groups? Structural factors that are above the level of the individual — e.g., economic, policy, social factors — and are common to a geographic unit. These factors stay fairly static over time or are hard to change. In other words, they are not transferable to other units of analysis.
To do so, we chose structural variables that are related to the presence of crime: population density and socioeconomic conditions. The size of the AGEBs, the population that lives there, and the commuters they receive are not uniform across the city, so we wanted to group AGEBs with similar population and commuter numbers. Thus, we used population density and incoming daily trips of commuters.
To account for the socioeconomic conditions, we selected the Marginalization Index of Mexico’s National Population Council (CONAPO) because it summarizes several socioeconomic characteristics at the AGEB level, such as housing conditions, income, working population, and education level. We conducted a cluster analysis, which resulted in four groups (Figure 3) characterized by:
AGEBs with medium population density and a very low marginalization index that attract a very high number of people to work and study
Highly populated AGEBs with a low marginalization index that attract a high number of people to work and study
Highly populated AGEBs with a medium marginalization index and a medium level of people traveling there daily to work and study
AGEBs with a high marginalization index and a low population density that attract few people to work or study
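The grouping step can be illustrated with a small clustering sketch. The post does not say which clustering algorithm was used, so this example assumes k-means on standardized variables, and every value in `agebs` is made up for illustration (population density, incoming daily trips, marginalization index).

```python
import math
import statistics

def zscore_columns(rows):
    """Standardize each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(
                [statistics.mean(c) for c in zip(*members)] if members else centroids[j]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

# Hypothetical AGEBs: (population density, incoming daily trips, marginalization index)
agebs = [
    (120, 9000, -1.5), (130, 8800, -1.4),  # medium density, very low marginalization
    (250, 5000, -0.8), (260, 5200, -0.7),  # high density, low marginalization
    (240, 2500, 0.1), (255, 2400, 0.2),    # high density, medium marginalization
    (60, 500, 1.3), (50, 400, 1.5),        # low density, high marginalization
]
labels = kmeans(zscore_columns(agebs), k=4)
```

Standardizing first keeps the trip counts, which are numerically much larger, from dominating the distance calculation.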
To identify outperforming or positively deviant AGEBs, we needed to find a reliable measure of performance. In other words, we needed data that helps us identify which AGEBs record a lower number of crimes against women than expected. To this aim, we used the dataset of victims in investigation files of the Attorney General’s Office, which covers 2019 and 2020.
As we previously mentioned, a high number of crimes go unreported. This is known as the dark figure: crimes that are not reported or that are not subject to a prior investigation and, therefore, do not appear in any statistics. We are aware of the limitation this poses, but this is the only existing dataset that offers the fields of information we needed to build the performance measure: 1) type of crime, 2) day of the week the crime occurred, 3) time of day it occurred, 4) age of the victim, 5) gender of the victim, and 6) geolocation of the occurrence. This last field is particularly important, as it makes it possible for us to associate crime numbers with their respective AGEBs.
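To make the role of geolocation concrete, here is a minimal sketch of assigning incident coordinates to AGEBs with a point-in-polygon test (ray casting) and counting victims per AGEB. The AGEB identifiers, polygon coordinates, and incidents are hypothetical. A real analysis would more likely use the official AGEB boundaries with a GIS library.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical AGEB outlines in planar coordinates
ageb_polygons = {
    "AGEB-A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "AGEB-B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
# Hypothetical incident locations; the last one falls outside both AGEBs
incidents = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

victims_per_ageb = Counter()
for x, y in incidents:
    for ageb_id, polygon in ageb_polygons.items():
        if point_in_polygon(x, y, polygon):
            victims_per_ageb[ageb_id] += 1
            break  # each incident belongs to at most one AGEB
```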
The first step for building the performance measure was to define the universe of crimes for our analysis. Considering that the focus of our pilot is security in public spaces, we selected crimes that only happen in public spaces (those perpetrated against a passerby in a public setting) and gender-based violence crimes that could occur in public spaces, such as sexual assault and feminicide. Although the dataset contains the geolocation of each incident, it does not tell us whether the incident happened in a public or a private space. We therefore excluded domestic and family violence, which the literature suggests tend to occur in private spaces.
It is important to understand the different types of violence against women because the psychological and physical impact of crimes varies greatly. It is not possible, for example, to equate feminicide to a robbery, and the conditions, including the characteristics of the physical space, that lead to them could be different. Therefore, after defining the universe of crimes for our analysis, we set out to divide them into categories according to their severity and impact on women. This resulted in three categories (Figure 4). In the category “Severity 1,” there are crimes that threaten life and aim to damage the physical and sexual integrity of women, while “Severity 2” includes crimes with physical and psycho-emotional violence but that do not threaten life. “Severity 3” refers to those where physical violence did not occur but may leave a psycho-emotional impact.
With these categories, we want to test whether we can find different positive deviants within and/or across categories: for example, one set of positive deviants for “Severity 1,” the most severe crimes such as feminicide and rape, and another for “Severity 3,” the least severe crimes such as robbery of a passerby without violence. If that is the case, when we conduct fieldwork, we could identify distinctive factors that are conducive to more or less severe types of crimes.
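As a rough illustration of how severity categories translate into separate performance measures, the sketch below counts victims per AGEB and per severity level. The mapping contains only the crimes named in the text. The full assignment, including the "Severity 2" crimes, is in Figure 4 and is not reproduced here. The records and AGEB identifiers are hypothetical.

```python
from collections import defaultdict

# Only the examples named in the text; the complete mapping is in Figure 4.
SEVERITY_BY_CRIME = {
    "feminicide": 1,
    "rape": 1,
    "robbery of passerby without violence": 3,
}

# Hypothetical records: (AGEB id, crime type)
records = [
    ("AGEB-A", "rape"),
    ("AGEB-A", "robbery of passerby without violence"),
    ("AGEB-B", "robbery of passerby without violence"),
    ("AGEB-B", "robbery of passerby without violence"),
    ("AGEB-B", "domestic violence"),  # outside the universe of crimes, skipped
]

# One count per severity level plus an overall count for each AGEB
counts = defaultdict(lambda: {1: 0, 2: 0, 3: 0, "all": 0})
for ageb_id, crime in records:
    severity = SEVERITY_BY_CRIME.get(crime)
    if severity is None:
        continue  # crime type is not part of the analysis
    counts[ageb_id][severity] += 1
    counts[ageb_id]["all"] += 1
```

Each column of this table (one per severity level, plus "all") can then serve as a separate outcome when looking for positive deviants.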
Positive Deviant Identification
The next step of the DPPD method is to identify the positive deviants through quantitative analysis. In our pilot, we are identifying AGEBs that, after controlling for the relevant characteristics of these spaces, present a lower number of female victims than predicted. This means that the observed performance of these AGEBs is better than expected given their characteristics. The timeframe for our analysis is 2019 and the first three months of 2020, before the COVID-19 lockdown started and affected mobility patterns and the usage of public spaces.
So how are we identifying positive deviants? We started with statistical modeling, which is necessary because we need a model to predict the performance measure, female victims in investigation files. Then, we look at the residuals, the differences between observed and predicted values. If an AGEB’s residual is an outlier in the favorable direction, we are dealing with a positive deviant. That is, an AGEB is a positive deviant when the number of victims observed there is much lower than the number of victims predicted by the model.
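The residual logic can be sketched with a toy example. It assumes a single predictor and ordinary least squares, which is far simpler than the pilot's models, and all data and the cutoff are invented for illustration. With residuals defined as observed minus predicted, an AGEB with far fewer victims than predicted shows up as a strongly negative residual.

```python
import statistics

# Hypothetical data: residents (in thousands) and observed victims per AGEB
ageb_ids = [f"AGEB-{i}" for i in range(1, 9)]
residents = [1, 2, 3, 4, 5, 6, 7, 8]
observed = [2, 4, 6, 8, 10, 12, 5, 16]

# Ordinary least squares with one predictor
mean_x = statistics.mean(residents)
mean_y = statistics.mean(observed)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(residents, observed))
    / sum((x - mean_x) ** 2 for x in residents)
)
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in residents]
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

# Standardize the residuals and flag the strongly negative ones
residual_sd = statistics.pstdev(residuals)
THRESHOLD = -1.5  # illustrative cutoff, not taken from the pilot
positive_deviants = [
    ageb for ageb, r in zip(ageb_ids, residuals) if r / residual_sd < THRESHOLD
]
```

In this toy data, only the seventh AGEB records far fewer victims than its size would predict, so it is the only one flagged.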
For statistical modeling, we need to define the independent variables, those that are going to predict the number of female victims. In our statistical model, most independent variables are characteristics of urban infrastructure in public spaces and characteristics of the AGEBs’ population. For urban infrastructure, we included green areas, public transport stops, types of roads, amount of commerce, services, schools, hospitals, and other relevant services in the space. As for the characteristics of the population, we included age groups, female-headed households, percentage of economically active people, and percentage of non-formally educated people. We also included the variables used for the homogeneous grouping, as well as a variable to indicate if the AGEB is adjacent to the State of Mexico and the size of the AGEB. Our selection of the independent variables was informed both by expert interviews and past studies on crime in Mexico City.
After defining our independent variables, we performed three types of regression analysis. First, we ran a multiple linear regression. Then, we did a LASSO linear regression to perform variable selection to enhance the prediction accuracy and interpretability of the statistical model. Finally, we did a negative binomial regression because this type of regression is designed to fit models in which the performance measure consists of counts with overdispersion. We ran the three regression analyses for each of the four clusters of the homogeneous grouping, per category of crime severity and for all crimes, so we have several results depending on the type of regression, cluster, and crime severity.
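A quick way to see what overdispersion means is to compare the variance of the counts with their mean. A Poisson model assumes the two are equal, while a negative binomial model allows the variance to be larger. The counts below are hypothetical, and this raw check is only a first look. A fuller diagnostic would examine dispersion after conditioning on the model's predictors.

```python
import statistics

# Hypothetical victim counts per AGEB
victims = [0, 0, 1, 1, 2, 2, 3, 4, 9, 18]

mean_count = statistics.mean(victims)
variance = statistics.variance(victims)  # sample variance

# A ratio well above 1 suggests more spread than a Poisson model expects
dispersion_ratio = variance / mean_count
overdispersed = dispersion_ratio > 1
```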
One key insight so far is that the most relevant variables identified in the three regressions were population size, AGEB area, financial services, restaurants, and bars per AGEB, as well as the distance to the closest Metrobús and metro station. This last variable has a negative relationship with the number of victims, which means that when the distance between the center of the AGEB and the metro station is shorter, a larger number of victims is expected. The three models explain on average 40% of the variance of the outcome variable (the performance measure) across the different clusters, providing a good starting point to define positive deviants. We are in the process of selecting the positive deviant AGEBs we want to understand further.
Once we finish the identification of the positive deviant AGEBs, we will validate them using various types of data that could help us confirm whether they are positive deviants and what could explain their performance. For this step, we are considering data such as the locations of police stations and of poles with panic buttons and cameras.
Then, we will conduct fieldwork. This will provide us with qualitative data on the causes or factors that explain the behavior of the public spaces with a disruptive and positive performance. Once we complete the ethnographic work, we will present the results to strategic partners to discuss their relevance and identify possible pilot interventions. The final result of the project will be evidence-informed recommendations for the improvement of public policies aimed at addressing violence against women in public spaces.
As we move forward with this pilot, we will share our progress in this space, so stay tuned and get in touch if you want to learn more.