THE CASE FOR DATA MERGING: GATHERING ACCURATE DATA IN FRAGILE CONTEXTS
In difficult-to-access geographical areas, up-to-date, reliable data is often unavailable. In such fragile zones, most data sources are outdated (for example, household censuses providing data that is more than 10 years old) and/or highly aggregated (for example, population structure provided only at the macro/national scale by age, sex, income, or refugee status). Security and resource constraints in these environments make third-party monitoring with granular data a time-consuming, expensive, and sometimes impossible task.
GIZ projects in fragile states like Libya, particularly when remotely managed, rely on detailed information for planning, steering, monitoring, and evaluation, both for internal use and for external reporting to our commissioning parties. Currently, 65% of GIZ partner countries are classified as “fragile”, and the number of projects in these areas is increasing. Information is key in these dynamic environments, along with reliable, granular ground-level data to ensure the quality of GIZ-provided services. Other implementing agencies are also actively prioritizing this challenge; however, a comprehensive solution is not yet in sight.
Conventionally, teams associated with such projects try to first assess the contextual situation by combining information from different (mostly qualitative) data sources, such as local NGOs, administrative representatives, and international agencies. Third-party monitoring consultancies provide regular updates on existing situations in conflict regions for the purposes of steering GIZ activities on behalf of the German Federal Ministry for Foreign Affairs. However, these reports mostly provide qualitative, rather than quantitative, data.
The Data in Fragile Contexts (DFC) experiment explores whether and how several data sources (for example, satellite images, telecommunication data, contributor networks, and social media monitoring) can be systematically merged and cross-referenced to provide reliable, possibly real-time metrics for assessing basic public service delivery (such as education, water supply, and healthcare). Given the right methodology and standards, additional data layers can fill data gaps and further refine the information necessary to prove (or disprove) assumptions. For instance, a two-week-old satellite image might show the outline of a hospital but not its current operating status. Likewise, estimates based on outdated census data do not take recent dynamics (such as migration and conflict) into account (World Bank, 2020). Instead of one-time situation reports, the DFC approach seeks to provide continuous context assessments. Combining several data sources yields better insights with a wide range of applications, for example:
- The completion of contextual information based on a particular project’s needs (particularly at the micro level).
- Strong support for evidence-based decision making through the preparation of available data from external project partners, merging raw data when necessary and adding further data sources.
- The supplementation or verification of findings from pre-scheduled household surveys.
- The generation of more accurate behavioral insights within a specific project’s context (such as perceptions and desires, root causes of specific problems, new trends, etc.).
- An overall increase in efficiency, resulting in a greater ROI per beneficiary.
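The cross-referencing idea described above (several sources reporting on the same facility, with older reports carrying less weight) can be illustrated with a minimal sketch. This is not the DFC methodology, which is not specified here. The `Observation` record, the `cross_reference` function, the source names, the 14-day recency window, and the status labels are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    facility_id: str   # hypothetical identifier for a hospital, school, etc.
    source: str        # hypothetical source label, e.g. "satellite"
    observed_on: date  # date the source made the observation
    operational: bool  # whether the source reports the facility as operating


def cross_reference(observations, as_of, max_age_days=14):
    """Group recent observations by facility and flag disagreement between sources.

    The 14-day default window is an illustrative assumption, not a standard.
    """
    latest = {}  # (facility_id, source) -> newest Observation within the window
    for obs in observations:
        age = (as_of - obs.observed_on).days
        if age < 0 or age > max_age_days:
            continue  # ignore stale or future-dated reports
        key = (obs.facility_id, obs.source)
        if key not in latest or obs.observed_on > latest[key].observed_on:
            latest[key] = obs

    summary = {}
    for (facility_id, source), obs in latest.items():
        entry = summary.setdefault(facility_id, {"sources": {}, "status": None})
        entry["sources"][source] = obs.operational

    for entry in summary.values():
        votes = set(entry["sources"].values())
        if len(entry["sources"]) < 2:
            entry["status"] = "unverified"   # only one recent source: no cross-check
        elif len(votes) == 1:
            entry["status"] = "operational" if votes.pop() else "not_operational"
        else:
            entry["status"] = "conflicting"  # sources disagree: needs follow-up
    return summary
```

In this sketch, a facility seen by only one recent source is marked as unverified rather than trusted outright, and disagreement between sources is surfaced for follow-up instead of being resolved automatically. A real pipeline would need source-specific reliability weights and privacy safeguards that are outside the scope of this example.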
During the experiment design phase, we learned the following valuable lessons:
- Data collection in fragile countries is disproportionately expensive, and digital tools offer the potential to substantially lower monitoring costs.
- Depending on the issue, context, ecosystem, and target group, a case can be made for harvesting and integrating readily available (open) data sources (such as WHO and UN data). So far, however, GIZ does not make use of publicly available data to the full extent possible.
- Telecommunication data (such as Call Data Records) is hard to obtain, and its acquisition carries reputational risks. For instance, conflict parties might perceive the use of telecommunication data (even if aggregated) as a violation of individual privacy.
- Due to factors like spatial resolution, unique attribution, and representativeness, telecommunication data is of limited value in determining service coverage and reach.
- Thick Data (that is, the use of both qualitative and quantitative data) is key to determining contextual information, suggesting that a mixed-methods approach may provide the best solutions moving forward.