AI Training Data for Agriculture: An Experiment in Artificial Intelligence “Learning”

Potential countries: Burkina Faso


Feeding a growing world population with limited resources requires a fundamental transformation in agricultural production, including adjustments to smallholder farming systems in developing countries. Smallholder farmer productivity is low, in part due to factors such as climate change, while agricultural value chains and markets in developing countries remain largely informal and non-transparent.

Recent technological advancements in areas like earth observation, big data mining, and Artificial Intelligence (AI) can serve as driving forces in the push towards an inclusive agricultural transformation. The availability of high-resolution remote sensing imagery and crowdsourcing technologies (e.g. smartphone apps) enables access to data-driven services that would have been unthinkable even five years ago.

Early evidence suggests these services increase smallholder productivity and profitability - mostly by providing better information - while alleviating poverty among smallholder families. Still, in order to unlock the widespread use of AI for the purposes of better service delivery in these contexts, “ground data” is needed to train machine learning algorithms.

This data may contain information such as farmer profiles, land usage, crop yields, soil properties, and financial transactions. These types of data are key to connecting people in resource-constrained settings to the benefits of recent technological advancements. For this reason, they are highly important to the work of GIZ and its partners. At the GIZ Data Lab, we seek to assess the applicability of AI in settings of international cooperation, exploring to what extent GIZ project data is suitable for training machine learning models.


Since data availability is very limited in developing countries (particularly in informal sectors), AI has been used for service delivery to beneficiaries on very few occasions, and solely on a pilot basis. Our experiment, entitled AI Training Data for Agriculture, tries to close this gap by showcasing the applicability of AI-driven service delivery in the Global South. In it, we work to address the following question: How might we use GIZ project data to train an AI-based crop information system that helps to increase smallholder farm output?

The following Key Performance Indicators are important for AI-driven classification and prediction:

  • Crop system identification.
  • Crop area, planting density, and yield.
  • Crop condition and stresses (both biotic and abiotic).
  • Crop traits (e.g. soil, fertilization) under certain climatic and agronomic conditions.

The Data Lab explored and assessed existing GIZ agricultural projects across the globe and chose to run its first pilot with data from the Competitive Cashew Initiative (ComCashew). ComCashew works along the cashew value chains in six countries within Sub-Saharan Africa (mainly in West Africa), cooperating with approximately 170 private and public partners, and has reached more than 600,000 farmers so far. To measure success, the project follows sophisticated monitoring and evaluation practices based on rich and comprehensive farmer survey data, including GPS-based field measurements.

Key Findings and Future Applications

The pilot was run in cooperation with Dalberg Data Insights, using project data from Burkina Faso. Their team attempted to “train” a machine to make qualitative classifications, such as crop type maps, along with quantitative forecasts, such as yield maps. ComCashew’s “ground data” provided information on both the geo-location and the yield of cashew plantations. The data aimed to answer the question: What quantity of cashews is harvested at specific points in time, and on which plantation area?

Correlating this data with information from the European Space Agency, the algorithm “learned” how to differentiate between cashew and non-cashew areas just by “looking” at the satellite images. This application showcases how development cooperation projects may unlock a “hidden data treasure” by designing their data collection procedures in compliance with AI training data requirements - a finding with great potential for supporting future data-based projects in the agricultural development sector.
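The idea of “learning” to tell cashew from non-cashew areas can be illustrated with a deliberately small sketch. It is not the method used in the experiment, which the text does not describe in detail. It assumes, purely for illustration, that each surveyed pixel is summarised by two reflectance values (red and near-infrared), that a vegetation index (NDVI) is a useful feature, and that a nearest-centroid rule is a reasonable stand-in for the actual model. All sample values and function names are hypothetical.

```python
# Illustrative sketch only: synthetic values and hypothetical names,
# not the pipeline used in the experiment.
from statistics import mean

# Each labelled sample pairs (red, near-infrared) reflectance for one pixel
# with the crop label recorded in a field survey. Values are invented.
ground_samples = [
    ((0.05, 0.45), "cashew"),
    ((0.06, 0.50), "cashew"),
    ((0.04, 0.48), "cashew"),
    ((0.15, 0.25), "non-cashew"),
    ((0.18, 0.22), "non-cashew"),
    ((0.12, 0.30), "non-cashew"),
]


def ndvi(red, nir):
    """Normalised difference vegetation index for one pixel."""
    return (nir - red) / (nir + red)


def train_centroids(samples):
    """Average the NDVI of the labelled pixels, separately for each label."""
    by_label = {}
    for (red, nir), label in samples:
        by_label.setdefault(label, []).append(ndvi(red, nir))
    return {label: mean(values) for label, values in by_label.items()}


def classify(pixel, centroids):
    """Assign the label whose average NDVI is closest to the pixel's NDVI."""
    value = ndvi(*pixel)
    return min(centroids, key=lambda label: abs(centroids[label] - value))


centroids = train_centroids(ground_samples)

# Pixels without a survey label, e.g. from areas the survey did not visit.
unlabelled_pixels = [(0.05, 0.47), (0.16, 0.24)]
predictions = [classify(pixel, centroids) for pixel in unlabelled_pixels]
print(predictions)
```

In this toy setup the ground data plays the same role as in the text: it supplies the labels from which the decision rule is derived, and the rule is then applied to imagery of areas that were never surveyed. A realistic model would likely use more spectral bands, multiple acquisition dates, and a held-out validation set.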


The experiment shows that, based on field survey data from GIZ, it is possible to produce accurate, large-scale crop maps while estimating corresponding yields. We regard this as early evidence in favor of our hypothesis that the ground data GIZ is collecting carries untapped potential for AI applications. Such applications could strongly influence how future solutions address development problems. For instance, they may be used as an early warning system for crop failures and food crises, giving governments and aid agencies more time to respond. However, such applications can only benefit the rural poor if reliable and unbiased training data for these settings is made openly available.

Ultimately, the most important outcomes from our experiment are:

  • Increased awareness about the value of such data.
  • Evidence promoting a forward-looking data strategy and collection method.
  • An evidenced need for establishing governance that enables us to make the most of our data, as a company but also as a player in the international cooperation community.

As of today, one of the greatest GIZ training data “treasures” consists of clear historical data. The remaining (and probably much bigger) bounty would be the adoption of a corresponding data governance design - one which encourages the necessary adjustments in project data collection procedures. When streamlined globally and made widely available, the GIZ “training data treasure” is set to become a valuable “open data” treasure.