
March 15, 2023
Erik Lehmann


Generative AI: An International Development Perspective

Artificial intelligence currently fills the headlines. This raises the question of how we can harness the benefits of these tools without causing harm or leaving anyone behind.

OpenAI, leading the current hype, has brought large language models (LLMs) like GPT-3 (Generative Pre-trained Transformer) into the mainstream conversation. Only six months ago, these advanced AI systems were known primarily to experts in the field. Since its public debut in November 2022, ChatGPT has captivated a global audience, demonstrating the immense potential of artificial intelligence to millions of users. The internet is teeming with examples of how this cutting-edge technology can be utilized and challenged. Just yesterday, GPT-4 was announced as even larger and more advanced than its predecessor, boasting "human-level performance on various professional and academic benchmarks." Big tech companies like Google and Facebook, startups such as Anthropic, and open-source communities are all vying for supremacy in the AI race. As the quest for peak performance intensifies, we must ensure that we don't inadvertently exclude a significant portion of people in the process.

The basics

Generative AI is the new frontier of artificial intelligence: machine learning models that generate new content such as images, music, gene structures, or text. These models learn patterns from existing data sources and use them to create new content that is similar in style and structure, yet original. Foundation models, including large language models (LLMs), are a key element of generative AI: pretrained on enormous amounts of data, they serve as the basis for applications across many domains. GPT-3 was trained on vast amounts of text from internet sources including books, Wikipedia, and social media, and serves as a foundation model that, when fine-tuned for conversation, gives rise to ChatGPT. Wrapping this in a user-friendly interface has made AI accessible to a broader audience, allowing people with diverse backgrounds and skill levels to benefit from its capabilities.
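As a loose illustration of this pretrain-then-adapt workflow, consider a deliberately tiny toy: a bigram word model that is first "pretrained" on broad text and then updated on a small conversational snippet. Real LLMs are neural networks trained on billions of tokens, so this is only an analogy, and the corpora below are made up for the sketch.

```python
from collections import defaultdict, Counter

class TinyLM:
    """A toy bigram language model: a drastic simplification of how a
    foundation model is pretrained on broad text and then fine-tuned
    (here: simply updated) on narrower, task-specific text."""

    def __init__(self):
        # counts[word] maps each following word to how often it occurred
        self.counts = defaultdict(Counter)

    def train(self, text: str) -> None:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def most_likely_next(self, word: str):
        following = self.counts.get(word.lower())
        return following.most_common(1)[0][0] if following else None

# "Pretraining" on broad text
lm = TinyLM()
lm.train("the model learns patterns from large amounts of text "
         "and the model stores statistics about word sequences")

# "Fine-tuning": further training on narrow, conversational text
lm.train("hello how can i help you today hello how are you")

print(lm.most_likely_next("hello"))  # → how
print(lm.most_likely_next("the"))    # → model
```

After the conversational update, the model's most likely continuation of "hello" reflects the narrower dialogue data, just as fine-tuning GPT-3 on conversation yields ChatGPT's conversational behavior.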

It can answer questions, summarise text, and produce creative writing such as poems or song lyrics in many different languages, both everyday languages and programming languages. It can also adapt its answers to different topics, making it useful across many purposes and subjects. Indeed, it has even helped improve the articulation of this blog post.

ChatGPT acquires linguistic patterns and stores knowledge by analyzing extensive text collections, like a condensed version of the internet. In that compression, some details are lost, forcing the model to approximate; this can lead to false information and convincing "hallucinations" that can only be detected through expert knowledge and careful proofreading.

It is crucial to be cautious with the input provided to ChatGPT: it is a third-party application, and any data entered passes through its servers. Consequently, handling personal, sensitive, or proprietary information this way may not comply with the EU's GDPR, so users should be mindful of data privacy concerns.
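One practical consequence is to minimise what you send. The sketch below shows a hypothetical pre-processing step that strips obvious personal data (e-mail addresses, phone numbers) from a prompt before it leaves your machine. The patterns and placeholders are illustrative only; real GDPR compliance involves far more than regex redaction.

```python
import re

# Illustrative patterns for two common kinds of personal data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d ()/-]{7,}\d")

def redact(prompt: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders
    before the prompt is sent to a third-party API."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    prompt = PHONE.sub("[PHONE]", prompt)
    return prompt

print(redact("Contact Erik at erik.lehmann@example.org or +49 30 1234567."))
# → Contact Erik at [EMAIL] or [PHONE].
```

A step like this reduces, but does not eliminate, the risk of handing personal data to a third party; free-text names, addresses, and context clues still slip through.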

In short: It's your calculator for the language essay, a partner for brainstorming, the template for your blog post. It is not an encyclopedia and not your friend or colleague you would share secrets with. It can be your assistant, not your solution.

When integrating AI into your own application, starting with a foundation model is advisable. While GPT-4 may be the Formula 1 car of language models, showcasing top performance and promoting the OpenAI brand, it might not suit your roads. Your specific use case might call for a more tailored and sustainable solution, like a Toyota Corolla: reliable and versatile for everyday tasks.


Hidden beneath the surface

The podcast's sub-headline introduces Lelapa AI, a newly established African AI research and product lab. When this text is pasted into office tools, the founder's name is underlined in red, suggesting that her name does not belong: the spell checker is predominantly trained to recognize Western names. This illustrates Lelapa AI's drive to combat biases against Africans. The problem isn't limited to Africa, since many current language models are predominantly trained on Western datasets and languages, potentially perpetuating biases. It is crucial not to leave this responsibility to the affected parties, but to tackle these biases in order to create AI systems that are inclusive and diverse, benefiting people of all cultural and social backgrounds and languages. By expanding the range of training data and including diverse perspectives, we can strive to develop AI tools that are fairer and more representative of the worldwide community.
Fostering transparency in both the models and the data utilized is vital for attaining a more inclusive and diverse AI landscape. Open access to information enables better understanding, collaboration, and improvement, ultimately contributing to equitable AI systems that cater to a wider audience. GPT-4 is the most secretive release OpenAI has ever put out: we know hardly anything about the training data, model size, or architecture, marking the company's full transition from nonprofit research lab to for-profit tech firm.

The Covid chatbot, utilized millions of times, demonstrates how a solution can be tailored to the local context by accommodating local infrastructure, collecting and sharing data, and fostering community development. It is a collaboration between Mozilla Common Voice, the Rwandan start-up Digital Umuganda, the Rwanda Information Society Authority (RISA), and the German Development Cooperation (GIZ).


Generative AI has an impact on our lives, whether we choose to use it or not

Large Language Models (LLMs) inherently carry certain beliefs and cultural values, primarily derived from the data they were trained on. Consequently, these models can influence political discourse, shape public opinion, and affect societal dynamics. They can also be exploited to generate malicious content and misinformation, posing a significant challenge in the digital space: by creating convincing yet false narratives, these models can contribute to the spread of disinformation, making it increasingly difficult to discern truth from falsehood and exacerbating existing problems in online environments.

In education, they can enable hard-to-detect plagiarism and generate misleading information, undermining academic integrity. In the job market, LLMs may displace certain roles, particularly those involving repetitive or text-based tasks, potentially widening the gap between skilled and unskilled workers and creating new challenges for workforce adaptation and development.

Behind the scenes, OpenAI delegated the data labeling process to workers in Kenya, who were tasked with evaluating texts that included discriminatory or violent content. Although this work can be emotionally distressing, it remains the least expensive part of the process, raising concerns about the exploitation of labor in the development of AI systems. Lastly, the energy consumption associated with AI systems is a significant concern: training and operating large-scale models like LLMs requires immense computational power, raising questions about the long-term environmental sustainability of these technologies.


“We’re seeing these AI capabilities move very fast and I am in general worried about these capabilities advancing faster than we can adapt to them as a society,”

Jess Whittlestone, head of AI policy at UK think tank The Centre for Long-Term Resilience, told The Verge.


"As proprietary AI systems built behind closed doors are going into the mainstream, the fact that a very few organizations and individuals (mostly privileged ones) understand how these systems work or what they've been trained on will create massive harm and risks.

I think we can significantly contribute to change this with our open-source, the hub for open models, datasets, demos and papers, our science reproduction and education initiatives and more."

Clem Delangue, Co-founder & CEO at Hugging Face, on LinkedIn

Generative AI will influence our societies, and tools like ChatGPT can support the achievement of the SDGs. However, it is crucial to be aware of and understand the limitations of this technology. While OpenAI has gained substantial attention as the most visible example of generative AI, numerous alternative solutions may be more sustainable and inclusive, depending on the specific problem. Promoting inclusivity in global collaboration demands a commitment to a high level of transparency. As organizations, we must delve deeper into the data and infrastructure requirements for utilizing AI technologies effectively and responsibly. This involves not only recognizing potential biases and limitations but also actively promoting sustainable applications, fostering collaboration, and empowering local communities to participate in the AI-driven digital landscape. By doing so, we can bridge the digital divide and ensure that the benefits of AI are accessible to a diverse range of individuals and communities across the world. It also enables us to better anticipate societal shifts and navigate them through well-informed policies and strategies.

At the data lab, we have been researching and experimenting with new technologies for GIZ. We consider it our responsibility to disclose and document the data and models we generate, to share the insights we gain, and to promote open-source development. We are eager to investigate the potential of generative AI in development cooperation and remain open to collaborative opportunities, considering that an eTata may also be the one to generate the most significant impact.

Join us on this journey of discovery!
