The Architecture of GPT-3 and How to Think About it in 2023 – PwP Episode 31

Challenges

  • Understanding the complex architecture of GPT-3.
  • Integrating GPT-3 capabilities into existing systems.
  • Addressing ethical considerations in AI deployment.

Solutions

  • Breaking down GPT-3’s components for better comprehension.
  • Developing APIs to facilitate seamless integration.
  • Establishing guidelines for responsible AI usage.

Benefits

  • Enhanced natural language processing capabilities.
  • Improved user interactions through advanced AI.
  • Competitive advantage by leveraging cutting-edge technology.

In this episode, Jeffrey discusses the architecture of GPT-3, the technology behind ChatGPT, and how you should think about this technology in 2023.

Situation – ChatGPT is getting a lot of press because it’s the first freely available implementation of GPT-3 that has captured the imagination of the masses. Many are pointing out the awesome and surprising capabilities it has while others are quick to point out when it provides answers that are flat-out wrong, backward, or immoral.

Mission – Today I want to raise up the conversation a bit. I want to go beyond the chatbot that has received so much press and look at the GPT-3 technology and analyze it from an architectural perspective. It’s important that we understand the technology and how we might want to use it as an architectural element of our own software systems.

Execution – Introduction: GPT-3, or Generative Pre-trained Transformer 3, is the latest language-generation AI model developed by OpenAI. It is one of the largest AI models, with 175 billion parameters, and it has been trained on a massive amount of text data. GPT-3 can generate human-like text in a variety of styles and formats, making it a powerful tool for natural language processing (NLP) tasks such as text completion, text summarization, and machine translation.

Architecture of GPT-3

The GPT-3 architecture is based on the Transformer network, introduced in 2017 by Vaswani et al. in their paper “Attention Is All You Need”. The Transformer is a type of neural network that is well suited to NLP tasks because it can process sequences of variable length.

The GPT-3 model consists of multiple layers, each containing attention and feed-forward neural networks. The attention mechanism allows the model to focus on different parts of the input text, which is useful for understanding context and generating text that is coherent and relevant to the input.

The feed-forward neural network is responsible for processing the information from the attention mechanism and generating the output. The output of one layer is used as the input to the next layer, allowing the model to build on its understanding of the input text and generate more complex and sophisticated text.
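
To make the attention mechanism concrete, here is a toy C# sketch of scaled dot-product attention, the core operation from the Transformer paper. This is for intuition only, not GPT-3’s actual implementation: the real model derives queries, keys, and values from learned projections, uses many attention heads, and stacks dozens of layers. The vectors below are made-up numbers.

```
using System;

class AttentionDemo
{
    // Computes softmax(Q * K^T / sqrt(d)) * V for a single attention head.
    static double[][] Attention(double[][] q, double[][] k, double[][] v)
    {
        int seqLen = q.Length;
        int d = q[0].Length;
        var output = new double[seqLen][];

        for (int i = 0; i < seqLen; i++)
        {
            // Score every position against position i, scaled by sqrt(d).
            var scores = new double[seqLen];
            for (int j = 0; j < seqLen; j++)
            {
                double dot = 0;
                for (int n = 0; n < d; n++) dot += q[i][n] * k[j][n];
                scores[j] = dot / Math.Sqrt(d);
            }

            // Softmax turns scores into attention weights that sum to 1.
            double max = double.MinValue;
            foreach (var s in scores) max = Math.Max(max, s);
            double sum = 0;
            for (int j = 0; j < seqLen; j++) { scores[j] = Math.Exp(scores[j] - max); sum += scores[j]; }
            for (int j = 0; j < seqLen; j++) scores[j] /= sum;

            // The output for position i is the weighted sum of the value vectors.
            output[i] = new double[d];
            for (int j = 0; j < seqLen; j++)
                for (int n = 0; n < d; n++)
                    output[i][n] += scores[j] * v[j][n];
        }
        return output;
    }

    static void Main()
    {
        // Three "tokens" with 4-dimensional embeddings. In a real model, Q, K,
        // and V come from learned linear projections of these embeddings.
        var x = new[]
        {
            new[] { 1.0, 0.0, 1.0, 0.0 },
            new[] { 0.0, 2.0, 0.0, 2.0 },
            new[] { 1.0, 1.0, 1.0, 1.0 },
        };
        var result = Attention(x, x, x);
        foreach (var row in result) Console.WriteLine(string.Join(", ", row));
    }
}
```

Each output row is a blend of the value vectors, weighted by how strongly that position “attends” to every other position; that weighting is what lets the model use context when generating text.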

Using GPT-3 in a C# Application

To use GPT-3 in a C# application, you will need to access the OpenAI API, which provides access to the GPT-3 model. You will need to create an account with OpenAI and then obtain an API key to use the service.

Once you have access to the API, you can use it to generate text by sending a prompt, or starting text, to the API. The API will then generate text based on the input, and return the output to your application.

To use the API in C#, you can use the HttpClient class to send a request to the API and receive the response. The following code demonstrates how to send a request and retrieve the generated text:

```
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

namespace GPT3Example
{
    class Program
    {
        static async Task Main(string[] args)
        {
            using (var client = new HttpClient())
            {
                client.BaseAddress = new Uri("https://api.openai.com/v1/");
                // The Authorization header belongs on the request, not the content.
                client.DefaultRequestHeaders.Authorization =
                    new AuthenticationHeaderValue("Bearer", "API_KEY"); // replace with your key

                var content = new StringContent(
                    "{\"prompt\":\"Write a blog post about the architecture of GPT-3\",\"model\":\"text-davinci-002\",\"temperature\":0.5}",
                    Encoding.UTF8,
                    "application/json");

                // POST to the completions endpoint; the model is specified in the body.
                var response = await client.PostAsync("completions", content);
                if (response.IsSuccessStatusCode)
                {
                    var responseContent = await response.Content.ReadAsStringAsync();
                    Console.WriteLine(responseContent);
                }
            }
        }
    }
}
```

End of demo
From the start of this explanation, the text was generated by chat.openai.com. It can be pretty impressive. But, at the same time, it’s very shallow. GPT-3 is a machine learning model that has been trained with selected information up to 2021. Lots of information, but selected, nonetheless. Here is the actual ChatGPT page that generated this. Notice that it admits that it doesn’t have information past 2021.

Let’s dig deeper, though, on what GPT-3 is and how it came to be. Let’s look at the theory behind it so that we can see whether we should use it as an architectural element in our own software.

Let’s go back to 2017. Ashish Vaswani and seven other contributors wrote a paper called “Attention Is All You Need”. In it, they proposed a new network architecture for their neural network. Simplify that and think of a machine learning model. They created a method that could be trained in 3.5 days using eight GPUs and be ready to translate from one spoken language to another. They tested it on English-to-French and English-to-German translation. Vaswani and two other contributors were from Google Brain, four were from Google Research, and one was from the University of Toronto.

In 2018, four engineers from OpenAI wrote a paper entitled “Improving Language Understanding by Generative Pre-Training”. They lean on Vaswani’s paper and dozens of others. They came up with a new method for Natural Language Processing (NLP). They describe the problem of training a model with raw text and unlabeled documents. That is, if a model is trained on all available information in the world, it’s a mess. Culture divides the world, and all queries posed to an ML model are in the context of culture. We have geographic culture, national culture, religious culture, trade culture, and more. And existing models have to painstakingly label all data before it is fed into the model, or it gets mixed in with everything else.

Take users in different countries as a stark example. In the US, where 70% of the population claim Christianity as their religion according to the latest 2020 survey, users would have a poor experience if they received answers condemning or criticizing Christianity. In Afghanistan, however, where it is illegal to be Christian, users would have a poor experience if the model returned answers showing Christianity in a positive light.

So from an architectural perspective, it’s important to understand what GPT-3 is. Remember, it stands for Generative Pretrained Transformer 3. Pretrained is key. There are now dozens of online services that have implemented GPT-3 and trained a model. Text drafting and copyediting are already becoming popular. Video editing is growing. Understand that by taking on a dependency on one of these services, you are relying on them to train a model for you. That is a lot of work, and it can save you a lot of time. But inquire about the body of data that has been fed into the model so that you can make sure your users receive the experience you want for them. I gave one example of cultural differences between countries. But for software geared toward children, there is a mountain of information on the Internet that you don’t want in the model if it’s generating responses for kids. Keep that in mind.

ChatGPT has had to have bias injected into it, because bias seems to be a more human trait than a computer trait. Time Magazine did a write-up on how OpenAI invested in a project to label and filter the data used to train the model. In short, it was a big filtering operation. There is a lot of filth on the Net, so according to your own morality (another word for bias), that’s a good thing. But I’m sure you will also find some areas where they inserted bias that you don’t agree with. Again, it’s all about training the model with labeled data that fits the culture of the users.
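
If you do take a dependency on a hosted model, you can still put your own filtering layer between the model and your users. Here is a minimal sketch of that idea in C#: it checks generated text against OpenAI’s moderation endpoint before displaying it. The endpoint path and response shape shown here are my assumptions based on OpenAI’s published API at the time of writing, so verify them against the current documentation before relying on this.

```
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Sketch: screen generated text before showing it to users.
// Assumes POST /v1/moderations with a body like {"input":"..."} and a
// response shaped like {"results":[{"flagged":true|false, ...}]}.
class ModerationSketch
{
    static async Task<bool> IsFlagged(HttpClient client, string text)
    {
        var body = JsonSerializer.Serialize(new { input = text });
        var content = new StringContent(body, Encoding.UTF8, "application/json");
        var response = await client.PostAsync("moderations", content);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement
                  .GetProperty("results")[0]
                  .GetProperty("flagged")
                  .GetBoolean();
    }

    static async Task Main()
    {
        using var client = new HttpClient { BaseAddress = new Uri("https://api.openai.com/v1/") };
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", "API_KEY"); // replace with your key

        string generated = "...text returned by the model...";
        // Only show the model's answer if the moderation check passes.
        Console.WriteLine(await IsFlagged(client, generated)
            ? "[Response withheld by content filter]"
            : generated);
    }
}
```

Whether you use a vendor’s filter or your own labeled data, the architectural point is the same: the filtering policy is part of your system, not something to leave implicit in a third-party model.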
Early users are circulating answers that seem fishy and serve as examples of the filtering project OpenAI commissioned. ChatGPT can draft blog posts and short statements as well. That’s pretty cool. I’m Italian. My family immigrated from Sicily to Texas in 1910, so I love this first example. “Write a tweet admiring Italians”. The response is “Italians are a true inspiration – their rich culture, stunning architecture, delicious cuisine, and effortless style make them a marvel to admire 🇮🇹 #AdmiringItalians”

Wow, quite flattering. Then you just go down the list and throw in some other races. The trainers of the ChatGPT model labeled data favorable to Italians, Asians, Hispanics, Indians, Blacks, and Whites. But it seemed to have a problem with that last one. So we can see that the model definitely has some different training there. Architecturally, you need to decide whether a third-party model out there is a fit for your software or whether you need to train a model that fits your users’ needs more specifically.
Let’s move on.

OpenAI is very well capitalized, and I expect very interesting things from them. Microsoft announced a $1 billion investment in the company in 2019. With an investment like that, I would also expect OpenAI technology to be well integrated with Microsoft Azure and .NET development tools. Microsoft has been expanding its machine learning capabilities for a long time, but GPT-3 is groundbreaking. Recall that the original Transformer model could be trained in about three and a half days on eight GPUs and be ready for testing. For some of you, it will be enough to call the HTTP APIs of some of the GPT-3 services. Others will want to implement and train their own model so they can label the data being fed into it and guide its responses.

Elephant in the room: is GPT-3 going to replace me as a programmer? Short answer: no. I’ve been around long enough to have seen every decade produce a story of “this technology will make programmers obsolete”. It hasn’t happened, and it’s not going to happen. The same thing can be said about mechanics. Even if every automobile is converted to electric or hydrogen or whatever, we’ll still need mechanics to fix them and perform maintenance on them. Things change, but they don’t go away. Now, the developers of the 90s who considered themselves HTML programmers have had to change dramatically, because HTML programmers had a short run. Now HTML is just a small portion of the skillset, and CSS radically changed HTML programming. Then Bootstrap, Material, and the other CSS frameworks radically changed it again. So the tools and how we use them will keep changing, but the need for people to design, implement, operate, and maintain software will still be there. But it’s an exciting time to be a programmer.

Right now, even if your current software wouldn’t benefit from a GPT-3 model, you should add it to your toolbelt for you and your colleagues. For example, there are so many questions that we take to StackOverflow or a web search. Or perhaps the users of your analytics database need help with a query. Now you have a new tool to help you draft query syntax (see the sketch after the summary below).

Summary

If you haven’t looked into GPT-3, you’ll want to. It’s a big leap ahead in the field of machine learning. And its capabilities can be a component of your artificial intelligence solution, or just a part of an existing software system. I’d encourage you to read the research papers that describe it in more detail so you know how it’s designed. After all, it’s just software. You need to understand the capabilities of this new software component so you can choose how to use it to your benefit. There’s nothing magical about it. It’s just software, just like every other library and API service you currently use to do something.
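
Here is the query-drafting sketch mentioned above: a minimal C# call to the completions endpoint asking the model to draft a SQL query. The model name, prompt, schema, and parameters are illustrative assumptions, and the generated SQL is a draft for a developer to review, not something to execute blindly.

```
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Sketch: ask a GPT-3 completion model to draft query syntax.
// Model name, prompt, and table schema are illustrative assumptions.
class QueryDraftSketch
{
    static async Task Main()
    {
        using var client = new HttpClient { BaseAddress = new Uri("https://api.openai.com/v1/") };
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", "API_KEY"); // replace with your key

        var request = new
        {
            model = "text-davinci-002",
            prompt = "Draft a T-SQL query that returns the ten customers " +
                     "with the highest total order value in 2022. " +
                     "Tables: Customers(Id, Name), Orders(Id, CustomerId, OrderDate, Total).",
            temperature = 0.2, // low temperature for more deterministic syntax
            max_tokens = 200
        };

        var content = new StringContent(
            JsonSerializer.Serialize(request), Encoding.UTF8, "application/json");
        var response = await client.PostAsync("completions", content);
        response.EnsureSuccessStatusCode();

        // The completion text comes back in choices[0].text.
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        Console.WriteLine(doc.RootElement
            .GetProperty("choices")[0]
            .GetProperty("text")
            .GetString());
    }
}
```

Treat the output like a draft from a junior teammate: review it against your actual schema before running it.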

I hope this has aided you in upleveling your understanding of GPT-3 and how to best use it in your own software.

Attention Is All You Need: https://arxiv.org/pdf/1706.03762.pdf
Improving Language Understanding by Generative Pre-Training: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
OpenAI API: https://openai.com/api/