The most popular AI technology base, OpenAI’s GPT, received a major upgrade on Tuesday that is now available in the premium version of the ChatGPT chatbot.
The new GPT-4 can generate much longer text strings and respond when people input images, and it’s designed to better avoid the artificial intelligence pitfalls visible in the earlier GPT-3.5, OpenAI said Tuesday. For example, in bar exams that lawyers must take to practice law, GPT-4 ranks in the top 10% of scores compared to the bottom 10% for GPT-3.5, according to the AI research firm.
GPT stands for Generative Pretrained Transformer, a reference to the fact that it can generate text itself and that it uses an AI technology called transformers that Google pioneered. It’s a type of AI called a large language model, or LLM, that’s trained on huge amounts of data collected from the internet, learning mathematically to recognize patterns and reproduce styles.
OpenAI has made its GPT technology available to developers for years, but ChatGPT, which debuted in November, offered an easy interface that spawned an explosion of interest, experimentation, and concerns about the technology’s drawbacks. ChatGPT is free, but stutters when it’s in high demand. In January, OpenAI started offering ChatGPT Plus for $20 per month with guaranteed availability and now the GPT-4 base.
GPT-4 improvements
“In casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference emerges when the complexity of the task reaches a sufficient threshold,” said OpenAI. “GPT-4 is more reliable, more creative and can process much more nuanced instructions than GPT-3.5.”
Another major advancement in GPT-4 is the ability to accept input data, including text and photos. In OpenAI’s example, the chatbot is asked to explain a joke that shows a bulky, decades-old computer cable plugged into the tiny Lightning port of a modern iPhone.
Another is better performance in avoiding AI problems like hallucinations – mismanufactured responses, often presented with as much seeming authority as answers the AI gets right. GPT-4 is also better at thwarting attempts to get it to say the wrong thing: “GPT-4 scores 40% higher than our latest GPT-3.5 on our internal contradictory factuality evaluations,” according to OpenAI.
GPT-4 also adds new “steerability” options. Users of large language models today often have to do extensive prompt engineering, learning how to embed specific clues into their prompts to get the right kind of responses. GPT-4 adds a system command option that allows users to set a specific tone or style, for example programming code or a Socratic tutor: “You are a tutor who always responds in the Socratic style. You never give the student the answer, but always try to do exactly the ask the right question to help them learn to think for themselves.”
“Stochastic parrots” and other problems
OpenAI acknowledges significant shortcomings that persist with GPT-4, though it also encourages avoidance of them.
“It can sometimes make simple errors of reasoning… or be overly gullible in accepting blatant false statements from a user. And sometimes it can fail on difficult problems in the same way humans do, such as introducing security vulnerabilities into the code it produces ,” OpenAI said. In addition, “GPT-4 can also be confidently wrong in its predictions, not taking care to double check its work when it is likely to make a mistake.”
Large language models can produce impressive results, seeming to understand vast amounts of topics and converse in human-sounding if somewhat stilted language. But essentially, LLM AIs don’t really know anything. They are just able to string words together in very statistically sophisticated ways.
This statistical but fundamentally somewhat hollow approach to knowledge led researchers, including former Google AI researchers Emily Bender and Timnit Gebru, to warn of the “dangers of stochastic parroting” associated with large language models. Language model AIs tend to encode biases, stereotypes, and negative sentiment into training data, and researchers and other people using these models tend to “…confuse performance gains with actual natural language understanding”.
OpenAI, Microsoft and Nvidia partnership
OpenAI got a big boost when Microsoft said in February that it uses GPT technology in its Bing search engine, including chat features similar to ChatGPT. On Tuesday, Microsoft said it uses GPT-4 for its Bing work. Together, OpenAI and Microsoft form one major search threat for Googlebut Google also has its own technology for large language models, including one chatbot named Bard which Google is privately testing.
Microsoft uses GPT technology both to evaluate the queries people type into Bing and, in some cases, to provide more comprehensive conversational responses. The results can be much more informative than those of previous search engines, but the more conversational interface that can be called as an option has had issues that make it look unhinged.
To train GPT, OpenAI used Microsoft’s Azure cloud computing service, including thousands of Nvidia’s A100 graphics processing units, or GPUs, rolled together. Azure can now use Nvidia’s new H100 processors, which include specific circuitry to speed up AI transformer computations.