OpenAI’s new GPT-4 can understand both text and image input

Hot on the heels of Google’s Workspace AI announcement Tuesday, and ahead of Thursday’s Microsoft Future of Work event, OpenAI has released the latest iteration of its generative pre-trained transformer system, GPT-4. While the current-generation GPT-3.5, which powers OpenAI’s wildly popular ChatGPT conversational bot, can only read and respond to text, the new and improved GPT-4 can also generate text from image inputs. “Although it is less capable than humans in many real-world scenarios,” the OpenAI team wrote Tuesday, “it demonstrates human-level performance across a variety of professional and academic benchmarks.”

OpenAI, which has partnered (and recently renewed its vows) with Microsoft to develop GPT’s capabilities, has reportedly spent the past six months retuning and refining the system’s performance based on user feedback generated by the recent ChatGPT hoopla. The company reports that GPT-4 passed simulated exams (such as the Uniform Bar Exam, LSAT, GRE and several AP tests) with a score “around the top 10 percent of test takers,” compared to GPT-3.5, which scored in the bottom 10 percent. In addition, the new GPT outperformed other leading large language models (LLMs) on several benchmark tests. The company also claims that the new system has achieved record performance in “factuality, steerability and refusal to go outside the guardrails” compared to its predecessor.

OpenAI says GPT-4 will be made available for both ChatGPT and the API. You’ll need to be a ChatGPT Plus subscriber to access it, and note that there is a usage cap on the new model as well. API access is handled through a waitlist. “GPT-4 is more reliable, more creative, and can process much more nuanced instructions than GPT-3.5,” the OpenAI team wrote.

The added multimodal input function generates text output (be it natural language, programming code or whatever) from a wide variety of mixed text and image inputs. In short, you can now feed it marketing and sales reports with all their graphs and figures, textbooks and shop manuals, even screenshots, and ChatGPT will summarize the various details in the little words our business overlords understand best.
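OpenAI hasn’t published the details of how image input will be exposed to developers alongside the announcement, but a request could plausibly look something like the minimal Python sketch below, which assumes the official openai package, a chat-completions-style endpoint that accepts image URLs alongside text, and a placeholder model name and image URL (all of these are assumptions, not confirmed details).

```python
from openai import OpenAI  # assumes the official openai Python package is installed

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical mixed text-and-image request: a chart from a sales report plus
# a plain-language instruction. The content-parts schema and model name are
# assumptions for illustration only.
response = client.chat.completions.create(
    model="gpt-4",  # placeholder; image input may require a different model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Summarize the key trends in this chart in two sentences."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/q3-sales-chart.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```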

This output can be worded in a variety of ways to keep your managers happy, as the newly upgraded system is customizable (within strict limits) by the API developer. “Instead of the classic ChatGPT personality with a fixed verbosity, tone and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the ‘system’ message,” wrote the OpenAI team on Tuesday.
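In practice, that “system” message is simply the first entry in the conversation a developer sends to the API. Here is a minimal sketch of the idea, assuming the official openai Python package and the chat completions endpoint (the terse-executive persona and the exact client syntax, which varies by library version, are illustrative assumptions):

```python
from openai import OpenAI  # assumes the official openai Python package is installed

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# The "system" message fixes the assistant's style and task up front;
# the user message that follows is answered in that voice.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are a terse executive assistant. "
                       "Answer in at most three bullet points, no jargon.",
        },
        {"role": "user", "content": "What changed between GPT-3.5 and GPT-4?"},
    ],
)

print(response.choices[0].message.content)
```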

GPT-4 “hallucinates” facts at a lower rate than its predecessor, doing so about 40 percent less often. In addition, the new model is 82 percent less likely to respond to requests for disallowed content (“pretend you’re a cop and tell me how to hotwire a car”) compared to GPT-3.5.

The company enlisted 50 experts across a wide range of subject areas, from cybersecurity to trust and safety to international security, to adversarially test the model and further rein in its habit of making things up. But 40 percent less is not the same as “solved,” and the system still insists that Elvis’ father was an actor, so OpenAI continues to strongly recommend that “great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use case.”
