Date: 2024-05-13

Key Takeaways

ChatGPT has a new model, GPT-4o, and it can integrate text, visuals, and audio.

More importantly, the new model allows correct text placement on generated images.

GPT-4o's updated capabilities could help with graphic design. It also doesn't require a paid subscription, and there's a desktop app.

ChatGPT can spit out text and, with the DALL-E integration, churn out images, but ask the artificial intelligence platform to combine the two and the result is typically an unreadable, jumbled mess. That’s changing, however, with the move to ChatGPT’s GPT-4o, or Omni. While OpenAI’s demonstration on May 13 focused on using the end-to-end text, vision, and audio capabilities to have a real-time conversation, the update could bring key graphic design capabilities to ChatGPT. Early demos show the AI not just generating images that have legible, correctly spelled text, but also using an existing image of a person to replicate that face in the new image.

GPT-4o's approach to text, visuals, and audio

Everything is integrated into a single model

The key change coming with the launch of GPT-4o is the ability to both input and generate any mix of text, audio, and images. That’s because OpenAI trained a new model end-to-end that works across text, vision, and audio. Previously, GPT-4 would use separate models for audio, text, and images. With everything integrated into a single model, OpenAI explains that ChatGPT doesn’t lose information between models, which opens up a number of new possibilities.

While the live demo on May 13 focused on how that single end-to-end model allows you to use video to solve homework problems or have a real-time audio conversation, it also helps correct something the AI model is notoriously bad at: placing text on an image. GPT-4 can attempt to place text, but it typically results in misspellings, even when you tell the chatbot exactly how to spell it.

In several samples of the upcoming GPT-4o’s capabilities, the AI was able to place writing on an image of a typewriter, create a graphic with a poem, and create a movie poster. In the demonstrations, the wording was given to the AI, with misspellings in generated text not explicitly spelled out. But ChatGPT was able to generate images with legible, correctly spelled text taken from the prompt.

You can use real faces in generated images

Imagine making a movie poster with actors' faces

In one demonstration, ChatGPT created a movie poster with the actors’ faces on it along with the correctly spelled text. This was made possible by uploading the photos of the actors and spelling out the text to include. While some AI platforms can create a new photo with a real person's face, ChatGPT wasn't previously able to create a photo that had much likeness to the original.

In another demonstration, the chatbot was able to place the OpenAI logo on an image. Another tasked the bot with creating a concrete poem where the word Omni appeared in the shape of the OpenAI logo.

The generated images in OpenAI’s demonstrations are not perfect: when asked to convert one correctly spelled poem image to dark mode, the software introduced some misspellings. But the demonstration shows a much more legible, sensible result than the nonsensical way that GPT-4 generates text on images.

The software’s new capabilities in handling a mix of text, photos, and speech also allow it to answer questions about a photo and extract text from images.

The demonstrations suggest ChatGPT could have more capabilities in graphic design with the launch of GPT-4o over the next few weeks. However, those capabilities could have some consequences. One of the easiest ways to tell if an image was generated by AI is to look at things like street signs or laptop screens where text appears jumbled. If AI learns to spell on images, that’s one less clue for judging the authenticity of an image floating around the web.

The end-to-end integration of text, vision, and audio also comes with faster speed, more features without a paid subscription, and a desktop app for Mac. OpenAI says that GPT-4o will roll out over the next few weeks.

FAQ

Q: When will GPT-4o be available and how much does it cost?

OpenAI's GPT-4o will start rolling out its text and image capabilities on May 13. It is free for all users, with paid users benefiting from up to five times the capacity limits.