GPT-4o Image Generation

Image creation, limited by imagination

This week, OpenAI announced 4o Image Generation.
It fixes many of the limitations that have plagued these tools for years. Here are the key features:

  • Great at text: Finally, it reliably adds stylish text to your images.

  • Photo remixing: Easily create new images using other photos as prompts.

  • Complex scenes: Handles detailed scenes with lots of objects, up to 10 or 20 at once.

  • Consistent edits: Maintains accuracy even after multiple tweaks and refinements.

  • Real-world smarts: Understands your prompts deeply, pulling in real-world knowledge to make smarter images.

  • Simply better images: Overall sharper, more accurate, more realistic, and more context-aware.

The internet quickly demonstrated what this thing was capable of. In case you’re wondering how this new solution works architecturally, I’ll put it very simply:

Typical image generators (e.g., DALL-E) use diffusion models: they begin with random noise and iteratively remove it to form an image, operating on the entire image at once.

In contrast, image generation in ChatGPT now uses an autoregressive architecture: it sequentially predicts one token at a time, feeding each prediction back into the context for the next one. (In autoregressive image models, a token typically represents a small patch of the image rather than a single pixel.) This token-by-token method excels at iterative refinement and stylization tasks.
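To make the contrast concrete, here is a toy Python sketch of the two generation loops. It is purely illustrative: a hard-coded 4x4 target image (`TARGET`) stands in for a trained model, the helper names (`toy_diffusion`, `predict_next`, `toy_autoregressive`) are hypothetical, and none of this reflects how OpenAI actually implements its model, which has not been published in detail.

```python
import random

# Hypothetical 4x4 binary "image" standing in for what a trained model would
# produce. Real models learn from data; this target is hard-coded for illustration.
TARGET = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
SIZE = len(TARGET)


def toy_diffusion(steps: int = 20, seed: int = 0) -> list[list[float]]:
    """Diffusion-style loop: start from noise and refine the WHOLE image each step."""
    rng = random.Random(seed)
    image = [[rng.gauss(0.0, 1.0) for _ in range(SIZE)] for _ in range(SIZE)]
    for step in range(steps):
        # Stand-in "denoiser": move every pixel toward the target simultaneously.
        # A real model learns to predict and remove noise; it never sees TARGET.
        # alpha grows to 1.0 on the final step, so the last step lands on TARGET.
        alpha = 1.0 / (steps - step)
        image = [
            [px + alpha * (TARGET[r][c] - px) for c, px in enumerate(row)]
            for r, row in enumerate(image)
        ]
    return image


def predict_next(context: list[int]) -> int:
    """Stand-in for a learned model p(next token | previous tokens).

    A real model would attend to every token already in `context`; this toy
    only uses the context length to look up the next raster position.
    """
    pos = len(context)
    return TARGET[pos // SIZE][pos % SIZE]


def toy_autoregressive() -> list[int]:
    """Autoregressive loop: emit one token at a time in raster (top-to-bottom) order."""
    context: list[int] = []
    for _ in range(SIZE * SIZE):
        token = predict_next(context)
        context.append(token)  # each prediction is fed back in as context
    return context


if __name__ == "__main__":
    tokens = toy_autoregressive()
    # Print row by row, mimicking a top-to-bottom reveal.
    for r in range(SIZE):
        print(tokens[r * SIZE:(r + 1) * SIZE])
    print([[round(px, 3) for px in row] for row in toy_diffusion()])
```

The difference lies in the loop structure: the diffusion loop revisits every pixel on every step, while the autoregressive loop commits to one token at a time in raster order, which is consistent with the top-to-bottom rendering you see in ChatGPT.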

OK, enough writing. Let's have some fun. Fire up ChatGPT and make sure the 4o model is selected.

You'll know the new image generation feature is working by how the image renders: if it fills in slowly from top to bottom, like the following, you're good:

The implication of this autoregressive approach is that you can be incredibly precise at the pixel level with things like text and stylization. The downside is that the model loses a bit of creativity when generating net-new images or ideas.

I’ve had a lot of fun with this over the past 24 hours. Let me take you on my journey of experimentation:

I always start with a man playing ping pong while skiing, but now it is incredibly easy to stylize:

Or to add highly precise text:

Which made me wonder what a LinkedIn profile of Honest Abe might look like:

Prompting can be complex:

Had to see what my wife’s IPO would look like:

And play with product placement using my friend Sweet Loren’s brand:

And even start writing a children’s book:

The precision image-editing capabilities this presents create significant opportunities for copyright infringement and misinformation. B