First impressions of OpenAI o1: An AI designed to overthink it

OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to "think" before they answer. There's been a lot of hype building up to these models, codenamed "Strawberry" inside OpenAI. But does Strawberry live up to the hype?

Sort of.

Compared to GPT-4o, the o1 models feel like one step forward and two steps back. OpenAI o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI's latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits that "GPT-4o is still the best option for most prompts" on its help page, and notes elsewhere that o1 struggles at simpler tasks.

"It's impressive, but I think the improvement is not very significant," said Ravid Shwartz Ziv, an NYU professor who studies AI models. "It's better at certain problems, but you don't have this across-the-board improvement."

For all of these reasons, it's important to use o1 only for the questions it's truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today's AI models are not very good at it. However, o1 is a tentative step in that direction.

Thinking through big ideas

OpenAI o1 is unique because it "thinks" before answering, breaking down big problems into small steps and attempting to identify when it gets one of those steps right or wrong. This "multi-step reasoning" isn't entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn't been practical until recently.

"There's a lot of excitement in the AI community," said Workera CEO and Stanford adjunct lecturer Kian Katanforoosh, who teaches classes on machine learning, in an interview. "If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you're trying to work through."

OpenAI o1 is also uniquely pricey. In most models, you pay for input tokens and output tokens. However, o1 adds a hidden process (the small steps the model breaks big problems into), which consumes a large amount of compute you never fully see. OpenAI is hiding some details of this process to maintain its competitive advantage. That said, you still get charged for it in the form of "reasoning tokens," which are billed like output tokens. This further emphasizes why you need to be careful about when you use OpenAI o1, so you don't burn through a pile of tokens just asking what the capital of Nevada is.
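To make the billing dynamic concrete, here's a rough back-of-the-envelope sketch. The per-million-token prices and the token counts below are illustrative placeholders, not OpenAI's actual rates; the one assumption carried over from above is that hidden reasoning tokens are billed at the output-token rate.

```python
# Sketch: how hidden "reasoning tokens" can inflate an o1-style bill.
# Prices and token counts are illustrative, not OpenAI's actual rates.

def estimate_cost(input_tokens, output_tokens, reasoning_tokens,
                  price_in_per_m=15.0, price_out_per_m=60.0):
    """Reasoning tokens are billed at the output rate, even though
    they never appear in the visible response."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens / 1e6) * price_in_per_m \
         + (billed_output / 1e6) * price_out_per_m

# A trivial question can still trigger a long hidden reasoning trace.
with_reasoning = estimate_cost(input_tokens=20, output_tokens=15,
                               reasoning_tokens=800)
without_reasoning = estimate_cost(input_tokens=20, output_tokens=15,
                                  reasoning_tokens=0)
print(f"with reasoning:    ${with_reasoning:.4f}")
print(f"without reasoning: ${without_reasoning:.4f}")
```

Under these made-up numbers, the hidden reasoning trace makes a 15-token answer roughly 40 times more expensive, which is exactly why a capital-of-Nevada question is a poor fit for this model.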

The idea of an AI model that helps you "walk backwards from big ideas" is powerful, though. In practice, the model is pretty good at that.

In one example, I asked ChatGPT o1 preview to help my family plan Thanksgiving, a task that could benefit from a little unbiased logic and reasoning. Specifically, I wanted help figuring out if two ovens would be sufficient to cook a Thanksgiving dinner for 11 people and wanted to talk through whether we should consider renting an Airbnb to get access to a third oven.

After 12 seconds of "thinking," ChatGPT wrote out a 750+ word response, ultimately telling me that two ovens should be sufficient with some careful strategizing, and that this would allow my family to save on costs and spend more time together. But it broke down its thinking for me at each step of the way and explained how it weighed all of these external factors, including costs, family time, and oven management.

ChatGPT o1 preview told me how to prioritize oven space at the house that is hosting the event, which was smart. Oddly, it suggested I consider renting a portable oven for the day. That said, the model performed much better than GPT-4o, which required multiple follow-up questions about what exact dishes I was bringing, and then gave me bare-bones advice I found less useful.

Asking about Thanksgiving dinner may seem silly, but you could see how this tool would be helpful for breaking down complicated tasks.

I also asked o1 to help me plan out a busy day at work, where I needed to travel between the airport, multiple in-person meetings in various locations, and my office. It gave me a very detailed plan, but it was maybe a bit much. Sometimes, all the added steps can be a little overwhelming.

For a simpler question, o1 does way too much; it doesn't know when to stop overthinking. I asked where you can find cedar trees in America, and it delivered an 800+ word response outlining every variation of cedar tree in the country, including their scientific names. It even had to consult OpenAI's policies at some point, for some reason. GPT-4o did a much better job answering this question, delivering about three sentences explaining that you can find the trees all over the country.

Tempering expectations

In some ways, Strawberry was never going to live up to the hype. Reports about OpenAI's reasoning models date back to November 2023, right around the time everyone was looking for an answer about why OpenAI's board ousted Sam Altman. That spun up the rumor mill in the AI world, leaving some to speculate that Strawberry was a form of AGI, the enlightened version of AI that OpenAI aspires to ultimately create.

Altman confirmed o1 is not AGI to clear up any doubts, not that you'd be confused after using the thing. The CEO also trimmed expectations around this launch, tweeting that "o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it."

The rest of the AI world is coming to terms with a less exciting launch than expected.

"The hype sort of grew out of OpenAI's control," said Rohan Pandey, a research engineer with the AI startup ReWorkd, which builds web scrapers with OpenAI's models.

He's hoping that o1's reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short. That's likely how most people in the industry are viewing o1: as a useful tool for certain hard problems, not the revolutionary step forward that GPT-4 represented for the industry.

"Everybody is waiting for a step function change for capabilities, and it is unclear that this represents that. I think it's that simple," said Brightwave CEO Mike Conover, who previously co-created Databricks' AI model Dolly, in an interview.

What's the value here?

The underlying principles used to create o1 go back years. Google used similar techniques in 2016 to create AlphaGo, the first AI system to defeat a world champion of the board game Go, points out Andy Harrison, a former Googler and CEO of the venture firm S32. AlphaGo trained by playing against itself countless times, essentially self-teaching until it reached superhuman capability.

He notes that this brings up an age-old debate in the AI world.

"Camp one thinks that you can automate workflows through this agentic process. Camp two thinks that if you had generalized intelligence and reasoning, you wouldn't need the workflow and, like a human, the AI would just make a judgment," said Harrison in an interview.

Harrison says he's in camp one and that camp two requires you to trust AI to make the right decision. He doesn't think we're there yet.

However, others think of o1 as less of a decision-maker and more of a tool to question your thinking on big decisions.

Katanforoosh, the Workera CEO, described an example where he was going to interview a data scientist to work at his company. He tells OpenAI o1 that he only has 30 minutes and wants to assess a certain number of skills. He can work backward with the AI model to understand whether he's thinking about this correctly, and o1 will account for constraints like the time limit.

The question is whether this helpful tool is worth the hefty price tag. As AI models continue to get cheaper, o1 is one of the first AI models in a long time that we've seen get more expensive.