OpenAI o3 vs. Google Gemini 2.0: Which Model Is Closer to Artificial General Intelligence?

2025 is shaping up to be a pivotal moment in A.I. innovation, driven by the race among tech giants to build artificial general intelligence (A.G.I.), or A.I. that can reach human-level intelligence. Recently, OpenAI and Google unveiled their respective new A.I. models: o3 and Gemini 2.0. OpenAI’s o3, announced on Dec. 20, is a reasoning model that CEO Sam Altman claims might achieve A.G.I. once it clears safety tests, while Google CEO Sundar Pichai touted Gemini 2.0 as the company’s “most thoughtful model yet.”

Both models demonstrate significant A.G.I. capabilities, though their approaches differ. While OpenAI’s new model focuses on cognitive abilities, Google positions Gemini 2.0 as a “highly integrated agentic A.I. tool” designed for efficiency and real-time problem-solving.

OpenAI’s o3 focuses on high-level reasoning, using a “private chain of thought” to solve problems. This approach allows it to perform well in physics, mathematics and science-related reasoning. It has shown impressive results on the ARC-AGI test, a benchmark for assessing an A.I. model’s ability to learn new skills outside of its training data. The o3 model scored 87.5 percent on the high-compute setting and 75.7 percent on the low-compute setting, tripling the performance of its predecessor, o1. (OpenAI reportedly avoided naming the model “o2” due to trademark conflicts with the British telecom company O2.)

The breakthrough is expensive, though. It currently costs OpenAI $20 per task for the low-compute mode and thousands of dollars for the high-compute mode. “These capabilities are new territory, and they demand serious scientific attention,” said François Chollet, co-creator of the ARC-AGI benchmark. It will be interesting to see how OpenAI sets pricing for o3 subscriptions, especially since Altman said the company is losing money on OpenAI Pro subscriptions because of high usage costs.

Gemini 2.0’s strength lies in multimodal capabilities, such as the ability to process audio. Its “Thinking Mode” is a standout feature that boosts reasoning and provides step-by-step explanations. Gemini 2.0 can also create combined outputs (like a blog post featuring text, A.I.-generated visuals and multilingual text-to-speech audio) with a single prompt. Users can also fine-tune the audio’s tone and style.

Experts remain divided on whether these advancements signal real progress toward A.G.I. “We’ve certainly made progress toward A.G.I., but I think it is still a fair distance away, and some of the buzz is marketing hype,” Thomas Malone, director of the MIT Center for Collective Intelligence, told Observer. “Benchmarks are an innovative way to measure A.I. capabilities, but they don’t capture all forms of human intelligence.”

Chollet expressed concerns that OpenAI’s o3 may not yet possess the kind of “generalized” intelligence that A.G.I. requires. “I don’t think o3 is A.G.I. yet,” he wrote in a blog post. He pointed out that the upcoming ARC-AGI-2 benchmark might still present a significant challenge for o3, potentially lowering its performance under high-compute conditions.

“One major technical hurdle in A.I.’s progress toward A.G.I. is long-term memory, which allows the model to retain full context for every action it takes. Latency and cost are also challenges, but those will likely improve quickly; these are just the first generation,” Will Bryk, CEO of Exa, a company building web search infrastructure for A.I. chatbots, told Observer. “The best definition of A.G.I. is when it can automate a significant portion of the knowledge economy. We’re not there yet, but getting closer to A.G.I.”