
ChatGPT Surpasses Gemini in Key AI Benchmark Tests

Published 18 January 2026

The competition between AI systems is intensifying, with recent benchmark tests revealing that ChatGPT outperforms Gemini in several critical areas of artificial intelligence. As these two platforms vie for dominance, understanding their strengths can help users make informed decisions about which system to employ.

The landscape of AI tools is vast, yet most users are familiar with only a few prominent names, such as OpenAI’s ChatGPT and Google’s Gemini. Because AI capabilities evolve so quickly, fair comparisons are increasingly difficult to make. In December 2025, observers questioned OpenAI’s competitive position, but the release of ChatGPT-5.2 quickly shifted the narrative and marked a significant comeback.

AI systems are evaluated on a range of benchmarks, each targeting a different dimension of performance, such as reasoning, problem-solving, and abstract thinking. Three benchmarks on which ChatGPT leads are GPQA Diamond, SWE-Bench Pro, and ARC-AGI-2.

Performance in Advanced Reasoning

The GPQA Diamond benchmark tests advanced reasoning in physics, chemistry, and biology. GPQA stands for Graduate-Level Google-Proof Q&A: its questions are designed so that the answers cannot simply be looked up online, and they require deep understanding rather than simple recall.

ChatGPT-5.2 scored 92.4% on this benchmark, narrowly ahead of Gemini 3 Pro at 91.9%. For context, domain experts who hold or are pursuing a PhD in the relevant field score around 65%, while skilled non-experts average only 34%. A score this high suggests that ChatGPT can apply scientific concepts effectively rather than merely recall facts.

Software Engineering Challenges

In software engineering, the SWE-Bench Pro (Private Dataset) benchmark evaluates an AI’s ability to resolve real-world coding issues sourced from GitHub. This variant tests how well an AI can interpret a bug report and produce a working fix.

Here, ChatGPT-5.2 resolved approximately 24% of the issues, while Gemini fixed around 18%. These figures may seem low, but they reflect how difficult the tasks are. For comparison, every issue in the benchmark is one that human engineers have already resolved.

Abstract Reasoning and Problem Solving

The ARC-AGI-2 benchmark, released in March 2025, measures an AI’s ability to apply abstract reasoning to unfamiliar problems. It is designed to gauge how well an AI can identify patterns and apply them to new situations.

In this test, ChatGPT-5.2 Pro scored 54.2%, while Gemini’s various models scored between 31.1% and 54%. ChatGPT holds the top score, but Gemini’s best model trails it by only 0.2 percentage points, so the lead on this measure of abstract reasoning is narrow.

AI benchmarks are constantly evolving, and the numbers mentioned are likely to shift with future updates from OpenAI and Google. The focus here was on the most current versions of each system, specifically the Pro versions, allowing for a direct comparison of their capabilities.

Gemini may outperform ChatGPT on other benchmarks, such as SWE-Bench Bash Only and Humanity’s Last Exam. On the three benchmarks highlighted here, however, ChatGPT leads in knowledge application, problem-solving, and abstract reasoning, although its margins on GPQA Diamond and ARC-AGI-2 are slim.

The ongoing rivalry between ChatGPT and Gemini continues to shape the AI landscape. As both systems advance, understanding the strengths and weaknesses of each will help users choose the right tool for a given task.

Editorial

Our Editorial team doesn’t just report the news—we live it. Backed by years of frontline experience, we hunt down the facts, verify them to the letter, and deliver the stories that shape our world. Fueled by integrity and a keen eye for nuance, we tackle politics, culture, and technology with incisive analysis. When the headlines change by the minute, you can count on us to cut through the noise and serve you clarity on a silver platter.