ChatGPT Introduces "Deep Research" with 26.6% Accuracy

Deep research is OpenAI’s next agent that can do work for you independently: you give it a prompt, and ChatGPT finds, analyzes, and synthesizes hundreds of online sources to create a comprehensive report at the level of a research analyst.

Powered by a version of the upcoming OpenAI o3 model that’s optimized for web browsing and data analysis, it leverages reasoning to search, interpret, and analyze massive amounts of text, images, and PDFs on the internet, pivoting as needed in reaction to information it encounters.

The ability to synthesize knowledge is a prerequisite for creating new knowledge. For this reason, deep research marks a significant step toward our broader goal of developing AGI, which we have long envisioned as capable of producing novel scientific research.

The tool offers users a text-only output for now. Still, OpenAI plans to incorporate additional features such as data visualizations, embedded images, and other forms of analysis in the future.

Additionally, OpenAI is working on expanding the tool’s capabilities to include more specialized data sources, including subscription-based services and internal company resources.

To improve accuracy, OpenAI built the tool on a version of its upcoming o3 model that was trained specifically for tasks like web browsing and data analysis. The model was trained with reinforcement learning, which improves its reasoning by rewarding it for making progress toward a specific goal.

Deep research is thus optimized for searching, interpreting, and analyzing large amounts of online text, images, and PDFs, even adjusting its analysis as it processes new information.

While OpenAI’s new research tool shows promise, the company openly admits it isn’t perfect. Like any AI system, it might sometimes get things wrong – especially when telling trusted sources from questionable ones.

To help users check facts, every answer includes its sources and a plain-English explanation of how the AI reached its conclusions. In recent tests, deep research scored 26.6% on Humanity’s Last Exam, a tough expert-level knowledge exam, beating competitors like Gemini Thinking and even OpenAI’s own GPT-4o and o1. However, that still means it got more than 73% of answers wrong in these tests.

OpenAI warns users to stay cautious. The tool might slip up when connecting ideas or formatting references. It could also fail to mention when it’s unsure about information. The company stresses this is still experimental technology – helpful for research but not foolproof.

Why OpenAI Built Deep Research

Deep research is built for people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research. It can be equally useful for discerning shoppers looking for hyper-personalized recommendations on purchases that typically require careful research, like cars, appliances, and furniture.

Every output is fully documented, with clear citations and a summary of its thinking, making it easy to reference and verify the information. It is particularly effective at finding niche, non-intuitive information that would require browsing numerous websites. Deep research frees up valuable time by allowing you to offload and expedite complex, time-intensive web research with just one query.

Deep research independently discovers, reasons about, and consolidates insights from across the web. To accomplish this, it was trained on real-world tasks requiring browser and Python tool use, using the same reinforcement learning methods behind OpenAI o1, our first reasoning model.

While o1 demonstrates impressive capabilities in coding, math, and other technical domains, many real-world challenges demand extensive context and information gathering from diverse online sources. Deep research builds on these reasoning capabilities to bridge that gap, allowing it to take on the types of problems people face in work and everyday life.

How to use deep research

In ChatGPT, select ‘deep research’ in the message composer and enter your query. Tell ChatGPT what you need—whether it’s a competitive analysis on streaming platforms or a personalized report on the best commuter bike. You can attach files or spreadsheets to add context to your question. Once it starts running, a sidebar appears with a summary of the steps taken and sources used.

Deep research may take anywhere from 5 to 30 minutes to complete its work, taking the time needed to dive deep into the web. In the meantime, you can step away or work on other tasks—you’ll get a notification once the research is complete.

The final output arrives as a report within the chat – in the next few weeks, we will also be adding embedded images, data visualizations, and other analytic outputs in these reports for additional clarity and context.

Compared to deep research, GPT-4o is ideal for real-time, multimodal conversations. For multi-faceted, domain-specific inquiries where depth and detail are critical, deep research’s ability to conduct extensive exploration and cite each claim is the difference between a quick summary and a well-documented, verified answer that can be used as a work product.

How it works

Deep research was trained using end-to-end reinforcement learning on hard browsing and reasoning tasks across a range of domains. Through that training, it learned to plan and execute a multi-step trajectory to find the data it needs, backtracking and reacting to real-time information where necessary.
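To make "backtracking" concrete, here is a minimal Python sketch of a depth-first browse over a tiny in-memory set of pages. It is an illustration only: the actual model learns its behavior through reinforcement learning, and nothing here is OpenAI's implementation. The page names, the TOY_WEB dictionary, and the research function are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy "web": page name -> (page text, outgoing links).
TOY_WEB: Dict[str, Tuple[str, List[str]]] = {
    "start": ("index page", ["blog", "docs"]),
    "blog": ("opinion piece, no data", ["archive"]),
    "archive": ("old posts", []),
    "docs": ("reference manual", ["spec"]),
    "spec": ("the answer is 42", []),
}


def research(
    goal: str,
    page: str,
    web: Dict[str, Tuple[str, List[str]]],
    visited: Optional[Set[str]] = None,
    path: Optional[List[str]] = None,
) -> Optional[List[str]]:
    """Follow links depth-first until a page mentions `goal`.

    Returns the chain of pages that led to the goal, or None if no
    reachable page mentions it.
    """
    if visited is None:
        visited = set()
    if path is None:
        path = []
    if page in visited or page not in web:
        return None
    visited.add(page)
    path.append(page)
    text, links = web[page]
    if goal in text:
        return list(path)
    for link in links:
        found = research(goal, link, web, visited, path)
        if found is not None:
            return found
    path.pop()  # dead end: back up and let the caller try another link
    return None


if __name__ == "__main__":
    print(research("answer", "start", TOY_WEB))
```

In this toy run the search first wanders into the blog and its archive, finds nothing, backs out, and then reaches the goal through the docs page. A real browsing agent would also have to decide which links are worth following, which this sketch does not attempt.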

The model is also able to browse over user-uploaded files, plot and iterate on graphs using the Python tool, embed both generated graphs and images from websites in its responses, and cite specific sentences or passages from its sources. As a result of this training, it reached new highs on many public evaluations focused on real-world problems.

On Humanity’s Last Exam, a recently released evaluation that tests AI on expert-level questions across a broad range of subjects, the model powering deep research sets a new high of 26.6% accuracy. The test consists of over 3,000 multiple-choice and short-answer questions across more than 100 subjects, from linguistics to rocket science and classics to ecology.

Compared to OpenAI o1, the largest gains appeared in chemistry, humanities and social sciences, and mathematics. The model powering deep research showcased a human-like approach by effectively seeking out specialized information when necessary.

Model	Accuracy (%)
GPT-4o	3.3
Grok-2	3.8
Claude 3.5 Sonnet	4.3
Gemini Thinking	6.2
OpenAI o1	9.1
DeepSeek-R1*	9.4
OpenAI o3-mini (medium)*	10.5
OpenAI o3-mini (high)*	13.0
OpenAI deep research**	26.6
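
The figures in the table can be put in perspective with a little arithmetic. The Python sketch below copies the scores from the table and derives the error rate and the ratio between two models. The helper names are made up for this illustration, and the asterisk markers are dropped because the table does not explain them, so the models may not have been evaluated under identical conditions.

```python
# Accuracy (%) on Humanity's Last Exam, copied from the table above.
# Footnote markers (* and **) are omitted from the model names.
HLE_ACCURACY = {
    "GPT-4o": 3.3,
    "Grok-2": 3.8,
    "Claude 3.5 Sonnet": 4.3,
    "Gemini Thinking": 6.2,
    "OpenAI o1": 9.1,
    "DeepSeek-R1": 9.4,
    "OpenAI o3-mini (medium)": 10.5,
    "OpenAI o3-mini (high)": 13.0,
    "OpenAI deep research": 26.6,
}


def error_rate(accuracy_pct):
    """Share of questions answered incorrectly, in percent."""
    return round(100.0 - accuracy_pct, 1)


def ratio_to(model, baseline, scores=HLE_ACCURACY):
    """How many times higher `model` scores than `baseline`."""
    return round(scores[model] / scores[baseline], 2)


def ranked(scores=HLE_ACCURACY):
    """Models sorted from highest to lowest accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, acc in ranked():
        print(f"{name:26s} {acc:5.1f}% correct, {error_rate(acc):5.1f}% wrong")
    print(ratio_to("OpenAI deep research", "OpenAI o3-mini (high)"))
    print(ratio_to("OpenAI deep research", "OpenAI o1"))
```

On these numbers, deep research scores roughly twice as high as the next best entry, o3-mini (high), and nearly three times as high as o1, while still answering 73.4% of the questions incorrectly.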

Limitations

Deep research unlocks significant new capabilities, but it’s still early and has limitations. It can sometimes hallucinate facts in responses or make incorrect inferences, though at a notably lower rate than existing ChatGPT models, according to internal evaluations.

It may struggle to distinguish authoritative information from rumors, and it currently shows weakness in confidence calibration, often failing to convey uncertainty accurately.

At launch, there may be minor formatting errors in reports and citations, and tasks may take longer to kick off. We expect all these issues to quickly improve with more usage and time.

Access

Deep research in ChatGPT is currently very compute-intensive. The longer it takes to research a query, the more inference compute is required. We are starting with a version optimized for Pro users today, with up to 100 queries per month.

Plus and Team users will get access next, followed by Enterprise. We are still working on bringing access to users in the United Kingdom, Switzerland, and the European Economic Area.

All paid users will soon get significantly higher rate limits when we release a faster, more cost-effective version of deep research powered by a smaller model that still provides high-quality results.

In the coming weeks and months, we’ll be working on the technical infrastructure, closely monitoring the current release, and conducting even more rigorous testing. This aligns with our principle of iterative deployment.

If all safety checks continue to meet our release standards, we anticipate releasing deep research to Plus users in about a month.

Upcoming Updates

Deep research is available today on ChatGPT web and will be rolled out to mobile and desktop apps within the month. Currently, deep research can access the open web and any uploaded files. In the future, you’ll be able to connect to more specialized data sources—expanding its access to subscription-based or internal resources—to make its output even more robust and personalized.

Looking further ahead, we envision agentic experiences coming together in ChatGPT for asynchronous, real-world research and execution. The combination of deep research, which can perform asynchronous online investigation, and Operator, which can take real-world action, will enable ChatGPT to carry out increasingly sophisticated tasks for you.
