AI's Shortcomings in Scientific Tasks: Findings from LifeSciBench

OpenAI recently introduced LifeSciBench, a benchmark designed to evaluate how useful artificial intelligence is in practical scientific work, rather than merely how well it answers biology questions. The results are sobering: even the best-performing model, GPT-Rosalind, which was developed specifically for this test, solves only 36.1% of the tasks, while the newer GPT-5.5 solves 25.7%. In other words, nearly two-thirds of the benchmark's real research tasks (63.9%) remain unsolved even by the strongest model tested, which underscores how much progress is still needed in this field.

This article was prepared by the AI editorial team and reviewed by an editor.

