New Benchmark Evaluates AI Performance on Biological Tasks and Biosecurity Risks
Summary

Researchers introduced ABC-Bench, a suite of tests that measures how well large language models perform laboratory and bioinformatics tasks. They found that AI agents outperform the median human expert, and they highlighted potential dual-use concerns.

Scientists have released the Agentic Bio-Capabilities Benchmark (ABC-Bench) to assess how large language models (LLMs) handle laboratory and biosecurity-related tasks. The benchmark includes three types of assignments: generating code for liquid-handling robots, designing DNA fragments for in vitro assembly, and devising methods to evade DNA synthesis screening. Each task requires both biological knowledge and software skills.

When multiple AI agents were tested on the benchmark, every model outperformed the median expert human baseline on all tasks. The models performed best on tasks that relied on established protocols and published information, and weaker on a task that required novel bioinformatics reasoning.

To validate the findings, the researchers conducted three wet-lab experiments. In one of them, OpenAI's o4-mini-high model produced a script that successfully directed an Opentrons liquid-handling robot to assemble DNA with the expected sequences.

The results illustrate the growing ability of AI systems to carry out complex biological tasks, and they also raise concerns about the potential misuse of these dual-use capabilities.