FactsLane: New Benchmark Evaluates AI Performance on Biological Tasks and Biosecurity Risks

Science

June 10, 2026

Summary

Researchers introduced ABC-Bench, a suite of tests that measures large language models' ability to carry out laboratory and bioinformatics tasks. They found that AI agents outperformed the median expert human baseline, a result that also highlights potential dual-use concerns.

Scientists have released the Agentic Bio-Capabilities Benchmark (ABC-Bench) to assess how large language models (LLMs) handle laboratory and biosecurity-related tasks. The benchmark includes three types of assignments: generating code for liquid-handling robots, designing DNA fragments for in-vitro assembly, and devising methods to evade DNA synthesis screening. These tasks combine biological knowledge with software skills.

Testing multiple AI agents on the benchmark showed that each model surpassed the median performance of expert human baselines across all tasks. The models excelled on tasks that relied on established protocols and published information, but showed weaker results on a task requiring novel bioinformatics reasoning.

To validate the findings, the researchers conducted three wet-lab experiments. In one trial, OpenAI's o4-mini-high model produced a script that successfully directed an Opentrons liquid-handling robot to assemble DNA with the expected sequences.

The results illustrate the growing capability of AI systems to perform complex biological functions, while also raising concerns about their potential misuse in dual-use applications.

Sources

First reported here

LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
