New Benchmark Evaluates AI Performance on Biological Tasks and Biosecurity Risks

Summary

Researchers introduced ABC-Bench, a benchmark suite that measures how well large language models carry out laboratory and bioinformatics tasks. The AI agents tested outperformed the median expert human baseline, and the results highlight potential dual-use concerns.

Scientists have released the Agentic Bio-Capabilities Benchmark (ABC-Bench) to assess how large language models (LLMs) handle laboratory and biosecurity-related tasks. The benchmark includes three types of assignments: generating code for liquid-handling robots, designing DNA fragments for in vitro assembly, and devising methods to evade DNA synthesis screening. Each task combines biological knowledge with software skills.

Testing multiple AI agents on the benchmark showed that each model surpassed the median performance of expert human baselines across all tasks. The models excelled on tasks that relied on established protocols and published information, but showed weaker results on a task requiring novel bioinformatics reasoning.

To validate the findings, the researchers conducted three wet-lab experiments. In one trial, OpenAI's o4-mini-high model produced a script that successfully directed an Opentrons liquid-handling robot to assemble DNA with the expected sequences.

The results illustrate the growing capability of AI systems to perform complex biological functions, while also raising concerns about their potential misuse in dual-use applications.

Source

Sciencecast