New Benchmark Evaluates AI Performance on Biological Tasks and Biosecurity Risks

Facts only

June 10, 2026

Summary

Researchers introduced ABC-Bench, a suite of tests that measures large language models' ability to carry out laboratory and bioinformatics tasks. The AI agents tested outperformed the median expert human baseline, a result that also highlights potential dual-use concerns.

Scientists have released the Agentic Bio-Capabilities Benchmark (ABC-Bench) to assess how large language models (LLMs) handle laboratory and biosecurity-related tasks. The benchmark includes three types of assignments: generating code for liquid-handling robots, designing DNA fragments for in-vitro assembly, and devising methods to evade DNA synthesis screening. These tasks combine biological knowledge with software skills.

Testing multiple AI agents on the benchmark showed that each model surpassed the median performance of expert human baselines across all tasks. The models excelled on tasks that relied on established protocols and published information, but showed weaker results on a task requiring novel bioinformatics reasoning.

To validate the findings, the researchers conducted three wet-lab experiments. In one trial, OpenAI's o4-mini-high model produced a script that successfully directed an OpenTrons liquid-handling robot to assemble DNA with the expected sequences.

The results illustrate the growing capability of AI systems to perform complex biological functions, while also raising concerns about their potential misuse in dual-use applications.

Source

Sciencecast

First reported here

LLM Novice Uplift on Dual-Use, In Silico Biology Tasks

February 28, 2026 • Sciencecast

Source ratings

Importance: 85%

Interest: 75%

Credibility: 90%

Manipulation: 10%

Political leaning: Center
