petri-labs

Five little worlds where lifelike, complex behavior emerges from simple rules. Open one and play: tweak the rules, watch what forms, and run real experiments.

origin

live

Watch tiny creatures evolve to see, hunt, flee, and split into new species, starting from random brains nobody programmed.

evolution · neural networks · speciation · rust/wasm

swarm

live

Thousands of simple agents flock, split, and dodge predators. Order emerges from just three local rules.

flocking · boids · webgl

morph

live

Spots, stripes, mazes, and coral grow out of two reacting chemicals: Turing's patterns, forming live.

turing patterns · reaction-diffusion · webgl

market

live

Bubbles, crashes, and panics emerge from a crowd of simple trading bots. Watch a market boom and melt down.

trading agents · order book · volatility

social

live

Echo chambers, polarization, and sudden consensus form as opinions spread across a social network.

networks · polarization · information cascades

petri-bench — can your model do science?

full results →

A contamination-proof benchmark scoring the scientific method of LLM agents on these five worlds: hidden, procedurally generated parameter changes that must be found by controlled experiment — never recalled. 540 standardized episodes across six frontier models, every one auditable.

the reference sweep (informative test values) | 92.2 | solves 100%
best frontier models (glm-5.1 / claude-opus-4.8) | 61.7 | none above 50%
the same sweep, blind test values (ablation) | ~45 | 33-37%
random guessing | 19.0 | 3%

Six models, three tiers, three episodes each, identical agent loop; every gap is significance-tested. Models choose interventions better than chance, then lose that advantage to procedural errors: skipped measurements, missing factorial designs, and p-hacking, all caught in the logs by the process audit.

What is petri-labs?

petri-labs is a collection of interactive simulations of emergent systems — worlds where complex, lifelike behavior arises from simple local rules rather than being programmed in. Each playground takes a different domain: artificial life and evolution in origin, collective motion in swarm, reaction-diffusion pattern formation in morph, price dynamics in market, and opinion dynamics in social.

Every simulation is real-time and runs entirely in your browser — no server, no install, no account. You can perturb parameters live, inspect individual agents, and watch how small changes cascade into different outcomes. origin runs a Rust core compiled to WebAssembly for near-native speed; the others use WebGL where the rendering demands it.
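To make "simple local rules" concrete, here is a minimal sketch of one flocking update in the style of the classic boids model (separation, alignment, cohesion). It is an illustration only, not the code swarm actually runs: the function name `boids_step`, the neighbor radius, and every weight are assumptions picked for readability.

```python
import math

def boids_step(positions, velocities, radius=5.0, sep_dist=1.0,
               w_sep=0.05, w_align=0.05, w_coh=0.01, max_speed=2.0):
    """Advance every boid by one tick using three local rules.

    positions, velocities: lists of (x, y) tuples of equal length.
    All parameter names and defaults are hypothetical.
    """
    new_velocities = []
    for i, (p, v) in enumerate(zip(positions, velocities)):
        sep = [0.0, 0.0]       # push away from very close neighbors
        avg_vel = [0.0, 0.0]   # sum of neighbor velocities
        center = [0.0, 0.0]    # sum of neighbor positions
        count = 0
        for j, (q, u) in enumerate(zip(positions, velocities)):
            if i == j:
                continue
            dx, dy = q[0] - p[0], q[1] - p[1]
            dist = math.hypot(dx, dy)
            if dist > radius:
                continue  # only neighbors within the radius matter
            count += 1
            avg_vel[0] += u[0]
            avg_vel[1] += u[1]
            center[0] += q[0]
            center[1] += q[1]
            if dist < sep_dist:
                sep[0] -= dx
                sep[1] -= dy
        vx, vy = v
        if count:
            # Alignment: steer toward the neighbors' mean velocity.
            vx += w_align * (avg_vel[0] / count - v[0])
            vy += w_align * (avg_vel[1] / count - v[1])
            # Cohesion: steer toward the neighbors' center of mass.
            vx += w_coh * (center[0] / count - p[0])
            vy += w_coh * (center[1] / count - p[1])
        # Separation: steer away from neighbors that are too close.
        vx += w_sep * sep[0]
        vy += w_sep * sep[1]
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_velocities.append((vx, vy))
    new_positions = [(p[0] + v[0], p[1] + v[1])
                     for p, v in zip(positions, new_velocities)]
    return new_positions, new_velocities
```

No boid knows about the flock as a whole; each one reads only its neighbors inside `radius`. Calling `boids_step` in a loop and changing a single weight is the kind of live perturbation the playgrounds expose.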

The Hypothesis Lab

This is what sets petri-labs apart. Every playground supports statistically tested experiments: state a hypothesis in plain language, and the lab designs a control/treatment comparison, runs replicate trials headless at max speed, and reports Wilcoxon signed-rank tests, Holm-Bonferroni correction, and Cliff's delta effect sizes. Results are measured, not anecdotal. petri-labs is meant to be an instrument, not a toy.
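For readers unfamiliar with two of the statistics named above, the following is a rough sketch of how Cliff's delta and the Holm-Bonferroni step-down procedure are usually computed. It follows the textbook definitions and is not taken from the lab's source; the function names are made up for this example. The Wilcoxon signed-rank p-values are assumed to come from elsewhere (for instance `scipy.stats.wilcoxon`).

```python
def cliffs_delta(treatment, control):
    """Cliff's delta: P(t > c) - P(t < c) over all cross pairs, in [-1, 1]."""
    greater = sum(1 for t in treatment for c in control if t > c)
    less = sum(1 for t in treatment for c in control if t < c)
    return (greater - less) / (len(treatment) * len(control))


def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger ones fail too
    return reject
```

As a usage example, `cliffs_delta([3, 4, 5], [1, 2, 3])` is 8/9 (eight of nine cross pairs favor treatment, one is tied), and `holm_bonferroni([0.01, 0.04, 0.03])` rejects only the first hypothesis, because 0.03 exceeds the second threshold of 0.05 / 2.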

For agents (MCP)

The same experiment engine is exposed to AI agents over the Model Context Protocol (MCP). An agent connects, calls describe_model, then calls run_experiment with a control-vs-treatment contrast and gets back the same statistical report the in-browser Lab produces: replicated, deterministic, with effect sizes and p-values. No account, no browser.

Add it to any MCP client (Claude Desktop / Code, Cursor, and others):

{
  "mcpServers": {
    "petri-labs": { "command": "npx", "args": ["-y", "petri-labs-mcp"] }
  }
}

Published as petri-labs-mcp on npm; source.
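Once the server is configured, the describe-then-run sequence described earlier comes down to two MCP `tools/call` requests. The sketch below builds those JSON-RPC 2.0 messages by hand to show their shape. The tool names come from the text above, but every argument name (`model`, `control`, `treatment`, `replicates`) and every value is a hypothetical placeholder; the real schema is whatever the server reports through `tools/list`. The initialization handshake, which an MCP client normally handles, is omitted.

```python
import json

def tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# HYPOTHETICAL arguments: placeholders only, not the server's actual schema.
describe = tool_call(1, "describe_model", {"model": "swarm"})
experiment = tool_call(2, "run_experiment", {
    "model": "swarm",
    "control": {"cohesion": 0.01},
    "treatment": {"cohesion": 0.05},
    "replicates": 20,
})

print(json.dumps(experiment, indent=2))
```

In practice an MCP client such as the ones listed above sends these messages for the agent; the sketch is only meant to show how little is needed to go from a hypothesis to a replicated comparison.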