Training Infrastructure That Generates Itself

Describe any training environment. 930 builds it, grades it, and exports the data.

930 Forge generates gyms, tasks, and rubrics from a prompt. Every session is event-sourced — replay any moment, fork from any state, export as SFT data. The training universe expands with every run.

9 gyms · 81 tasks · Event-sourced sessions · One API

Trusted by teams at

Amazon · X · OpenAI · Anthropic

930 Forge

Generate any gym, task, or rubric from a prompt.

Describe what you need in plain language. Forge generates the environment (state machine, UI, event handlers), the tasks, and the grading rubrics. Code compiles at runtime, validates against the task behaviour, and registers in the catalog. Immediately runnable.

Describe the environment

Tell Forge what the gym should do. The orchestrator decomposes your request into gym, tasks, rubrics, and world generators — then builds them in parallel.

Auto-generated rubrics

Forge writes rubric functions that return per-criterion scores and detailed textual feedback. Composable, weighted, and inspectable — not a black box.

Compile, validate, run

Generated Elixir code compiles to BEAM bytecode, passes AST safety checks, smoke-tests its required callbacks, and registers in the catalog. Ready in seconds.

Event-sourced platform

Every action is an event. Replay any moment. Fork from any state.

Sessions are event-sourced from the ground up. Every click, every keystroke is stored as a trace entry with a full state snapshot. This gives you capabilities that stateless systems can't offer.

Time travel

Replay events to reconstruct the exact state at any point in time. See what the model saw, step by step.
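Conceptually, replay is a fold over the event log: re-apply each event to the initial state and stop wherever you like. The sketch below is a minimal Python illustration of that idea only; 930's implementation is Elixir, and the `apply_event` reducer and event names here are assumptions, not the platform's API.

```python
def apply_event(state, event_name, params):
    """Hypothetical reducer: each event maps deterministically to a state change."""
    if event_name == "set_cell":
        cells = dict(state.get("cells", {}))
        cells[params["ref"]] = params["value"]
        return {**state, "cells": cells}
    if event_name == "click":
        return {**state, "focused": params["target"]}
    return state

def replay(initial_state, trace, upto=None):
    """Reconstruct state by re-applying events in order.

    `trace` is a list of (event_name, params) pairs; `upto` limits replay
    to the first N events, giving the exact state at that moment.
    """
    state = dict(initial_state)
    for event_name, params in trace[:upto]:
        state = apply_event(state, event_name, params)
    return state
```

Because every reducer is deterministic, replaying the same prefix always reconstructs the same state, which is what makes "see what the model saw, step by step" possible.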

Session forking

Model failed at step 15? Fork from there. Test a different strategy from the same checkpoint without re-running known-good steps.
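With an event log, forking reduces to copying a prefix of the trace. A minimal Python sketch of the concept (the event shapes and step counts are illustrative assumptions, not the real session format):

```python
def fork(trace, at_step):
    """Fork a session: copy the first `at_step` events as shared history.

    The fork reuses known-good steps without re-running them; new actions
    append to the copy, leaving the parent session's trace untouched.
    """
    return list(trace[:at_step])

# Hypothetical parent session that went wrong at step 15:
parent = [("click", {"target": f"step_{i}"}) for i in range(20)]
child = fork(parent, 15)  # shares steps 0-14 with the parent
child.append(("click", {"target": "alternative_strategy"}))
```

The parent keeps its original 20 steps; the child diverges from the shared checkpoint with a different strategy.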

Full observability

Every trace entry carries the event name, parameters, timestamp, and post-action state snapshot. Nothing is opaque.

Deterministic seeds

Same seed, same scenario. The only variable is your model. Reproducible environments that don't drift between runs.
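The standard way to get "same seed, same scenario" is to derive every random choice from one isolated, seeded RNG. A Python sketch under that assumption (the scenario fields here are invented for illustration):

```python
import random

def generate_scenario(seed):
    """Deterministic world generation: the same seed always yields the
    same scenario, so the only variable between runs is the model."""
    rng = random.Random(seed)  # isolated RNG; no shared global state
    return {
        "client": rng.choice(["Quigley-Block", "Acme", "Globex"]),
        "deal_value": rng.randrange(50_000, 200_000, 1_000),
        "rows": [rng.randint(100, 999) for _ in range(5)],
    }
```

Using an RNG instance rather than the global `random` module is what keeps generation reproducible even when other code draws random numbers in between.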

Criterion-level grading

Not a number. A diagnosis.

Every task is graded by composable rubric functions. Each criterion returns a scalar score and detailed textual feedback. You see exactly which skills your model has learned, which it hasn't, and why.

Scalar score + feedback

Each criterion returns a 0–1 score and a human-readable explanation. "Cell (row_1, amount) = 450, expected 500" — not just "73%".
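A criterion of that shape can be sketched as a function returning a (score, feedback) pair. This Python version mirrors the cell example above; the state layout and function name are assumptions for illustration:

```python
def grade_cell(state, ref, expected):
    """A rubric criterion: returns a 0-1 score plus human-readable
    feedback, not just a bare number."""
    actual = state.get("cells", {}).get(ref)
    if actual == expected:
        return 1.0, f"Cell {ref} = {expected} as expected"
    return 0.0, f"Cell {ref} = {actual}, expected {expected}"
```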

Weighted composition

Tasks compose rubric entries with weights. Main objectives can matter more than safety checks. Multi-gym tasks grade across every environment in one pass.
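Weighted composition amounts to a weight-normalised sum over criteria while keeping every criterion's feedback. A hedged Python sketch (the rubric tuple layout is an assumption, not the platform's data model):

```python
def grade(state, rubric):
    """Compose weighted criteria into one report.

    `rubric` is a list of (name, weight, criterion_fn) where each
    criterion returns (score, feedback). The composite is the
    weight-normalised sum; per-criterion details are preserved.
    """
    results = [(name, w, *fn(state)) for name, w, fn in rubric]
    total_w = sum(w for _, w, _, _ in results)
    composite = sum(w * s for _, w, s, _ in results) / total_w
    return composite, [(name, s, fb) for name, _, s, fb in results]
```

Raising the weight of a main-objective criterion relative to a safety check changes the composite without hiding either result, which keeps the grade inspectable.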

RL reward signals

The same rubric results feed directly into RL training as reward signals. Per-criterion scores enable fine-grained curriculum learning and auxiliary rewards.
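One plausible reading of that mapping, sketched in Python (the result layout and the choice of the mean as episode reward are assumptions for illustration):

```python
def rubric_to_rewards(results):
    """Turn rubric output into RL signals.

    `results` is a list of (criterion_name, score, feedback). The mean
    score serves as the episode reward; per-criterion scores become
    auxiliary rewards for fine-grained curriculum learning.
    """
    aux = {name: score for name, score, _ in results}
    reward = sum(aux.values()) / len(aux)
    return reward, aux
```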

Training data

One run. Three outputs. Your failures become your curriculum.

Every session produces eval grades, RL reward signals, and SFT training data. No separate pipelines, no reformatting. Export any session as a ZIP with screenshots and action manifests, or as trace JSON for replay.

SFT data export

Export sessions as ZIP files with step-by-step screenshots, bounding boxes, action types, and selectors. Ready for supervised fine-tuning pipelines.

Trace JSON

Full session export with initial and final states, complete event trace with snapshots, grading results, and scenario metadata. One file, every detail.
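To make the shape concrete, here is a hypothetical session export built from the fields named above. Every key in this Python dict is an assumption based on that description, not the actual schema:

```python
import json

# Hypothetical trace export; field names are illustrative assumptions.
session = {
    "scenario": {"seed": 42, "gym": "spreadsheet"},
    "initial_state": {"cells": {}},
    "trace": [
        {
            "event": "set_cell",
            "params": {"ref": "A1", "value": 500},
            "timestamp": "2025-01-01T00:00:00Z",
            "snapshot": {"cells": {"A1": 500}},  # post-action state
        }
    ],
    "final_state": {"cells": {"A1": 500}},
    "grading": {"composite": 1.0, "criteria": [{"name": "cell_A1", "score": 1.0}]},
}

exported = json.dumps(session, indent=2)  # one file, every detail
```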

Kick-start before RL

Use SFT data to bootstrap your model before RL training begins. Run solver sessions, export the good ones, and fine-tune on demonstrated behaviour.

See it in action

Watch models learn in real tasks.

Each task runs in a controlled gym with criterion-level grading. Multi-gym tasks span CRM, spreadsheets, and more.

Cross-gym workflow

Model learns to onboard a client across three apps in one episode.

CRM updates, spreadsheet bookkeeping, and operational follow-through — graded per criterion across every gym.

Example prompt

Onboard Quigley-Block as a new client with a $126000 contract. Update their CRM status and deal value, add the contract as revenue in Excel, and create three onboarding …

13 steps Open task

Spreadsheet reasoning

Model learns to reconcile sheets without corrupting the source.

Inspect multiple sheets, infer what's missing, write only the derived output. Graded on accuracy and restraint.

Example prompt

Reconcile invoices against payments: 1. Switch to the 'Invoices' sheet and review all invoice IDs 2. Switch to the 'Payments' sheet and note which invoice IDs have payme…

15 steps Open task

Spatial manipulation

Model learns precise sequencing in a world full of distractors.

Small enough to grade exactly, complex enough to require planning. Each criterion isolates a different spatial skill.

Example prompt

Build a tower at the center of the grid by stacking the colored cubes in this order (from bottom to top): green → blue → yellow → red. Do not move the purple sphere.

12 steps Open task

The vision

A training universe that expands with every user.

Because 930 can generate any task or gym from a prompt, it can also analyze your training rollouts, identify your model's blind spots, and generate new tasks specifically in those weak areas. Do that for every team on the platform, and the training universe grows — especially where models need it most.

01

Evaluate

Run tasks with criterion-level grading. See exactly where your model fails and why.

02

Analyze blind spots

930 analyzes rollouts across sessions. Detailed reports surface your model's strengths, weaknesses, and missing capabilities.

03

Generate targeted tasks

Forge creates new gyms and tasks specifically in the model's blind spots. Training curriculum that adapts to what's actually broken.

04

Train & repeat

Export SFT data. Train. Re-evaluate on the same tasks. Measure improvement. The loop compounds. The universe expands.

Stop maintaining pipelines. Start compounding training.

Generate a gym from a prompt. Run a graded task in under a minute. Export SFT data. Fork from any failure. Every round makes the next one better — and every team on the platform makes it better for everyone.