Factorio as an AI Benchmark

10th Feb 2026
Tags: ai, llm, rust, factorio, agents

The best AI models today can barely automate early-game smelting in Factorio.

Meanwhile, experienced human players build megabases that process millions of items per minute, with carefully ratioed production lines, optimized train networks, and nuclear power at scale.

That gap is interesting.

Why Factorio Is Hard for AI

Factorio is not just a game. It is a real-time systems optimization problem.

It requires:

Long-horizon planning
Resource dependency tracking
Spatial reasoning
Throughput optimization
Incremental refactoring of live systems

The game punishes short-term thinking.
Every early design decision propagates downstream.

This makes it fundamentally different from most existing AI benchmarks, which are:

Static
Text-only
Short-horizon
Single-answer

Factorio is dynamic, persistent, and adversarial to naive planning.

The Factorio Learning Environment

There is already an open-source research effort exploring this space:

Factorio Learning Environment
Published at NeurIPS 2025.

It exposes a live Factorio server to LLM agents.
Agents write Python code to interact with the environment and attempt to build automated factories.

The results so far highlight how difficult the problem really is.

Even strong language models struggle with:

Maintaining state over long sessions
Correctly sequencing multi-step production chains
Recovering from partial failures
Designing scalable layouts

This is not a prompt engineering problem.
It is a systems reasoning problem.

Why This Is an Interesting Benchmark

Factorio introduces properties that resemble real-world infrastructure engineering:

Graph-based dependency trees
Constrained resource allocation
Throughput bottlenecks
Distributed logistics (trains, belts, bots)
Continuous optimization under growth

It is closer to distributed systems design than to puzzle solving.

That makes it a compelling benchmark for autonomous agents, and one that is far from saturated.
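Most of those properties come down to ratio arithmetic: how many machines it takes to sustain a target rate. Here is a minimal Rust sketch. It assumes vanilla recipe values as I understand them: iron plate takes 3.2 s per craft; copper cable takes 0.5 s for 2 cables; an electronic circuit takes 0.5 s and consumes 3 cables; a stone furnace has crafting speed 1.0; assembling machine 2 has speed 0.75; a yellow belt carries 15 items/s. `CraftTiming` and `machines` are illustrative names and are not part of any Factorio or FLE API.

```rust
/// Crafting data for one recipe. The values used in `main` are vanilla numbers
/// as I understand them; treat them as assumptions and check the wiki.
struct CraftTiming {
    time_s: f64, // seconds per craft at crafting speed 1.0
    output: f64, // items produced per craft
}

/// Fractional number of machines needed to sustain `rate` items per second.
fn machines(recipe: &CraftTiming, rate: f64, crafting_speed: f64) -> f64 {
    // One machine yields output / time_s * crafting_speed items per second,
    // so divide the target rate by that per-machine rate.
    rate * recipe.time_s / (recipe.output * crafting_speed)
}

fn main() {
    let iron_plate = CraftTiming { time_s: 3.2, output: 1.0 };
    let copper_cable = CraftTiming { time_s: 0.5, output: 2.0 };
    let circuit = CraftTiming { time_s: 0.5, output: 1.0 };

    // Stone furnaces (speed 1.0) needed to fill a 15 items/s belt with iron plates.
    let furnaces = machines(&iron_plate, 15.0, 1.0);
    println!("stone furnaces per yellow belt: {}", furnaces.ceil());

    // Each circuit consumes 3 cables, so cable demand is 3x the circuit rate.
    // The resulting assembler ratio is independent of the chosen rate.
    let circuit_rate = 1.0;
    let circuit_asm = machines(&circuit, circuit_rate, 0.75);
    let cable_asm = machines(&copper_cable, 3.0 * circuit_rate, 0.75);
    println!("cable : circuit assemblers = {:.2} : 1", cable_asm / circuit_asm);
}
```

Run as written, it prints 48 furnaces and a 1.50 : 1 ratio. These match two well-known rules of thumb: 48 stone furnaces fill a yellow belt, and cable and circuit assemblers of the same tier run at 3:2. An agent that cannot do this arithmetic reliably will build lines that either starve or back up.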

A Rust-Based Approach

I’m currently experimenting with a Rust rewrite of the agent layer using Rig, a Rust library for building LLM-powered applications.

The direction is deliberate and rests on three ideas.

1. Typed Tools

Every game action becomes a strongly typed tool:

Place entity
Connect belts
Query inventory
Inspect recipes
Read map state

The domain is highly structured.
Rust’s type system allows encoding that structure directly into the interface.
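This is a sketch of what “strongly typed tools” could look like in practice. It assumes nothing about Rig’s own tool interface. Each action from the list becomes an enum variant, and malformed calls are rejected before anything reaches the server. `Action`, `Position`, the validation rules, and the `MAX_RADIUS` cap are all hypothetical.

```rust
/// A tile-aligned map position (simplified and hypothetical).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    x: i32,
    y: i32,
}

/// Facing for placed entities.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction {
    North,
    East,
    South,
    West,
}

/// Hypothetical typed tool set mirroring the game actions listed above.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Action {
    PlaceEntity { name: String, at: Position, facing: Direction },
    ConnectBelts { from: Position, to: Position },
    QueryInventory,
    InspectRecipe { recipe: String },
    ReadMapState { center: Position, radius: u32 },
}

/// Reasons a call is rejected before execution.
#[derive(Debug, PartialEq)]
enum ActionError {
    EmptyName,
    ZeroLengthBelt,
    RadiusTooLarge(u32),
}

/// Hypothetical cap on how much of the map one observation may request.
const MAX_RADIUS: u32 = 64;

impl Action {
    /// Check structural validity locally, before anything is sent to the server.
    fn validate(&self) -> Result<(), ActionError> {
        match self {
            Action::PlaceEntity { name, .. } if name.is_empty() => Err(ActionError::EmptyName),
            Action::InspectRecipe { recipe } if recipe.is_empty() => Err(ActionError::EmptyName),
            Action::ConnectBelts { from, to } if from == to => Err(ActionError::ZeroLengthBelt),
            Action::ReadMapState { radius, .. } if *radius > MAX_RADIUS => {
                Err(ActionError::RadiusTooLarge(*radius))
            }
            _ => Ok(()),
        }
    }
}

fn main() {
    let place = Action::PlaceEntity {
        name: "stone-furnace".to_string(),
        at: Position { x: 10, y: -4 },
        facing: Direction::North,
    };
    let belt = Action::ConnectBelts {
        from: Position { x: 0, y: 0 },
        to: Position { x: 0, y: 0 },
    };
    println!("{:?}", place.validate()); // Ok(())
    println!("{:?}", belt.validate()); // Err(ZeroLengthBelt)
}
```

In a full agent, each variant would also need a schema the model can call and a translation into a server-side command. Both are omitted here.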

2. Multi-Turn Agent Loops over RCON

Instead of single-shot execution, the agent operates in iterative loops:

Observe world state
Plan next action
Execute via RCON
Re-evaluate

This creates a feedback-driven control system rather than a stateless command generator.
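The loop can be sketched with two traits: a `Transport` (in practice, an RCON client talking to the Factorio server) and a `Planner` (in practice, a model call). Both are hypothetical interfaces. The toy `FakeServer`, where a furnace costs 5 stone, stands in for the game so the example runs on its own. The important detail is that failed actions are recorded in the history rather than discarded, so the next planning step can react to them.

```rust
/// Hypothetical transport: a real one would send `command` over RCON
/// to the Factorio server and return the reply.
trait Transport {
    fn execute(&mut self, command: &str) -> Result<String, String>;
}

/// Hypothetical planner: in practice an LLM call. `None` means "goal reached".
trait Planner {
    fn next_step(&mut self, observation: &str, history: &[Turn]) -> Option<String>;
}

/// One completed iteration, kept as feedback for later planning steps.
#[derive(Debug)]
struct Turn {
    command: String,
    outcome: Result<String, String>,
}

/// Observe -> plan -> execute -> re-evaluate, at most `max_turns` times.
fn run_agent(
    transport: &mut impl Transport,
    planner: &mut impl Planner,
    observe_cmd: &str,
    max_turns: usize,
) -> Vec<Turn> {
    let mut history = Vec::new();
    for _ in 0..max_turns {
        // Observe: re-read world state each turn instead of trusting memory.
        let observation = transport
            .execute(observe_cmd)
            .unwrap_or_else(|e| format!("observation failed: {e}"));
        // Plan: decide from the fresh observation plus all earlier outcomes.
        let Some(command) = planner.next_step(&observation, &history) else {
            break;
        };
        // Execute, then record the outcome (errors included) for re-evaluation.
        let outcome = transport.execute(&command);
        history.push(Turn { command, outcome });
    }
    history
}

/// Toy stand-in for the game: a stone furnace costs 5 stone.
struct FakeServer {
    furnaces: u32,
    stone: u32,
}

impl Transport for FakeServer {
    fn execute(&mut self, command: &str) -> Result<String, String> {
        match command {
            "observe" => Ok(format!("furnaces={} stone={}", self.furnaces, self.stone)),
            "mine" => {
                self.stone += 5;
                Ok("mined 5 stone".to_string())
            }
            "place furnace" if self.stone >= 5 => {
                self.stone -= 5;
                self.furnaces += 1;
                Ok("placed".to_string())
            }
            "place furnace" => Err("not enough stone".to_string()),
            other => Err(format!("unknown command: {other}")),
        }
    }
}

/// Toy planner: build furnaces until `target`, mining whenever the last step failed.
struct GreedyPlanner {
    target: u32,
}

impl Planner for GreedyPlanner {
    fn next_step(&mut self, observation: &str, history: &[Turn]) -> Option<String> {
        let furnaces: u32 = observation
            .split_whitespace()
            .find_map(|kv| kv.strip_prefix("furnaces="))
            .and_then(|n| n.parse().ok())
            .unwrap_or(0);
        if furnaces >= self.target {
            return None;
        }
        match history.last() {
            // Re-evaluate: after a failure, fix the cause before retrying.
            Some(Turn { outcome: Err(_), .. }) => Some("mine".to_string()),
            _ => Some("place furnace".to_string()),
        }
    }
}

fn main() {
    let mut server = FakeServer { furnaces: 0, stone: 0 };
    let mut planner = GreedyPlanner { target: 2 };
    for turn in run_agent(&mut server, &mut planner, "observe", 10) {
        println!("{} -> {:?}", turn.command, turn.outcome);
    }
}
```

With no starting stone and a target of two furnaces, the first placement fails. The planner then mines, retries, and stops once the observation reports two furnaces. `max_turns` bounds the loop if the goal is never reached.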

3. RAG over the Recipe Graph

Factorio’s crafting tree is a dependency graph.

Using retrieval over:

The recipe tree
Wiki documentation
Item production chains

allows the agent to ground its decisions in structured domain knowledge instead of relying purely on model memory.
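The structured part of that retrieval can be plain graph traversal. The sketch below is not embedding-based search. It expands a target item into raw resources by walking a recipe map, then renders the totals as text that could be placed in the model’s context. The recipe data is a hand-written subset of vanilla values as I understand them, and the function names are illustrative.

```rust
use std::collections::HashMap;

/// One recipe node: units produced per craft and ingredients consumed per craft.
struct Recipe {
    output: f64,
    ingredients: Vec<(&'static str, f64)>,
}

/// Hand-written subset of the recipe graph (vanilla values as I understand them).
fn sample_book() -> HashMap<&'static str, Recipe> {
    let mut book = HashMap::new();
    book.insert(
        "electronic-circuit",
        Recipe { output: 1.0, ingredients: vec![("iron-plate", 1.0), ("copper-cable", 3.0)] },
    );
    book.insert("copper-cable", Recipe { output: 2.0, ingredients: vec![("copper-plate", 1.0)] });
    book.insert("iron-plate", Recipe { output: 1.0, ingredients: vec![("iron-ore", 1.0)] });
    book.insert("copper-plate", Recipe { output: 1.0, ingredients: vec![("copper-ore", 1.0)] });
    book
}

/// Expand `amount` units of `item` into raw resources (items with no recipe).
/// Assumes the subgraph is acyclic; a cyclic recipe would recurse forever.
fn raw_requirements(
    book: &HashMap<&'static str, Recipe>,
    item: &'static str,
    amount: f64,
    totals: &mut HashMap<&'static str, f64>,
) {
    match book.get(item) {
        // No recipe: this is a raw resource, so accumulate it.
        None => *totals.entry(item).or_insert(0.0) += amount,
        Some(recipe) => {
            // Number of crafts needed, then recurse into each ingredient.
            let crafts = amount / recipe.output;
            for &(ingredient, per_craft) in &recipe.ingredients {
                raw_requirements(book, ingredient, crafts * per_craft, totals);
            }
        }
    }
}

/// Render totals as a sorted, prompt-friendly text block.
fn grounding_context(totals: &HashMap<&'static str, f64>) -> String {
    let mut lines: Vec<String> = totals
        .iter()
        .map(|(item, qty)| format!("{item}: {qty}"))
        .collect();
    lines.sort();
    lines.join("\n")
}

fn main() {
    let book = sample_book();
    let mut totals = HashMap::new();
    raw_requirements(&book, "electronic-circuit", 10.0, &mut totals);
    println!("Raw resources for 10 electronic circuits:\n{}", grounding_context(&totals));
}
```

For 10 electronic circuits, it reports 10 iron ore and 15 copper ore. The recursion assumes an acyclic subgraph. Some real recipes consume their own product (Kovarex enrichment, for example), so a complete version would need cycle handling.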

Why Rust Fits

Under the hood, Factorio’s simulation is deterministic and rule-based.

The action space is structured. The state transitions are explicit. The constraints are mechanical.

Rust feels like a natural fit for:

Modeling state transitions
Enforcing invariants
Building typed agent tooling
Keeping orchestration predictable

When the domain itself is a graph of dependencies, types become leverage.
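One concrete form of that leverage is to treat each action as a pure state transition. The transition either returns a new valid world or explains why the move is illegal. Here is a minimal sketch, assuming a toy `World` with a flat inventory and one-tile entities. Real Factorio entities often occupy more than one tile, which this ignores. `World`, `place`, and `TransitionError` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Toy world model (hypothetical): held items and occupied tiles.
#[derive(Debug, Clone, Default)]
struct World {
    inventory: HashMap<String, u32>,
    occupied: HashSet<(i32, i32)>,
}

/// Why a transition was rejected.
#[derive(Debug, PartialEq)]
enum TransitionError {
    MissingItem(String),
    TileOccupied(i32, i32),
}

impl World {
    /// Pure transition: returns the next state or the reason the move is illegal.
    /// `self` is never mutated, so a rejected step cannot leave a partial change.
    fn place(&self, item: &str, tile: (i32, i32)) -> Result<World, TransitionError> {
        if self.occupied.contains(&tile) {
            return Err(TransitionError::TileOccupied(tile.0, tile.1));
        }
        let count = self.inventory.get(item).copied().unwrap_or(0);
        if count == 0 {
            return Err(TransitionError::MissingItem(item.to_string()));
        }
        let mut next = self.clone();
        // count >= 1 here, so the u32 subtraction cannot underflow.
        next.inventory.insert(item.to_string(), count - 1);
        next.occupied.insert(tile);
        Ok(next)
    }
}

fn main() {
    let mut start = World::default();
    start.inventory.insert("stone-furnace".to_string(), 1);

    let after = start.place("stone-furnace", (0, 0)).expect("first placement is legal");
    // Same tile again: rejected, and `after` is unchanged.
    println!("{:?}", after.place("stone-furnace", (0, 0)));
    // Inventory now empty: rejected for a different reason.
    println!("{:?}", after.place("stone-furnace", (2, 0)));
    // The original state still holds its furnace.
    println!("start still has {} furnace(s)", start.inventory["stone-furnace"]);
}
```

Because `place` takes `&self` and returns a new state, a rejected step leaves nothing half-applied. The `u32` count, combined with the zero check, rules out a negative inventory.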

The Gap

Humans build megabases.

AI struggles to build a stable smelting line.

That gap is not just amusing — it’s informative.

It exposes the limits of current reasoning systems when faced with:

Long-horizon planning
Structural optimization
Persistent world interaction

Factorio may quietly become one of the most revealing AI benchmarks available.

The factory must grow — for both humans and AI.