Factorio as an AI Benchmark

The best AI models today can barely automate early-game smelting in Factorio.

Meanwhile, experienced human players build megabases processing millions of items per minute — with perfectly ratioed production lines, optimized train networks, and nuclear power grids running at scale.

That gap is interesting.

Why Factorio Is Hard for AI

Factorio is not just a game. It is a real-time systems optimization problem.

It requires:

  • Long-horizon planning
  • Resource dependency tracking
  • Spatial reasoning
  • Throughput optimization
  • Incremental refactoring of live systems

The game punishes short-term thinking.
Every early design decision propagates downstream.

This makes it fundamentally different from most existing AI benchmarks, which are:

  • Static
  • Text-only
  • Short-horizon
  • Single-turn

Factorio is dynamic, persistent, and adversarial to naive planning.

The Factorio Learning Environment

There is already an open-source research effort exploring this space:

The Factorio Learning Environment (FLE), published at NeurIPS 2025.

It exposes a live Factorio server to LLM agents.
Agents write Python code to interact with the environment and attempt to build automated factories.

The results so far highlight how difficult the problem really is.

Even strong language models struggle with:

  • Maintaining state over long sessions
  • Correctly sequencing multi-step production chains
  • Recovering from partial failures
  • Designing scalable layouts

This is not a prompt engineering problem.
It is a systems reasoning problem.

Why This Is an Interesting Benchmark

Factorio introduces properties that resemble real-world infrastructure engineering:

  • Graph-based dependency trees
  • Constrained resource allocation
  • Throughput bottlenecks
  • Distributed logistics (trains, belts, bots)
  • Continuous optimization under growth

It is closer to distributed systems design than to puzzle solving.
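
To make "throughput bottlenecks" concrete, here is a minimal sketch of the arithmetic a planner has to get right: given a target output rate, walk the dependency tree and add up how many machines each step needs. The `Recipe` struct, the `machines_needed` function, and the numbers in `example_recipes` are all illustrative assumptions, not real game data, and the sketch assumes every machine runs at crafting speed 1.0.

```rust
use std::collections::HashMap;

/// One recipe: a single craft takes `seconds` and yields `output_count` items.
#[derive(Debug, Clone)]
struct Recipe {
    seconds: f64,
    output_count: f64,
    /// (ingredient name, amount consumed per craft)
    inputs: Vec<(&'static str, f64)>,
}

/// Accumulate into `out` the number of machines each item needs so that
/// `item` is produced at `rate` items per second.
/// Items without a recipe are treated as raw inputs and skipped.
fn machines_needed(
    recipes: &HashMap<&'static str, Recipe>,
    item: &'static str,
    rate: f64,
    out: &mut HashMap<&'static str, f64>,
) {
    let Some(recipe) = recipes.get(item) else { return };
    let crafts_per_second = rate / recipe.output_count;
    // One machine completes 1 / seconds crafts per second.
    *out.entry(item).or_insert(0.0) += crafts_per_second * recipe.seconds;
    for &(input, amount) in &recipe.inputs {
        machines_needed(recipes, input, crafts_per_second * amount, out);
    }
}

/// Placeholder recipe table (assumed numbers, chosen for easy arithmetic).
fn example_recipes() -> HashMap<&'static str, Recipe> {
    HashMap::from([
        ("gear", Recipe { seconds: 1.0, output_count: 1.0, inputs: vec![("plate", 2.0)] }),
        ("plate", Recipe { seconds: 4.0, output_count: 1.0, inputs: vec![("ore", 1.0)] }),
    ])
}
```

With these placeholder numbers, asking for 0.5 gears per second needs half a gear machine but four plate machines, so the upstream step is where the build would bottleneck first.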

That makes it a compelling unsaturated benchmark for autonomous agents.

A Rust-Based Approach

I’m currently experimenting with a Rust rewrite of the agent layer using Rig, a Rust library for building LLM-powered applications.

The design rests on three deliberate choices.

1. Typed Tools

Every game action becomes a strongly typed tool:

  • Place entity
  • Connect belts
  • Query inventory
  • Inspect recipes
  • Read map state

The domain is highly structured.
Rust’s type system allows encoding that structure directly into the interface.
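
As a rough illustration of what "strongly typed tool" can mean, the sketch below models two actions as a plain Rust enum and renders them to console commands. It does not use Rig's tool API at all. The `Entity`, `Position`, and `Tool` types and the `to_command` method are hypothetical names, and the Lua strings are my assumption of what such a command could look like, not something verified against a running server.

```rust
/// Entities the agent may place. With a closed enum, a misspelled
/// entity name is a compile error instead of a runtime failure.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Entity {
    StoneFurnace,
    TransportBelt,
    BurnerMiningDrill,
}

impl Entity {
    /// Internal prototype name passed to the game.
    fn prototype(self) -> &'static str {
        match self {
            Entity::StoneFurnace => "stone-furnace",
            Entity::TransportBelt => "transport-belt",
            Entity::BurnerMiningDrill => "burner-mining-drill",
        }
    }
}

/// Tile coordinates on the map.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    x: i32,
    y: i32,
}

/// Hypothetical tool set: each variant carries exactly the
/// arguments that action needs, and nothing else.
#[derive(Debug, Clone, PartialEq)]
enum Tool {
    PlaceEntity { entity: Entity, at: Position },
    QueryInventory,
}

impl Tool {
    /// Render the tool call as a console command string.
    /// The Lua bodies are assumed, not checked against the game.
    fn to_command(&self) -> String {
        match self {
            Tool::PlaceEntity { entity, at } => format!(
                "/silent-command game.surfaces[1].create_entity{{name=\"{}\", position={{{}, {}}}, force=\"player\"}}",
                entity.prototype(),
                at.x,
                at.y
            ),
            Tool::QueryInventory => String::from(
                "/silent-command rcon.print(serpent.line(game.players[1].get_main_inventory().get_contents()))",
            ),
        }
    }
}
```

The point of the design is that the model can only emit actions that deserialize into a `Tool` value, so malformed calls are rejected before anything reaches the game.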

2. Multi-Turn Agent Loops over RCON

Instead of single-shot execution, the agent operates in iterative loops:

  • Observe world state
  • Plan next action
  • Execute via RCON
  • Re-evaluate

This creates a feedback-driven control system rather than a stateless command generator.
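
The loop above can be sketched in a few lines. This is a minimal version under stated assumptions: the `Environment` trait, `MockWorld`, and the rule-based `plan` function are hypothetical stand-ins for the RCON connection and the model call, and the costs are mock values chosen for the example.

```rust
/// What the agent sees each turn. Fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Observation {
    stone: u32,
    furnaces: u32,
}

#[derive(Debug, Clone, PartialEq)]
enum Action {
    MineStone,
    PlaceFurnace,
}

/// Anything the loop can observe and act on. A real implementation
/// would send commands over RCON; that part is left out here.
trait Environment {
    fn observe(&self) -> Observation;
    fn execute(&mut self, action: &Action) -> Result<(), String>;
}

/// In-memory stand-in for the game server (assumed rules).
struct MockWorld {
    stone: u32,
    furnaces: u32,
}

const FURNACE_COST: u32 = 5; // stone per furnace in this mock
const STONE_PER_MINE: u32 = 3; // stone gained per MineStone in this mock

impl Environment for MockWorld {
    fn observe(&self) -> Observation {
        Observation { stone: self.stone, furnaces: self.furnaces }
    }

    fn execute(&mut self, action: &Action) -> Result<(), String> {
        match action {
            Action::MineStone => self.stone += STONE_PER_MINE,
            Action::PlaceFurnace => {
                if self.stone < FURNACE_COST {
                    return Err(format!("need {} stone, have {}", FURNACE_COST, self.stone));
                }
                self.stone -= FURNACE_COST;
                self.furnaces += 1;
            }
        }
        Ok(())
    }
}

/// Stand-in for the model call: a fixed rule instead of an LLM.
/// Returns None once the goal is reached.
fn plan(obs: &Observation, target_furnaces: u32) -> Option<Action> {
    if obs.furnaces >= target_furnaces {
        None
    } else if obs.stone >= FURNACE_COST {
        Some(Action::PlaceFurnace)
    } else {
        Some(Action::MineStone)
    }
}

/// Observe, plan, execute, then loop back and observe again.
/// Each outcome is recorded so a real planner could react to failures.
fn run_loop<E: Environment>(
    env: &mut E,
    target_furnaces: u32,
    max_turns: usize,
) -> Vec<(Action, Result<(), String>)> {
    let mut history = Vec::new();
    for _ in 0..max_turns {
        let obs = env.observe();
        let Some(action) = plan(&obs, target_furnaces) else { break };
        let outcome = env.execute(&action);
        history.push((action, outcome));
    }
    history
}
```

Because every decision starts from a fresh observation, a failed or partially applied action shows up in the next turn's state instead of silently corrupting the plan.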

3. RAG over the Recipe Graph

Factorio’s crafting tree is a dependency graph.

Using retrieval over:

  • The recipe tree
  • Wiki documentation
  • Item production chains

allows grounding decisions in structured domain knowledge instead of relying purely on model memory.
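
One simple form of retrieval over the recipe tree is an exact graph walk rather than embedding search: given a target item, collect only the recipes it transitively depends on and hand those lines to the model as context. The sketch below assumes a hypothetical `RecipeGraph` type and `retrieve_chain` function, and the small graph in `example_graph` lists ingredients without quantities purely for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// item -> direct ingredients. Items with no entry are raw resources.
type RecipeGraph = BTreeMap<&'static str, Vec<&'static str>>;

/// Breadth-first walk from `target`, returning one context line per
/// item in the dependency chain. Unrelated recipes are never visited.
fn retrieve_chain(graph: &RecipeGraph, target: &'static str) -> Vec<String> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([target]);
    let mut lines = Vec::new();
    while let Some(item) = queue.pop_front() {
        if !seen.insert(item) {
            continue; // already expanded via another path
        }
        match graph.get(item) {
            Some(inputs) => {
                lines.push(format!("{item} <- {}", inputs.join(", ")));
                queue.extend(inputs.iter().copied());
            }
            None => lines.push(format!("{item} (raw resource)")),
        }
    }
    lines
}

/// Small illustrative graph (ingredient names only, no amounts).
fn example_graph() -> RecipeGraph {
    BTreeMap::from([
        ("electronic-circuit", vec!["iron-plate", "copper-cable"]),
        ("copper-cable", vec!["copper-plate"]),
        ("iron-plate", vec!["iron-ore"]),
        ("copper-plate", vec!["copper-ore"]),
        ("iron-gear-wheel", vec!["iron-plate"]),
    ])
}
```

Calling `retrieve_chain(&example_graph(), "electronic-circuit")` returns six lines covering the circuit, its intermediates, and the two ores, and leaves the unrelated gear recipe out of the prompt.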

Why Rust Fits

Factorio is deterministic and rule-based.

The action space is structured. The state transitions are explicit. The constraints are mechanical.

Rust feels like a natural fit for:

  • Modeling state transitions
  • Enforcing invariants
  • Building typed agent tooling
  • Keeping orchestration predictable

When the domain itself is a graph of dependencies, types become leverage.
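
A small example of what "enforcing invariants" might look like in practice: crafting modeled as a pure state transition that either returns the next inventory or a typed error, so a count can never go negative and a failed craft never leaves a half-applied state. `Inventory`, `CraftError`, and `craft` are hypothetical names for this sketch, not part of any existing crate.

```rust
use std::collections::BTreeMap;

/// Item counts held by the agent. Counts are unsigned, so a negative
/// stock is unrepresentable.
#[derive(Debug, Clone, PartialEq, Default)]
struct Inventory {
    items: BTreeMap<&'static str, u32>,
}

/// Why a craft was rejected.
#[derive(Debug, PartialEq)]
enum CraftError {
    Missing { item: &'static str, have: u32, need: u32 },
}

impl Inventory {
    /// Current count of `item`, zero if absent.
    fn count(&self, item: &str) -> u32 {
        self.items.get(item).copied().unwrap_or(0)
    }

    /// Pure transition: consumes `inputs`, adds `output`, and returns the
    /// new state. On error `self` is untouched because all work happens
    /// on a clone.
    fn craft(
        &self,
        inputs: &[(&'static str, u32)],
        output: (&'static str, u32),
    ) -> Result<Inventory, CraftError> {
        let mut next = self.clone();
        for &(item, need) in inputs {
            let have = next.count(item);
            // checked_sub turns a would-be underflow into a typed error.
            let left = have
                .checked_sub(need)
                .ok_or(CraftError::Missing { item, have, need })?;
            next.items.insert(item, left);
        }
        *next.items.entry(output.0).or_insert(0) += output.1;
        Ok(next)
    }
}
```

Starting from three iron plates, one gear craft (two plates in, one gear out) succeeds and a second one returns `CraftError::Missing`, which is exactly the kind of feedback an agent loop can act on.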

The Gap

Humans build megabases.

AI struggles to build a stable smelting line.

That gap is not just amusing — it’s informative.

It exposes the limits of current reasoning systems when faced with:

  • Long-horizon planning
  • Structural optimization
  • Persistent world interaction

Factorio may quietly become one of the most revealing AI benchmarks available.

The factory must grow — for both humans and AI.