Building an AI Coach That Sees Your Factorio Factory

Nikolai Shelekhov — Sun, 15 Feb 2026 00:00:00 +0000

I got tired of alt-tabbing to ask Claude about my Factorio factory.

Every time I wanted advice, I had to describe my setup manually: “I have 4 furnaces, 2 steam engines, researching automation…” Half the time I’d forget something important or get the numbers wrong.

So I built a tool that lets Claude see the game directly.

Factorio as an AI Benchmark

Nikolai Shelekhov — Tue, 10 Feb 2026 00:00:00 +0000

The best AI models today can barely automate early-game smelting in Factorio.

Meanwhile, experienced human players build megabases processing millions of items per minute, with perfectly ratioed production lines, optimized train networks, and nuclear power grids running at scale.

That gap is interesting.

Why Factorio Is Hard for AI

Factorio is not just a game. It is a real-time systems optimization problem.

It requires:

- Long-horizon planning
- Resource dependency tracking
- Spatial reasoning
- Throughput optimization
- Incremental refactoring of live systems

The game punishes short-term thinking. Every early design decision propagates downstream.

This makes it fundamentally different from most existing AI benchmarks, which are:

- Static
- Text-only
- Short-horizon
- Deterministic

Factorio is dynamic, persistent, and adversarial to naive planning.

The Factorio Learning Environment

There is already an open-source research effort exploring this space: the Factorio Learning Environment, published at NeurIPS 2025.

It exposes a live Factorio server to LLM agents. Agents write Python code to interact with the environment and attempt to build automated factories.

The results so far highlight how difficult the problem really is. Even strong language models struggle with:

- Maintaining state over long sessions
- Correctly sequencing multi-step production chains
- Recovering from partial failures
- Designing scalable layouts

This is not a prompt engineering problem. It is a systems reasoning problem.

Why This Is an Interesting Benchmark

Factorio introduces properties that resemble real-world infrastructure engineering:

- Graph-based dependency trees
- Constrained resource allocation
- Throughput bottlenecks
- Distributed logistics (trains, belts, bots)
- Continuous optimization under growth

It is closer to distributed systems design than to puzzle solving.

That makes it a compelling, unsaturated benchmark for autonomous agents.
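
To make "throughput bottlenecks" concrete, here is a small sketch of the ratio arithmetic a player (or an agent) has to get right. The function name and its parameters are my own illustration, not part of any library. It uses integer math in game ticks (Factorio runs at 60 ticks per second) to avoid floating-point rounding, and it assumes a basic transport belt carrying 15 items per second and a stone furnace smelting one iron plate every 3.2 seconds (192 ticks).

```rust
/// Number of machines needed to sustain `target_per_min` items per minute.
///
/// - `craft_ticks`: recipe time in game ticks (60 ticks = 1 second)
/// - `speed_percent`: machine crafting speed as a percentage (100 = speed 1.0)
/// - `yield_per_craft`: items produced per completed craft
///
/// Hypothetical helper for illustration; rounds up, since a fraction of a
/// machine still has to be built as a whole one.
pub fn machines_needed(
    target_per_min: u64,
    craft_ticks: u64,
    speed_percent: u64,
    yield_per_craft: u64,
) -> u64 {
    // One machine makes yield * 3600 * (speed / 100) / craft_ticks items per minute.
    // Dividing the target by that and rearranging keeps everything in integers.
    let numerator = target_per_min * craft_ticks * 100;
    let denominator = yield_per_craft * 3600 * speed_percent;
    // Ceiling division.
    (numerator + denominator - 1) / denominator
}
```

Under those assumptions, `machines_needed(900, 192, 100, 1)` gives 48 stone furnaces to fill one belt, and doubling the crafting speed (`speed_percent = 200`) halves that to 24. Getting a number like this wrong early is exactly the kind of decision that propagates downstream.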

A Rust-Based Approach

I’m currently experimenting with a Rust rewrite of the agent layer using Rig.

The direction is deliberate.

1. Typed Tools

Every game action becomes a strongly typed tool:

- Place entity
- Connect belts
- Query inventory
- Inspect recipes
- Read map state

The domain is highly structured. Rust’s type system allows encoding that structure directly into the interface.
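
A minimal sketch of what typed tools could look like, using only the standard library. Every name here (`Action`, `Entity`, `Position`, `tool_name`, `validate`) is a hypothetical choice for illustration; this is not Rig's API, and the entity list is a tiny subset. The only game-specific facts are the three prototype names, which are Factorio's internal identifiers for those entities.

```rust
/// Entities the agent may place (a small illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Entity {
    StoneFurnace,
    TransportBelt,
    BurnerMiningDrill,
}

impl Entity {
    /// Factorio's internal prototype name for the entity.
    pub fn prototype_name(self) -> &'static str {
        match self {
            Entity::StoneFurnace => "stone-furnace",
            Entity::TransportBelt => "transport-belt",
            Entity::BurnerMiningDrill => "burner-mining-drill",
        }
    }
}

/// A tile position on the map.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub x: i32,
    pub y: i32,
}

/// One variant per tool from the list above (hypothetical shapes).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    PlaceEntity { entity: Entity, at: Position },
    ConnectBelts { from: Position, to: Position },
    QueryInventory,
    InspectRecipe { item: String },
    ReadMapState { center: Position, radius: u32 },
}

impl Action {
    /// Name a model would use to call this tool.
    pub fn tool_name(&self) -> &'static str {
        match self {
            Action::PlaceEntity { .. } => "place_entity",
            Action::ConnectBelts { .. } => "connect_belts",
            Action::QueryInventory => "query_inventory",
            Action::InspectRecipe { .. } => "inspect_recipe",
            Action::ReadMapState { .. } => "read_map_state",
        }
    }

    /// Reject malformed calls before anything is sent to the game.
    pub fn validate(&self) -> Result<(), String> {
        match self {
            Action::ConnectBelts { from, to } if from == to => {
                Err("belt endpoints must differ".to_string())
            }
            Action::InspectRecipe { item } if item.is_empty() => {
                Err("item name must not be empty".to_string())
            }
            Action::ReadMapState { radius: 0, .. } => {
                Err("radius must be at least 1".to_string())
            }
            _ => Ok(()),
        }
    }
}
```

The point of the enum is that an unknown tool or a misspelled entity cannot be represented at all, and the remaining checks live in one `validate` function instead of being scattered across string handling.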

2. Multi-Turn Agent Loops over RCON

Instead of single-shot execution, the agent operates in iterative loops:

- Observe world state
- Plan next action
- Execute via RCON
- Re-evaluate

This creates a feedback-driven control system rather than a stateless command generator.
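
The loop above can be sketched roughly as follows. This is an assumption-heavy illustration: `GameLink`, `Planner`, `Observation`, and `run_loop` are names I made up, the transport is an in-memory fake rather than a real RCON client, and the command strings `"observe"` and `"smelt"` are placeholders, not Factorio console commands. In a real agent the planner would be a model call.

```rust
/// Hypothetical transport; a real implementation would wrap an RCON client.
pub trait GameLink {
    fn send(&mut self, command: &str) -> Result<String, String>;
}

/// What the agent sees each turn (a single field, for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Observation {
    pub iron_plates: u32,
}

/// The planner's decision for one turn.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Step {
    Command(String),
    Done,
}

/// Hypothetical planner interface; stands in for the model.
pub trait Planner {
    fn plan(&mut self, obs: &Observation) -> Step;
}

/// Observe, plan, execute, re-evaluate, until the planner is done or
/// `max_turns` commands have been executed. Returns the last observation
/// and the number of commands executed.
pub fn run_loop<L: GameLink, P: Planner>(
    link: &mut L,
    planner: &mut P,
    max_turns: usize,
) -> Result<(Observation, usize), String> {
    let mut turns = 0;
    loop {
        // Observe: always re-read the world instead of trusting the last plan.
        let raw = link.send("observe")?;
        let iron_plates = raw
            .trim()
            .parse()
            .map_err(|e| format!("bad observation {raw:?}: {e}"))?;
        let obs = Observation { iron_plates };
        if turns >= max_turns {
            return Ok((obs, turns));
        }
        // Plan, then execute; the next iteration re-evaluates the result.
        match planner.plan(&obs) {
            Step::Done => return Ok((obs, turns)),
            Step::Command(cmd) => {
                link.send(&cmd)?;
                turns += 1;
            }
        }
    }
}

/// In-memory fake of the game: each "smelt" adds five plates.
pub struct FakeGame {
    pub plates: u32,
}

impl GameLink for FakeGame {
    fn send(&mut self, command: &str) -> Result<String, String> {
        match command {
            "observe" => Ok(self.plates.to_string()),
            "smelt" => {
                self.plates += 5;
                Ok(String::new())
            }
            other => Err(format!("unknown command: {other}")),
        }
    }
}

/// Trivial planner: keep smelting until a target is reached.
pub struct TargetPlanner {
    pub target: u32,
}

impl Planner for TargetPlanner {
    fn plan(&mut self, obs: &Observation) -> Step {
        if obs.iron_plates >= self.target {
            Step::Done
        } else {
            Step::Command("smelt".to_string())
        }
    }
}
```

With `FakeGame { plates: 0 }` and `TargetPlanner { target: 12 }`, `run_loop` executes three commands and stops at 15 plates. The `max_turns` cap matters in practice: a planner that never reaches its goal should run out of budget rather than loop forever.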

3. RAG over the Recipe Graph

Factorio’s crafting tree is a dependency graph.

Using retrieval over:

- The recipe tree
- Wiki documentation
- Item production chains

allows grounding decisions in structured domain knowledge instead of relying purely on model memory.
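
Here is a minimal sketch of the structured half of that idea. To be clear about what it is: this swaps in exact key lookup over a hand-written graph for real retrieval (no embeddings, no wiki text), and `RecipeGraph`, `sample_graph`, `describe`, and `raw_requirements` are hypothetical names. The two recipes are real (an iron gear wheel takes 2 iron plates; an automation science pack takes 1 copper plate and 1 iron gear wheel), and both yield one item per craft, which the code assumes. It also assumes the graph is acyclic and treats anything without a recipe as a raw input.

```rust
use std::collections::BTreeMap;

/// Item name -> list of (ingredient, amount per craft).
/// Assumes every recipe yields exactly one item.
pub type RecipeGraph = BTreeMap<String, Vec<(String, u32)>>;

/// A two-recipe sample graph; plates are treated as raw inputs.
pub fn sample_graph() -> RecipeGraph {
    let mut graph = RecipeGraph::new();
    graph.insert(
        "iron-gear-wheel".to_string(),
        vec![("iron-plate".to_string(), 2)],
    );
    graph.insert(
        "automation-science-pack".to_string(),
        vec![
            ("copper-plate".to_string(), 1),
            ("iron-gear-wheel".to_string(), 1),
        ],
    );
    graph
}

/// One line of grounding text that could be placed in a model's context.
pub fn describe(graph: &RecipeGraph, item: &str) -> String {
    match graph.get(item) {
        Some(inputs) => {
            let parts: Vec<String> = inputs
                .iter()
                .map(|(name, count)| format!("{count} x {name}"))
                .collect();
            format!("{item} = {}", parts.join(" + "))
        }
        None => format!("{item} is a raw input in this sketch"),
    }
}

/// Total raw inputs needed to craft `count` of `item`.
pub fn raw_requirements(graph: &RecipeGraph, item: &str, count: u32) -> BTreeMap<String, u32> {
    let mut totals = BTreeMap::new();
    expand(graph, item, count, &mut totals);
    totals
}

/// Depth-first expansion of the dependency graph (assumes no cycles).
fn expand(graph: &RecipeGraph, item: &str, count: u32, totals: &mut BTreeMap<String, u32>) {
    match graph.get(item) {
        Some(inputs) => {
            for (input, per_craft) in inputs {
                expand(graph, input, count * per_craft, totals);
            }
        }
        // No recipe known: count it as a raw input.
        None => *totals.entry(item.to_string()).or_insert(0) += count,
    }
}
```

Asking for 10 automation science packs resolves to 10 copper plates and 20 iron plates. The model does not have to remember that; it only has to ask.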

Why Rust Fits

Factorio’s simulation is deterministic and rule-based.

The action space is structured. The state transitions are explicit. The constraints are mechanical.

Rust feels like a natural fit for:

- Modeling state transitions
- Enforcing invariants
- Building typed agent tooling
- Keeping orchestration predictable

When the domain itself is a graph of dependencies, types become leverage.
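
As a small illustration of "state transitions" and "invariants", here is a sketch of an inventory where crafting is a function from the old state to a new one, and an impossible craft is an error value rather than a negative count. `Inventory`, `CraftError`, and `craft` are hypothetical names, and the sketch assumes each ingredient appears at most once per recipe and that a craft yields one item.

```rust
use std::collections::BTreeMap;

/// Item counts. The field is private so counts can only change
/// through the methods below.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Inventory {
    items: BTreeMap<String, u32>,
}

/// Why a craft was rejected.
#[derive(Debug, PartialEq, Eq)]
pub enum CraftError {
    Missing { item: String, needed: u32, have: u32 },
}

impl Inventory {
    /// Builder-style helper to add items.
    pub fn with(mut self, item: &str, count: u32) -> Self {
        *self.items.entry(item.to_string()).or_insert(0) += count;
        self
    }

    /// How many of `item` are held (0 if none).
    pub fn count(&self, item: &str) -> u32 {
        self.items.get(item).copied().unwrap_or(0)
    }

    /// Craft one `output` from `inputs`, returning the next state.
    /// The current state is left untouched, so a failed craft can never
    /// leave the inventory half-updated.
    /// Assumes each ingredient is listed at most once.
    pub fn craft(&self, inputs: &[(&str, u32)], output: &str) -> Result<Inventory, CraftError> {
        // Check every ingredient before changing anything.
        for (item, needed) in inputs {
            let have = self.count(item);
            if have < *needed {
                return Err(CraftError::Missing {
                    item: item.to_string(),
                    needed: *needed,
                    have,
                });
            }
        }
        let mut next = self.clone();
        for (item, needed) in inputs {
            if let Some(slot) = next.items.get_mut(*item) {
                *slot -= *needed;
            }
        }
        *next.items.entry(output.to_string()).or_insert(0) += 1;
        Ok(next)
    }
}
```

Starting from 3 iron plates, crafting a gear wheel (2 plates) yields a state with 1 plate and 1 gear; trying again from that state returns `CraftError::Missing` with `needed: 2, have: 1`. The agent gets a precise reason to re-plan instead of a silently corrupted world model.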

The Gap

Humans build megabases.

AI struggles to build a stable smelting line.

That gap is not just amusing; it’s informative.

It exposes the limits of current reasoning systems when faced with:

- Long-horizon planning
- Structural optimization
- Persistent world interaction

Factorio may quietly become one of the most revealing AI benchmarks available.

The factory must grow, for both humans and AI.
