Rust Memory Layout Under the Hood

25th Feb 2026
3 min read
Tags: rust, systems-programming, performance, memory

Rust Memory Management Cheat Sheet

Every Rust struct you write is a memory layout decision. Most engineers never think about it, but the compiler does.

This is the reference I wish I’d had when I was staring at perf output wondering why my “simple” struct was spilling across cache lines. So I made one.

Struct Field Reordering

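Take a struct with five fields: a: u8, b: u64, c: i8, d: i64, e: i32.

With #[repr(C)], which gives you C’s layout rules, that’s 40 bytes on a typical 64-bit target: 7 bytes of padding after a, 7 after c, and 4 at the end so the size stays a multiple of the 8-byte alignment. The padding is there because the compiler must preserve your field order.

With Rust’s default layout? 24 bytes. Same fields. The default representation makes no promise about field order, and today’s rustc uses that freedom to sort by alignment, largest first: b: u64 and d: i64 together, then e: i32, then the two one-byte fields. That’s 22 bytes of data plus 2 of tail padding, 40% smaller. And it’s not just about size: with the two 64-bit fields adjacent, the compiler can sometimes handle both with a single 128-bit SIMD operation (on x86-64, movdqu + paddq processing 2×64 bits in parallel) when you apply the same operation to each. That reordering is doing more work than most “optimizations” people spend days on.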
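You can check these numbers yourself with std::mem::size_of. A minimal sketch, assuming a 64-bit target where u64 and i64 are 8-byte aligned; the struct names Ordered and Reordered are made up for illustration, and the default-layout size reflects what current rustc does rather than a language guarantee:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical example struct: #[repr(C)] preserves declaration order.
#[allow(dead_code)]
#[repr(C)]
struct Ordered {
    a: u8,
    b: u64,
    c: i8,
    d: i64,
    e: i32,
}

/// Same fields with the default representation: the compiler may reorder them.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u64,
    c: i8,
    d: i64,
    e: i32,
}

/// Size of the struct whose field order is fixed.
fn repr_c_size() -> usize {
    size_of::<Ordered>()
}

/// Size of the struct the compiler is free to reorder.
fn default_size() -> usize {
    size_of::<Reordered>()
}

fn main() {
    println!("repr(C): {} bytes, align {}", repr_c_size(), align_of::<Ordered>());
    println!("default: {} bytes, align {}", default_size(), align_of::<Reordered>());
}
```

If a struct crosses an FFI boundary or gets written to disk as raw bytes, the fixed order of #[repr(C)] is what you want; otherwise leaving the default lets the compiler pick the tighter layout.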

This is the kind of thing that compounds.

Vec<T> Layout

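Vec<T> is 24 bytes on the stack on a 64-bit target: pointer, capacity, length, each one machine word, pointing to a contiguous heap buffer. 8 elements of i64 = 64 bytes = the size of one cache line on most x86-64 CPUs, although the allocator only has to give an i64 buffer 8-byte alignment, so the data can still straddle two lines. Contiguous, uniformly typed data is what lets the compiler generate vectorized loops. Your data layout is literally shaping the assembly.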
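The handle-versus-buffer split is easy to confirm. A small sketch, with helper names (vec_handle_size, buffer_bytes) invented for this example; the printed offset shows where the buffer happens to start within a 64-byte line, which will vary from run to run:

```rust
use std::mem::size_of;

/// Size of the Vec handle itself (pointer, capacity, length), not the heap buffer.
fn vec_handle_size() -> usize {
    size_of::<Vec<i64>>()
}

/// Bytes occupied by the elements stored in the heap buffer.
fn buffer_bytes(values: &[i64]) -> usize {
    values.len() * size_of::<i64>()
}

fn main() {
    let values: Vec<i64> = (0..8).collect();
    println!("handle: {} bytes", vec_handle_size());
    println!("buffer: {} bytes", buffer_bytes(&values));
    // Assumes 64-byte cache lines; 0 here means the buffer starts on a line boundary.
    println!("offset in 64-byte line: {}", values.as_ptr() as usize % 64);
}
```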

String vs &str

&str is a 16-byte fat pointer (data pointer + length) that just borrows UTF-8 data. String is 24 bytes, the same as Vec, because it wraps Vec<u8>. Both sizes are for 64-bit targets. Know when you need ownership vs borrowing and you stop allocating for no reason.
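In practice, "stop allocating for no reason" mostly means accepting &str in function signatures. A minimal sketch; count_vowels is a hypothetical helper chosen only because it needs to read the text, not own it:

```rust
use std::mem::size_of;

/// Reads the text without owning it, so callers never have to allocate or clone.
fn count_vowels(text: &str) -> usize {
    text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

fn main() {
    let owned: String = String::from("memory layout");
    let borrowed: &str = &owned; // borrows the String's buffer, no copy

    println!("&str:   {} bytes", size_of::<&str>());
    println!("String: {} bytes", size_of::<String>());

    // The same function serves an owned String and a string literal alike.
    println!("{}", count_vowels(borrowed));
    println!("{}", count_vowels("static literal"));
}
```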

The Two-Tier Allocation Problem

Vec<String> is where it gets expensive. Two-tier heap allocation: 1 allocation for the Vec’s buffer of String structs, then N separate allocations, one for each String’s contents (an empty String allocates nothing). N+1 allocations total. N+1 deallocations on drop. This is why borrowing with Vec<&str> pays off when the source data outlives the Vec, and why arena allocators matter.
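One way to see the difference is to check where each element's bytes actually live. A sketch under the assumption that the words come from a single source string; borrowed_words, owned_words, and points_into are names invented for this example:

```rust
/// One allocation: the Vec's buffer of fat pointers into `source`.
fn borrowed_words(source: &str) -> Vec<&str> {
    source.split_whitespace().collect()
}

/// N + 1 allocations: the Vec's buffer plus one heap buffer per word.
fn owned_words(source: &str) -> Vec<String> {
    source.split_whitespace().map(String::from).collect()
}

/// True if the bytes of `part` lie inside the buffer backing `source`.
fn points_into(source: &str, part: &str) -> bool {
    let start = source.as_ptr() as usize;
    let end = start + source.len();
    let part_start = part.as_ptr() as usize;
    part_start >= start && part_start + part.len() <= end
}

fn main() {
    let source = String::from("alpha beta gamma");

    for word in borrowed_words(&source) {
        println!("borrowed {:?} inside source: {}", word, points_into(&source, word));
    }
    for word in owned_words(&source) {
        println!("owned    {:?} inside source: {}", word, points_into(&source, &word));
    }
}
```

The trade-off is lifetimes: the Vec<&str> cannot outlive the string it borrows from, which is the constraint an arena relaxes by owning all the bytes in one place.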

Dynamic Dispatch

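dyn Trait: every pointer to a trait object (&dyn Trait, Box<dyn Trait>) is a 16-byte fat pointer on a 64-bit target: data pointer + vtable pointer. In current rustc that vtable holds drop, size, align, then your methods at known offsets, though the exact layout is an implementation detail. Every method call is an indirect call through [vtable + offset]. That usually means no inlining, unless the optimizer can prove the concrete type and devirtualize the call. You’re trading static dispatch performance for runtime flexibility, and that’s fine, as long as it’s a conscious choice.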
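The two dispatch styles look almost identical at the call site, which is why the choice is easy to make by accident. A minimal sketch; the Shape trait and Square type are hypothetical stand-ins:

```rust
use std::mem::size_of;

/// Hypothetical trait used only to compare the two dispatch styles.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Static dispatch: monomorphized per concrete type, so the call can be inlined.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

/// Dynamic dispatch: the method is looked up through the vtable at run time.
fn area_dynamic(shape: &dyn Shape) -> f64 {
    shape.area()
}

fn main() {
    let square = Square(3.0);
    println!("&Square:        {} bytes", size_of::<&Square>());
    println!("&dyn Shape:     {} bytes", size_of::<&dyn Shape>());
    println!("Box<dyn Shape>: {} bytes", size_of::<Box<dyn Shape>>());
    println!("static:  {}", area_static(&square));
    println!("dynamic: {}", area_dynamic(&square));
}
```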

None of this is hidden knowledge. It’s all in the compiler output if you look. But I got tired of reconstructing it every time, so I made the visual cheat sheet above that maps out the actual byte-level layouts — alignment rules, field reordering, Vec/String internals, smart pointers, vtable dispatch, and the two-tier allocation pattern.