Building an ext2-Inspired File System Simulator in C++
From June to July 2025, I built fs_sim, a from-scratch simulation of a block-based, inode-centric file system inspired by ext2.
File systems are one of those things we use every day but rarely think about deeply. You save a file, run ls, move a directory, and everything just works. Underneath that “just works” is a lot of careful engineering around metadata, layout, allocation, consistency, and recovery.
Building a simplified version from scratch made all of that painfully clear, in the best way.
Why I Built It
I did not want to only read about superblocks and inode tables in OS notes. I wanted to implement them and see where things break.
My goals were simple:
- Understand how ext2-style file systems organize data and metadata
- Get better at low-level C++ (raw buffers, pointer math, struct layout, serialization)
- Explore real trade-offs around fragmentation, performance, and correctness
- Build something interactive I could actually use and test from a shell
That moment when your own mini file system survives a remount and still returns correct file contents is genuinely satisfying.
Core Design (ext2-Style)
The simulator treats a large std::vector<std::byte> as a fake disk. Everything is block-oriented (default 4 KiB blocks).
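The fake-disk idea can be sketched as a small RAM-backed block device over std::vector<std::byte>. The class name Disk and its methods are hypothetical, not fs_sim's actual API. The sketch sticks to C++17 (std::span is a C++20 feature), so reads copy a block out rather than returning a view. save/load write the raw image to any stream with no header or checksum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>    // std::istream/std::ostream (and std::stringstream for in-memory round trips)
#include <stdexcept>
#include <vector>

// A minimal RAM-backed "disk": one contiguous byte buffer addressed in
// fixed-size blocks. Class and method names are illustrative assumptions.
class Disk {
public:
    explicit Disk(std::size_t block_count, std::size_t block_size = 4096)
        : block_size_(block_size), bytes_(block_count * block_size) {
        assert(block_size_ > 0);  // block_count() divides by this
    }

    std::size_t block_size() const { return block_size_; }
    std::size_t block_count() const { return bytes_.size() / block_size_; }

    // Return a copy of block `id`; out-of-range IDs throw instead of
    // silently reading past the end of the buffer.
    std::vector<std::byte> read_block(std::uint32_t id) const {
        auto first = bytes_.begin() + offset_of(id);
        return std::vector<std::byte>(first, first + static_cast<std::ptrdiff_t>(block_size_));
    }

    // Overwrite block `id`; the caller must supply exactly one block of data.
    void write_block(std::uint32_t id, const std::vector<std::byte>& data) {
        if (data.size() != block_size_)
            throw std::invalid_argument("write_block expects exactly one block");
        std::copy(data.begin(), data.end(), bytes_.begin() + offset_of(id));
    }

    // Persistence: dump the raw image (no header, no checksum).
    void save(std::ostream& os) const {
        os.write(reinterpret_cast<const char*>(bytes_.data()),
                 static_cast<std::streamsize>(bytes_.size()));
    }

    // Restore the raw image; returns false if the stream was too short.
    bool load(std::istream& is) {
        is.read(reinterpret_cast<char*>(bytes_.data()),
                static_cast<std::streamsize>(bytes_.size()));
        return static_cast<std::size_t>(is.gcount()) == bytes_.size();
    }

private:
    std::ptrdiff_t offset_of(std::uint32_t id) const {
        if (id >= block_count()) throw std::out_of_range("block id out of range");
        return static_cast<std::ptrdiff_t>(id * block_size_);
    }

    std::size_t block_size_;
    std::vector<std::byte> bytes_;
};
```

Routing every byte access through read_block/write_block means a bad block ID fails loudly at the call site. Without that check, it would quietly corrupt a neighbouring block.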
I followed the classic ext2 mental model:
- Superblock: global metadata (total/free blocks, total/free inodes, inode size, blocks per group)
- Block groups: each group contains a block bitmap, inode bitmap, inode table, and data region
- Inodes: fixed-size records with type, mode, size, timestamps, and 12 direct block pointers
- Directories: special files storing {name, inode} style entries
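The on-disk records can be sketched as plain structs that are memcpy'd into block buffers. The field names, widths, InodeType values, and the helpers write_pod/read_pod are assumptions modeled on the list above, not the project's real layout. Raw memcpy serialization is host-endian and copies any compiler padding. That is acceptable here only because the simulator reads back images it wrote itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical on-disk records modeled on the ext2-style description.
constexpr std::size_t kDirectPointers = 12;

struct Superblock {
    std::uint32_t total_blocks;
    std::uint32_t free_blocks;
    std::uint32_t total_inodes;
    std::uint32_t free_inodes;
    std::uint32_t inode_size;
    std::uint32_t blocks_per_group;
};

enum class InodeType : std::uint16_t { Free = 0, File = 1, Directory = 2 };

struct Inode {
    InodeType     type;
    std::uint16_t mode;                 // permission bits, e.g. 0644
    std::uint16_t links;                // hard-link count
    std::uint16_t reserved;             // explicit padding keeps the layout predictable
    std::uint64_t size;                 // file size in bytes
    std::int64_t  atime, mtime, ctime;  // seconds since the epoch
    std::array<std::uint32_t, kDirectPointers> direct;  // 0 = unallocated
};

// memcpy-based (de)serialization is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<Superblock>::value, "Superblock must be trivially copyable");
static_assert(std::is_trivially_copyable<Inode>::value, "Inode must be trivially copyable");

// Copy a record into a block buffer at `offset` (bounds-checked in debug builds).
template <typename T>
void write_pod(std::vector<std::byte>& buf, std::size_t offset, const T& value) {
    assert(offset + sizeof(T) <= buf.size());
    std::memcpy(buf.data() + offset, &value, sizeof(T));
}

// Copy a record back out of a block buffer.
template <typename T>
T read_pod(const std::vector<std::byte>& buf, std::size_t offset) {
    assert(offset + sizeof(T) <= buf.size());
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}
```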
I intentionally skipped indirect blocks at first to keep the scope under control. With only 12 direct pointers and 4 KiB blocks, the maximum file size is 12 × 4 KiB = 48 KiB, which was enough for this stage.
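The 48 KiB limit falls straight out of the arithmetic. Here is a small sketch, assuming 4 KiB blocks and hypothetical helpers locate() and blocks_needed(). locate() maps a byte offset in a file to a direct-pointer slot plus an offset within that block. Anything at or past 49,152 bytes would need an indirect block.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::uint64_t kBlockSize = 4096;
constexpr std::uint64_t kDirectPointers = 12;
constexpr std::uint64_t kMaxFileSize = kDirectPointers * kBlockSize;  // 49152 bytes
static_assert(kMaxFileSize == 48 * 1024, "12 direct blocks of 4 KiB = 48 KiB");

struct BlockPos {
    std::uint64_t pointer_index;    // which of the 12 direct pointers
    std::uint64_t offset_in_block;  // byte offset inside that data block
};

// Map a byte offset within a file to (direct pointer slot, offset in block).
std::optional<BlockPos> locate(std::uint64_t file_offset) {
    if (file_offset >= kMaxFileSize) return std::nullopt;  // would need an indirect block
    return BlockPos{file_offset / kBlockSize, file_offset % kBlockSize};
}

// Number of data blocks a file of `size` bytes occupies (ceiling division).
std::uint64_t blocks_needed(std::uint64_t size) {
    return (size + kBlockSize - 1) / kBlockSize;
}
```

For example, byte 5000 lands in direct pointer 1 at offset 904. The last addressable byte, 49,151, lands in pointer 11.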
What You Can Do in the Shell
I built a small REPL to interact with the file system:
- format
- mount
- ls [path]
- mkdir <path>
- touch <path>
- echo "text" > <path>
- cat <path>
- rm <path>
- rm -r <path>
- pwd, cd, absolute and relative paths, and ..
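Resolving cd targets and relative paths can be sketched as lexical normalization against the current directory. resolve_path and join_path are hypothetical names, and the sketch makes several assumptions. cwd must be absolute and already normalized. Treating .. lexically is only safe because the simulator has no symlinks. A real lookup would still walk each resulting component through directory entries to find its inode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Resolve `path` against `cwd` into normalized components.
// "." is dropped, empty components ("//") are skipped, and ".."
// pops one component but can never climb above "/".
std::vector<std::string> resolve_path(const std::string& cwd, const std::string& path) {
    assert(!cwd.empty() && cwd[0] == '/');  // precondition: cwd is absolute
    std::vector<std::string> parts;
    auto push_components = [&parts](const std::string& p) {
        std::size_t i = 0;
        while (i < p.size()) {
            std::size_t j = p.find('/', i);
            if (j == std::string::npos) j = p.size();
            const std::string name = p.substr(i, j - i);
            if (name == "..") {
                if (!parts.empty()) parts.pop_back();  // "/.." stays at "/"
            } else if (!name.empty() && name != ".") {
                parts.push_back(name);
            }
            i = j + 1;
        }
    };
    if (path.empty() || path[0] != '/') push_components(cwd);  // relative: start from cwd
    push_components(path);
    return parts;
}

// Join components back into a canonical absolute path (what `pwd` would print).
std::string join_path(const std::vector<std::string>& parts) {
    if (parts.empty()) return "/";
    std::string out;
    for (const auto& p : parts) out += "/" + p;
    return out;
}
```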
The key part is not just command support. Each operation updates allocation bitmaps, directory entries, inode metadata, link counts, and free-space counters consistently.
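The consistency point is easiest to see in allocation: flipping a bitmap bit and adjusting the free counter should happen in one place. Below is a minimal first-fit sketch, assuming one bit per block. BitmapAllocator is a hypothetical name, and its free_count stands in for the superblock's free-block counter.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One bit per block (LSB-first within each byte): bit set = block in use.
// free_count is updated in the same call as the bitmap so they cannot drift.
struct BitmapAllocator {
    std::vector<std::uint8_t> bits;
    std::uint32_t total;
    std::uint32_t free_count;

    explicit BitmapAllocator(std::uint32_t n)
        : bits((n + 7) / 8, std::uint8_t{0}), total(n), free_count(n) {}

    bool in_use(std::uint32_t i) const { return (bits[i / 8] >> (i % 8)) & 1u; }

    // First-fit: return the lowest free index and mark it used.
    std::optional<std::uint32_t> allocate() {
        if (free_count == 0) return std::nullopt;
        for (std::uint32_t i = 0; i < total; ++i) {
            if (!in_use(i)) {
                bits[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
                --free_count;
                return i;
            }
        }
        return std::nullopt;  // unreachable while free_count is accurate
    }

    // Release a block; releasing an already-free block is a double free.
    void release(std::uint32_t i) {
        assert(i < total && in_use(i));
        bits[i / 8] &= static_cast<std::uint8_t>(~(1u << (i % 8)));
        ++free_count;
    }
};
```

First-fit keeps the code obvious but tends to fragment. A common alternative is to scan from a cursor that remembers the last allocation point (next-fit).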
Hard Parts (That Took Real Debugging)
A few areas were much harder than they look in theory:
- Inode address math: inode -> group -> table block -> byte offset had zero room for off-by-one mistakes
- Directory packing: variable-length names plus alignment rules made reads and writes fragile
- Path traversal: correctly resolving ., .., absolute, and relative paths without breaking tree invariants
- Raw memory safety: pointer-heavy code over one giant byte buffer is unforgiving
- Persistence: serialization/deserialization bugs showed up only after remount, which made debugging slower
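The inode address math can be sketched under three assumptions. Inode numbers are ext2-style and 1-based. Group g starts at block g × blocks_per_group. A hypothetical Geometry struct records where the inode table sits within each group. Subtracting 1 before dividing is exactly the kind of step where an off-by-one slips in.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical file system geometry; the values used in testing are examples only.
struct Geometry {
    std::uint32_t block_size;          // bytes per block, e.g. 4096
    std::uint32_t inode_size;          // bytes per inode record, e.g. 128
    std::uint32_t inodes_per_group;
    std::uint32_t blocks_per_group;
    std::uint32_t inode_table_offset;  // first inode-table block, relative to group start
};

struct InodeLocation {
    std::uint32_t group;
    std::uint32_t block;        // absolute block number holding the inode
    std::uint32_t byte_offset;  // offset of the record inside that block
};

// inode -> group -> table block -> byte offset.
InodeLocation locate_inode(const Geometry& g, std::uint32_t ino) {
    assert(ino >= 1);                                   // inode 0 means "no inode"
    const std::uint32_t index = ino - 1;                // 1-based -> 0-based
    const std::uint32_t group = index / g.inodes_per_group;
    const std::uint32_t local = index % g.inodes_per_group;
    const std::uint64_t table_bytes = std::uint64_t{local} * g.inode_size;
    const std::uint32_t group_start = group * g.blocks_per_group;
    return InodeLocation{
        group,
        group_start + g.inode_table_offset + static_cast<std::uint32_t>(table_bytes / g.block_size),
        static_cast<std::uint32_t>(table_bytes % g.block_size)};
}
```

With 4 KiB blocks and 128-byte inodes, 32 records fit per block. In that geometry, inode 33 is the first record of the table's second block.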
I used std::span in several places to reduce unsafe slicing, but I still hit plenty of corruption bugs before things stabilized.
On Using AI During the Project
I originally planned this as a no-assistance challenge. In reality, I used Gemini as a technical reviewer.
It helped with:
- Verifying ext2 layout assumptions
- Catching subtle memory and free-space bugs
- Cleaning up parts of REPL parsing
- Drafting a Catch2 test suite
I still made architecture decisions myself, but AI definitely reduced iteration time and helped me spot blind spots faster.
Build and Run
Prerequisites:
- C++17 compiler (GCC/Clang)
- CMake 3.10+
Build:
mkdir build && cd build
cmake ..
make
Run shell:
./fs_sim
Run tests:
make check
Current Limits and Next Steps
Current limits:
- 12 direct pointers only (~48 KiB max file)
- Minimal permission enforcement
- No UID/GID model yet
- Disk is RAM-backed (with serialization support)
Next improvements:
- Single/double indirect blocks
- Proper rwx permission checks
- UID/GID support
- mmap-backed persistence
- Basic journaling / write-ahead logging concepts
Final Thoughts
This project taught me more about operating systems than passive reading ever did.
You quickly realize file systems are all about trade-offs: simplicity vs. features, performance vs. safety, and flexibility vs. correctness. Even tiny design choices can create huge downstream complexity.
If you are learning systems programming, I strongly recommend building a mini file system at least once. Start small, add hierarchy, then allocation bitmaps and inodes. The first time your design survives a remount without corruption, it is worth every debugging session.