Building an ext2-Inspired File System Simulator in C++

From June to July 2025, I built fs_sim, a from-scratch simulation of a block-based, inode-centric file system inspired by ext2.

File systems are one of those things we use every day but rarely think about deeply. You save a file, run ls, move a directory, and everything just works. Underneath that “just works” is a lot of careful engineering around metadata, layout, allocation, consistency, and recovery.

Building a simplified version from scratch made all of that painfully clear, in the best way.

Why I Built It

I did not want to only read about superblocks and inode tables in OS notes. I wanted to implement them and see where things break.

My goals were simple:

Understand how ext2-style file systems organize data and metadata
Get better at low-level C++ (raw buffers, pointer math, struct layout, serialization)
Explore real trade-offs around fragmentation, performance, and correctness
Build something interactive I could actually use and test from a shell

That moment when your own mini file system survives a remount and still returns correct file contents is genuinely satisfying.

Core Design (ext2-Style)

The simulator treats a large std::vector<std::byte> as a fake disk. Everything is block-oriented (default 4 KiB blocks).

I followed the classic ext2 mental model:

Superblock: global metadata (total/free blocks, total/free inodes, inode size, blocks per group)
Block groups: each group contains a block bitmap, inode bitmap, inode table, and data region
Inodes: fixed-size records with type, mode, size, timestamps, and 12 direct block pointers
Directories: special files storing {name, inode} style entries

I intentionally skipped indirect blocks at first to keep scope under control. With only 12 direct pointers and 4 KiB blocks, the maximum file size is 12 × 4 KiB = 48 KiB, which was enough for this stage.
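
The size limit is simple arithmetic, sketched below. The single-indirect figure is hypothetical: it assumes 4-byte block pointers (as in ext2), which this write-up does not state for fs_sim, and max_file_size is an illustrative helper name.

```cpp
#include <cassert>
#include <cstdint>

// Maximum file size in bytes for a given pointer layout.
// pointer_size is only used when a single indirect block is counted.
constexpr std::uint64_t max_file_size(std::uint64_t block_size,
                                      std::uint64_t direct_pointers,
                                      bool with_single_indirect = false,
                                      std::uint64_t pointer_size = 4) {
    assert(block_size > 0 && pointer_size > 0);
    std::uint64_t blocks = direct_pointers;
    if (with_single_indirect) {
        // One extra block that holds nothing but block pointers.
        blocks += block_size / pointer_size;
    }
    return blocks * block_size;
}

// 12 direct pointers * 4 KiB = 48 KiB.
static_assert(max_file_size(4096, 12) == 48 * 1024);

// Adding one single-indirect block (assumed 4-byte pointers) gives
// (12 + 1024) * 4 KiB = 4144 KiB.
static_assert(max_file_size(4096, 12, true) == 4144 * 1024);
```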

What You Can Do in the Shell

I built a small REPL to interact with the file system:

format
mount
ls [path]
mkdir <path>
touch <path>
echo "text" > <path>
cat <path>
rm <path>
rm -r <path>
pwd, cd, absolute/relative paths, ..
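
Handling cd with absolute paths, relative paths, and .. comes down to turning a working directory plus a user-supplied path into one canonical absolute path. The sketch below does this lexically; resolve_path is a hypothetical helper, and a real implementation might instead follow the .. entries stored in each directory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Lexically combines cwd and path into a normalized absolute path.
// "." and empty components are dropped; ".." pops one level and is
// a no-op at the root.
std::string resolve_path(std::string_view cwd, std::string_view path) {
    assert(!cwd.empty() && cwd.front() == '/' && "cwd must be absolute");
    std::vector<std::string> parts;

    auto walk = [&parts](std::string_view p) {
        std::size_t pos = 0;
        while (pos <= p.size()) {
            std::size_t next = p.find('/', pos);
            if (next == std::string_view::npos) next = p.size();
            std::string_view comp = p.substr(pos, next - pos);
            if (comp == "..") {
                if (!parts.empty()) parts.pop_back();
            } else if (!comp.empty() && comp != ".") {
                parts.emplace_back(comp);
            }
            pos = next + 1;
        }
    };

    // Relative paths start from the current directory.
    if (path.empty() || path.front() != '/') walk(cwd);
    walk(path);

    std::string out;
    for (const auto& part : parts) {
        out += '/';
        out += part;
    }
    return out.empty() ? "/" : out;
}
```

For example, resolve_path("/home/user", "../docs/./a.txt") yields "/home/docs/a.txt", and resolve_path("/", "../..") stays at "/".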

The key part is not just command support. Each operation updates allocation bitmaps, directory entries, inode metadata, link counts, and free-space counters consistently.

Hard Parts (That Took Real Debugging)

A few areas were much harder than they look in theory:

Inode address math: inode -> group -> table block -> byte offset had zero room for off-by-one mistakes
Directory packing: variable-length names + alignment rules made reads/writes fragile
Path traversal: correctly resolving ., .., absolute and relative paths without breaking tree invariants
Raw memory safety: pointer-heavy code over one giant byte buffer is unforgiving
Persistence: serialization/deserialization bugs showed up only after remount, which made debugging slower
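
The inode address chain from the first item can be written as a few integer divisions. This sketch follows the ext2 convention that inode numbers start at 1; the inodes_per_group parameter and the InodeLocation type are assumptions for illustration, since the post does not spell out how fs_sim does this.

```cpp
#include <cassert>
#include <cstdint>

// Where an inode lives, relative to its block group's inode table.
struct InodeLocation {
    std::uint32_t group;        // block group index
    std::uint32_t table_block;  // block offset within the inode table
    std::uint32_t byte_offset;  // byte offset within that block
};

constexpr InodeLocation locate_inode(std::uint32_t ino,
                                     std::uint32_t inodes_per_group,
                                     std::uint32_t inode_size,
                                     std::uint32_t block_size) {
    assert(ino >= 1 && "inode numbers are 1-based in this sketch");
    assert(inodes_per_group > 0 && inode_size > 0);
    assert(block_size % inode_size == 0);

    const std::uint32_t zero_based = ino - 1;  // the classic off-by-one spot
    const std::uint32_t index_in_group = zero_based % inodes_per_group;
    const std::uint32_t inodes_per_block = block_size / inode_size;

    return InodeLocation{
        zero_based / inodes_per_group,
        index_in_group / inodes_per_block,
        (index_in_group % inodes_per_block) * inode_size,
    };
}
```

With 128-byte inodes, 4 KiB blocks, and 64 inodes per group, inode 33 lands in group 0, table block 1, byte offset 0, while inode 65 is the first inode of group 1.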

I used std::span in several places to reduce unsafe slicing, but I still hit plenty of corruption bugs before things stabilized.

On Using AI During the Project

I originally planned this as a no-assistance challenge. In reality, I used Gemini as a technical reviewer.

It helped with:

Verifying ext2 layout assumptions
Catching subtle memory and free-space bugs
Cleaning up parts of REPL parsing
Drafting a Catch2 test suite

I still made architecture decisions myself, but AI definitely reduced iteration time and helped me spot blind spots faster.

Build and Run

Prerequisites:

C++20 compiler (GCC/Clang), since the code uses std::span
CMake 3.10+

Build:

mkdir build && cd build
cmake ..
make

Run shell:

./fs_sim

Run tests:

make check

Current Limits and Next Steps

Current limits:

12 direct pointers only (48 KiB max file with 4 KiB blocks)
Minimal permission enforcement
No UID/GID model yet
Disk is RAM-backed (with serialization support)

Next improvements:

Single/double indirect blocks
Proper rwx permission checks
UID/GID support
mmap-backed persistence
Basic journaling / write-ahead logging concepts

Final Thoughts

This project taught me more about operating systems than passive reading ever did.

You quickly realize file systems are all about trade-offs: simplicity vs. features, performance vs. safety, and flexibility vs. correctness. Even tiny design choices can create huge downstream complexity.

If you are learning systems programming, I strongly recommend building a mini file system at least once. Start small, add hierarchy, then allocation bitmaps and inodes. The first time your design survives a remount without corruption, it is worth every debugging session.

Code: https://github.com/pavandhadge/fs_sim

Pavan Dhadge

pavandhadge01@gmail.com