Developer Blog
MAR 2026

Pavan Dhadge

Building an ext2-Inspired File System Simulator in C++

From June to July 2025, I built fs_sim, a from-scratch simulation of a block-based, inode-centric file system inspired by ext2.

File systems are one of those things we use every day but rarely think about deeply. You save a file, run ls, move a directory, and everything just works. Underneath that “just works” is a lot of careful engineering around metadata, layout, allocation, consistency, and recovery.

Building a simplified version from scratch made all of that painfully clear, in the best way.


Why I Built It

I did not want to only read about superblocks and inode tables in OS notes. I wanted to implement them and see where things break.

My goals were simple:

  • Understand how ext2-style file systems organize data and metadata
  • Get better at low-level C++ (raw buffers, pointer math, struct layout, serialization)
  • Explore real trade-offs around fragmentation, performance, and correctness
  • Build something interactive I could actually use and test from a shell

That moment when your own mini file system survives a remount and still returns correct file contents is genuinely satisfying.


Core Design (ext2-Style)

The simulator treats a large std::vector<std::byte> as a fake disk. Everything is block-oriented (default 4 KiB blocks).

I followed the classic ext2 mental model:

  • Superblock: global metadata (total/free blocks, total/free inodes, inode size, blocks per group)
  • Block groups: each group contains a block bitmap, inode bitmap, inode table, and data region
  • Inodes: fixed-size records with type, mode, size, timestamps, and 12 direct block pointers
  • Directories: special files storing {name, inode} style entries

I intentionally skipped indirect blocks at first to keep scope under control. With only 12 direct pointers and 4 KiB blocks, the maximum file size is 12 × 4 KiB = 48 KiB, which was enough for this stage.
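
As a rough sketch of what such an inode record could look like (this is not the actual fs_sim layout; the field names, field widths, and the `Inode`, `max_file_size`, and `blocks_needed` identifiers are assumptions for illustration), the 48 KiB limit falls straight out of the constants:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// From the post: 4 KiB blocks and 12 direct pointers per inode.
constexpr std::uint32_t kBlockSize = 4096;
constexpr std::size_t kDirectPointers = 12;

// Hypothetical fixed-size inode record; names and widths are assumptions.
struct Inode {
    std::uint16_t type;                     // e.g. regular file or directory
    std::uint16_t mode;                     // permission bits
    std::uint32_t size;                     // file size in bytes
    std::uint32_t link_count;               // number of directory entries pointing here
    std::uint32_t atime;                    // last access time
    std::uint32_t mtime;                    // last modification time
    std::uint32_t ctime;                    // last status change time
    std::uint32_t direct[kDirectPointers];  // data block numbers, 0 = unused
};

// A record that is copied byte-for-byte into the disk buffer should be
// trivially copyable and free of surprise padding.
static_assert(std::is_trivially_copyable_v<Inode>, "Inode must be memcpy-safe");
static_assert(sizeof(Inode) == 72, "unexpected padding in Inode");

// Largest file reachable through direct pointers only.
constexpr std::uint64_t max_file_size() {
    return static_cast<std::uint64_t>(kDirectPointers) * kBlockSize;
}

// Number of data blocks needed for a file of `size` bytes (rounded up).
constexpr std::uint64_t blocks_needed(std::uint64_t size) {
    return (size + kBlockSize - 1) / kBlockSize;
}
```

With these definitions, `max_file_size()` evaluates to 49152 bytes, and a write that would need more than 12 blocks (`blocks_needed(size) > kDirectPointers`) is the point where a simulator without indirect blocks has to refuse.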


What You Can Do in the Shell

I built a small REPL to interact with the file system:

  • format
  • mount
  • ls [path]
  • mkdir <path>
  • touch <path>
  • echo "text" > <path>
  • cat <path>
  • rm <path>
  • rm -r <path>
  • pwd, cd, absolute/relative paths, ..
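
The path handling in the last item can be illustrated with a purely lexical normalization step. This is a simplified sketch, not the fs_sim implementation: a real resolver would walk directory entries inode by inode, and the `normalize_path` name is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Resolve `path` against the absolute working directory `cwd`, collapsing
// ".", "..", and repeated slashes. Hypothetical helper for illustration.
std::string normalize_path(const std::string& cwd, const std::string& path) {
    assert(!cwd.empty() && cwd[0] == '/');  // cwd must be absolute

    // Absolute paths start at the root; relative ones start at cwd.
    const bool absolute = !path.empty() && path[0] == '/';
    const std::string full = absolute ? path : cwd + "/" + path;

    std::vector<std::string> parts;
    std::istringstream in(full);
    std::string component;
    while (std::getline(in, component, '/')) {
        if (component.empty() || component == ".") {
            continue;  // skip "//" and "."
        }
        if (component == "..") {
            if (!parts.empty()) {
                parts.pop_back();  // ".." at the root stays at the root
            }
            continue;
        }
        parts.push_back(component);
    }

    std::string result;
    for (const auto& p : parts) {
        result += "/" + p;
    }
    return result.empty() ? "/" : result;
}
```

For example, `normalize_path("/home/user", "../etc/./passwd")` yields `/home/etc/passwd`, and `normalize_path("/", "..")` stays at `/`.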

Supporting the commands is only half the job. Each operation also has to keep allocation bitmaps, directory entries, inode metadata, link counts, and free-space counters consistent with one another.
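
To make the consistency point concrete, here is a minimal sketch of a block bitmap that keeps its free counter in step with every bit flip. The `BlockBitmap` class and its first-fit strategy are assumptions for illustration, not the fs_sim allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical block bitmap: one bit per block, 1 = in use.
class BlockBitmap {
public:
    explicit BlockBitmap(std::uint32_t block_count)
        : bits_((static_cast<std::size_t>(block_count) + 7) / 8, std::uint8_t{0}),
          total_(block_count),
          free_(block_count) {}

    // First-fit allocation: returns the block index, or nullopt when full.
    std::optional<std::uint32_t> allocate() {
        for (std::uint32_t i = 0; i < total_; ++i) {
            if (!test(i)) {
                bits_[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
                --free_;  // counter changes in the same step as the bit
                return i;
            }
        }
        return std::nullopt;
    }

    // Returns false for an out-of-range index or a double free, so the
    // caller can treat it as a sign of corruption.
    bool release(std::uint32_t i) {
        if (i >= total_ || !test(i)) {
            return false;
        }
        bits_[i / 8] &= static_cast<std::uint8_t>(~(1u << (i % 8)));
        ++free_;
        return true;
    }

    bool test(std::uint32_t i) const {
        return ((bits_[i / 8] >> (i % 8)) & 1u) != 0;
    }

    std::uint32_t free_count() const { return free_; }

private:
    std::vector<std::uint8_t> bits_;
    std::uint32_t total_;
    std::uint32_t free_;
};
```

Rejecting a double free instead of silently incrementing the counter is a deliberate choice in this sketch: a free-space count that drifts away from the bitmap is the kind of bug that only surfaces much later.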


Hard Parts (That Took Real Debugging)

A few areas were much harder than they look in theory:

  • Inode address math: inode -> group -> table block -> byte offset had zero room for off-by-one mistakes
  • Directory packing: variable-length names + alignment rules made reads/writes fragile
  • Path traversal: correctly resolving ., .., absolute and relative paths without breaking tree invariants
  • Raw memory safety: pointer-heavy code over one giant byte buffer is unforgiving
  • Persistence: serialization/deserialization bugs showed up only after remount, which made debugging slower
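
The inode address math from the first item can be sketched as follows. This follows the ext2 convention that inode numbers start at 1, and it assumes a simplified layout in which every group starts at `group * blocks_per_group` and places its inode table at a fixed block offset; the `Layout`, `InodeLocation`, and `locate_inode` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout parameters; a real superblock would supply these.
struct Layout {
    std::uint32_t block_size;
    std::uint32_t inode_size;
    std::uint32_t inodes_per_group;
    std::uint32_t blocks_per_group;
    std::uint32_t inode_table_offset;  // blocks from group start to inode table
};

struct InodeLocation {
    std::uint32_t group;   // block group holding the inode
    std::uint32_t block;   // absolute block number on the fake disk
    std::uint32_t offset;  // byte offset of the inode within that block
};

// inode number -> group -> inode table block -> byte offset
InodeLocation locate_inode(const Layout& l, std::uint32_t ino) {
    assert(ino >= 1);  // inode numbers are 1-based; 0 means "no inode"

    const std::uint32_t zero_based = ino - 1;  // the classic off-by-one spot
    const std::uint32_t group = zero_based / l.inodes_per_group;
    const std::uint32_t index = zero_based % l.inodes_per_group;

    // Byte position of this inode inside its group's inode table.
    const std::uint64_t byte_in_table = std::uint64_t{index} * l.inode_size;
    const std::uint32_t group_start = group * l.blocks_per_group;

    return InodeLocation{
        group,
        static_cast<std::uint32_t>(group_start + l.inode_table_offset +
                                   byte_in_table / l.block_size),
        static_cast<std::uint32_t>(byte_in_table % l.block_size)};
}
```

With 4096-byte blocks and 128-byte inodes, 32 inodes fit in one block, so inode 32 is the last record in the first table block and inode 33 is the first record in the next one. Forgetting the `ino - 1` step shifts every lookup by one record, which is easy to miss until two files start sharing metadata.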

I used std::span in several places to reduce unsafe slicing, but I still hit plenty of corruption bugs before things stabilized.


On Using AI During the Project

I originally planned this as a no-assistance challenge. In reality, I used Gemini as a technical reviewer.

It helped with:

  • Verifying ext2 layout assumptions
  • Catching subtle memory and free-space bugs
  • Cleaning up parts of REPL parsing
  • Drafting a Catch2 test suite

I still made architecture decisions myself, but AI definitely reduced iteration time and helped me spot blind spots faster.


Build and Run

Prerequisites:

  • C++20-capable compiler (GCC/Clang), since std::span is a C++20 library feature
  • CMake 3.10+

Build:

mkdir build && cd build
cmake ..
make

Run shell:

./fs_sim

Run tests:

make check


Current Limits and Next Steps

Current limits:

  • 12 direct pointers only (48 KiB max file with 4 KiB blocks)
  • Minimal permission enforcement
  • No UID/GID model yet
  • Disk is RAM-backed (with serialization support)

Next improvements:

  • Single/double indirect blocks
  • Proper rwx permission checks
  • UID/GID support
  • mmap-backed persistence
  • Basic journaling / write-ahead logging concepts


Final Thoughts

This project taught me more about operating systems than passive reading ever did.

You quickly realize file systems are all about trade-offs: simplicity vs. features, performance vs. safety, and flexibility vs. correctness. Even tiny design choices can create huge downstream complexity.

If you are learning systems programming, I strongly recommend building a mini file system at least once. Start small, add hierarchy, then allocation bitmaps and inodes. The first time your design survives a remount without corruption, it is worth every debugging session.

Code: https://github.com/pavandhadge/fs_sim