fs_sim - ext2-Inspired File System Simulator
From Theory to Working ext2 Clone – One Buffer at a Time
fs_sim is a clean, educational implementation of a block-based file system inspired by ext2.
Built entirely in C++17, it simulates a disk as a contiguous `std::vector<std::byte>` and provides a simple REPL/shell where you can format, mount, create files/directories, read/write, list, delete (even recursively), navigate paths, and remount — with data surviving simulated reboots.
The goal was never production use.
It was to move from textbook diagrams to actually making it work — understanding superblocks, block groups, inodes, bitmaps, directory packing, and metadata consistency under real operations.
Motivation
Reading about inodes, bitmaps, and block groups is one thing.
Seeing your `ls` output survive a simulated crash is another.
I built fs_sim to bridge that gap:
- Turn OS course theory into runnable code
- Practice raw C++ memory management and pointer math
- Debug real corruption bugs (off-by-one inode offsets, directory entry misalignment)
- Create an interactive testbed where I could run `mount → mkdir → echo → rm -r → remount → ls` and verify everything survived
It’s one of the best ways to internalize why real file systems make the tradeoffs they do.
Core Architecture
- Disk — simulated as a single `std::vector<std::byte>` (default 4 KiB blocks)
- Superblock — fixed location; tracks total blocks/inodes, blocks per group, inode size, free counts
- Block Groups — disk partitioned into equal groups; each has:
- Block bitmap
- Inode bitmap
- Inode table
- Data blocks
- Inodes — fixed-size; store type (file/dir), permissions (basic), size, timestamps, 12 direct block pointers
- Directories — special files containing variable-length `{name, inode}` entries
- Persistence — serialize/deserialize the entire buffer so state survives `umount`/`mount` cycles
No indirect/double-indirect blocks yet → files capped at ~48 KiB (12 × 4 KiB).
No full UID/GID or advanced permissions (planned).
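The variable-length directory entries described above are where most of the alignment bugs tend to live. Here is a minimal sketch of one possible packing, using an ext2-style header; the field names and sizes are my own illustration, not fs_sim's actual structs:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical on-disk layout for a variable-length directory entry,
// ext2-style: inode number, record length, name length, then the name
// bytes, padded so the next header stays 4-byte aligned.
struct DirEntryHeader {
    uint32_t inode;     // inode number (0 = unused slot)
    uint16_t rec_len;   // total bytes this entry occupies, incl. padding
    uint16_t name_len;  // bytes of the name that follow the header
};

// Append one entry to a packed directory block.
void append_entry(std::vector<std::byte>& block, uint32_t inode,
                  const std::string& name) {
    DirEntryHeader h;
    h.inode = inode;
    h.name_len = static_cast<uint16_t>(name.size());
    // Round the record size up to a multiple of 4 for alignment.
    h.rec_len = static_cast<uint16_t>(
        (sizeof(DirEntryHeader) + name.size() + 3) & ~std::size_t{3});

    std::size_t off = block.size();
    block.resize(off + h.rec_len);               // zero-fills the padding
    std::memcpy(block.data() + off, &h, sizeof h);
    std::memcpy(block.data() + off + sizeof h, name.data(), name.size());
}

// Walk the packed block and return the inode for `name`, or 0 if absent.
uint32_t lookup(const std::vector<std::byte>& block, const std::string& name) {
    std::size_t off = 0;
    while (off + sizeof(DirEntryHeader) <= block.size()) {
        DirEntryHeader h;
        std::memcpy(&h, block.data() + off, sizeof h);
        if (h.rec_len == 0) break;               // corrupt or empty tail
        if (h.inode != 0 && h.name_len == name.size() &&
            std::memcmp(block.data() + off + sizeof h, name.data(),
                        name.size()) == 0)
            return h.inode;
        off += h.rec_len;
    }
    return 0;
}
```

Getting `rec_len` wrong by even one byte shifts every subsequent header, which is exactly the "fragile serialization" failure mode mentioned below.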
Supported Commands (REPL Shell)
Run ./fs_sim to enter the interactive shell:
- `format` — wipe and initialize a fresh FS
- `mount` — load existing disk state
- `ls [path]` — list directory contents
- `mkdir <path>`
- `touch <path>`
- `echo "content" > <path>` — create/write file
- `cat <path>` — read file contents
- `rm <path>`
- `rm -r <path>` — recursive delete
- `pwd`, `cd <path>` — navigation
- Absolute (`/home`), relative (`../docs`), and `./`/`..` path support
All ops correctly update bitmaps, link counts, free counters, and directory entries.
Technical Challenges & Lessons
The hard parts were exactly what made it valuable:
- Address calculations — inode # → group → table offset → byte offset (off-by-one = instant corruption)
- Directory packing — variable name lengths + alignment → fragile serialization/deserialization
- Path resolution — walking the tree from root, handling `./` and `..`, preventing cycles/orphans
- Memory safety — raw pointers over a giant byte buffer; used `std::span` heavily, still chased overruns
- Remount consistency — every struct must serialize/deserialize perfectly
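The inode-number-to-byte-offset chain is worth spelling out. A sketch with made-up geometry constants (fs_sim reads the real values from its superblock, so treat these numbers as placeholders):

```cpp
#include <cstdint>

// Hypothetical geometry; the real values live in fs_sim's superblock.
constexpr uint32_t kBlockSize       = 4096;  // bytes per block
constexpr uint32_t kInodeSize       = 128;   // bytes per on-disk inode
constexpr uint32_t kInodesPerGroup  = 256;   // inodes in each block group
constexpr uint32_t kBlocksPerGroup  = 1024;  // blocks in each block group
constexpr uint32_t kInodeTableStart = 3;     // block index of the inode table
                                             // inside a group (after bitmaps)

// Translate an inode number into the byte offset of its on-disk record.
// Inode numbers start at 1, as in ext2 -- forgetting the `- 1` here is
// exactly the off-by-one that silently corrupts a neighbouring inode.
uint64_t inode_byte_offset(uint32_t inode_no) {
    uint32_t index = inode_no - 1;                 // 0-based index
    uint32_t group = index / kInodesPerGroup;      // which block group
    uint32_t local = index % kInodesPerGroup;      // slot inside its table
    uint64_t group_base = uint64_t{group} * kBlocksPerGroup * kBlockSize;
    uint64_t table_base = group_base + uint64_t{kInodeTableStart} * kBlockSize;
    return table_base + uint64_t{local} * kInodeSize;
}
```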
Testing: Catch2 unit suite + the REPL itself (format → deep tree → write → umount → mount → verify).
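Path resolution as described — walking from root, collapsing `.` and `..` — can be sketched over strings. This is only the normalization step under assumed `/` separators; fs_sim's real resolver walks inodes:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Minimal path normalization sketch: "." is dropped, ".." pops one
// component, and popping never goes past the root. An illustration,
// not fs_sim's actual resolver.
std::string resolve(const std::string& cwd, const std::string& path) {
    // Relative paths are interpreted against the current directory.
    std::string base =
        (path.empty() || path[0] != '/') ? cwd + "/" + path : path;

    std::vector<std::string> stack;
    std::stringstream ss(base);
    std::string part;
    while (std::getline(ss, part, '/')) {
        if (part.empty() || part == ".") continue;   // skip "//" and "."
        if (part == "..") {
            if (!stack.empty()) stack.pop_back();    // clamp at root
        } else {
            stack.push_back(part);
        }
    }

    std::string out = "/";
    for (std::size_t i = 0; i < stack.size(); ++i)
        out += stack[i] + (i + 1 < stack.size() ? "/" : "");
    return out;
}
```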
Outcome & Reflection
After months of segfaults and bitmap bugs, the moment I typed ls and saw my directory tree survive a remount felt huge.
This project gave me intuition no textbook could:
- Why block groups exist (localized allocation)
- How bitmaps prevent over-allocation
- The cost of metadata updates
- Why real FSes obsess over crash consistency
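The over-allocation point can be made concrete with a minimal bitmap allocator — an illustration of the idea, not fs_sim's code:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Sketch of bitmap-based block allocation, assuming one bit per block
// (1 = in use). Finding a clear bit and setting it before returning is
// what prevents handing the same block to two files.
std::optional<std::size_t> alloc_block(std::vector<uint8_t>& bitmap) {
    for (std::size_t byte = 0; byte < bitmap.size(); ++byte) {
        if (bitmap[byte] == 0xFF) continue;          // all 8 blocks taken
        for (int bit = 0; bit < 8; ++bit) {
            if (!(bitmap[byte] & (1u << bit))) {
                bitmap[byte] |= (1u << bit);         // mark allocated
                return byte * 8 + bit;
            }
        }
    }
    return std::nullopt;                             // group is full
}

void free_block(std::vector<uint8_t>& bitmap, std::size_t block) {
    bitmap[block / 8] &= static_cast<uint8_t>(~(1u << (block % 8)));
}
```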
If you’re studying OSes or low-level systems, build something like this.
Start flat (no dirs), add hierarchy, then bitmaps/inodes.
The “aha” when your toy FS survives is worth every crash.
Future ideas (some already in progress):
- Single/double indirect pointers
- mmap-backed real file persistence
- Full rwx + UID/GID
- Basic journaling concepts
Links & Next Steps
- Repository & Full README: github.com/pavandhadge/fs_sim
- Build & Run:
```sh
mkdir build && cd build
cmake ..
make
./fs_sim     # enter REPL
make check   # run tests
```
Clone it, break it, fix it, learn from it.
“Building fs_sim showed me that file systems aren’t magic — they’re careful data structures and relentless consistency checks. And the best way to understand them is to build one yourself.”