Building pyfs-watcher: A Rust-Powered Filesystem Toolkit for Python

How I built a high-performance filesystem toolkit for Python using Rust and PyO3 — parallel directory walking, BLAKE3 hashing, file deduplication, and real-time watching, all from pip install.

5 min read
By Pratyush Sharma

Python is great for scripting filesystem tasks — until it isn't. os.walk is single-threaded. shutil.copy blocks without progress reporting. hashlib won't hash a batch of files in parallel unless you build the thread pool yourself. If you've ever run a deduplication script on a 500GB photo library and waited 45 minutes, you know the pain.

I built pyfs-watcher to fix this. It's a Python package with a Rust core that provides fast, parallel filesystem operations — directory walking, file hashing, bulk copy/move, real-time watching, and deduplication. You install it with pip install pyfs-watcher and use it like any Python library. The Rust stays invisible.

Why Rust, not C or Cython?

The decision came down to three things:

  1. Memory safety without a GC — Filesystem code touches raw buffers, file handles, and threads. Rust catches data races and use-after-free at compile time. In C, these bugs show up as segfaults in production.

  2. PyO3 is mature — PyO3 provides first-class Python bindings for Rust. You annotate Rust structs with #[pyclass] and functions with #[pyfunction], and it generates the CPython interface. No manual PyObject juggling.

  3. The ecosystem — Rust has battle-tested crates for exactly what I needed: jwalk for parallel directory traversal, blake3 for fast hashing, notify for cross-platform file watching, and rayon for data parallelism.

Core features

Parallel directory walking

The standard os.walk visits directories one at a time. pyfs-watcher uses jwalk under the hood, which spawns a thread pool and walks multiple branches of the directory tree concurrently.

from pyfs_watcher import walk, walk_collect

# Streaming iterator — memory efficient for huge directories
for entry in walk("/data/photos", glob_pattern="*.jpg", max_depth=5):
    print(entry.path, entry.size, entry.is_dir)

# Or collect everything at once
entries = walk_collect("/data/photos", sort_by_name=True)
print(f"Found {len(entries)} files")

The API supports glob filtering, file type filtering (files_only, dirs_only), max depth, hidden file handling, and sorting — all pushed down to the Rust layer so Python never sees entries it doesn't need.
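
To see what that pushdown buys, here is roughly what the same filtering looks like in pure Python on top of os.walk — a hypothetical baseline for comparison, not pyfs-watcher code. Every entry gets materialized as a Python object before the glob test, which is exactly the overhead the Rust layer avoids:

```python
import fnmatch
import os

def walk_py(root, glob_pattern="*", max_depth=None):
    """Plain-Python baseline: os.walk with glob and depth filtering.

    Every directory entry crosses into Python before being filtered,
    which is the per-entry overhead a Rust-side pushdown eliminates.
    """
    root = os.path.abspath(root)
    base_depth = root.count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.count(os.sep) - base_depth
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune: don't descend any further
        for name in filenames:
            if fnmatch.fnmatch(name, glob_pattern):
                yield os.path.join(dirpath, name)
```

On large trees the filtering itself is cheap either way; it's the millions of short-lived Python objects and interpreter round-trips that dominate.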

BLAKE3 and SHA-256 hashing

File hashing is embarrassingly parallel. pyfs-watcher uses memory-mapped I/O for files larger than 128MB and processes multiple files across threads:

from pyfs_watcher import hash_file, hash_files

# Single file
digest = hash_file("backup.tar.gz", algorithm="blake3")

# Batch hashing with progress callback
def on_progress(path, hash_value):
    print(f"Hashed: {path}")

results = hash_files(
    ["/data/file1.iso", "/data/file2.iso"],
    algorithm="sha256",
    callback=on_progress
)

BLAKE3 is significantly faster than SHA-256 for large files because it's designed for parallelism internally — it splits files into 1KB chunks and hashes them in a tree structure.
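
The contrast is easy to demonstrate with hashlib (which ships SHA-256 but not BLAKE3). SHA-256 is a sequential chain: streaming 1KB chunks through one hasher reproduces the one-shot digest because each chunk is absorbed into the state left by the previous one, and there is no way to hash chunks independently and merge the results. BLAKE3's chunk tree is exactly the structure that removes this restriction:

```python
import hashlib
import os

data = os.urandom(1_000_000)  # ~1 MB of random input

# One-shot digest over the whole buffer.
whole = hashlib.sha256(data).hexdigest()

# Streaming the same bytes in 1 KB chunks through a single hasher:
# each update() depends on the state left by the previous update().
h = hashlib.sha256()
for i in range(0, len(data), 1024):
    h.update(data[i:i + 1024])
assert h.hexdigest() == whole

# Hashing the chunks independently yields unrelated digests that cannot
# be combined into the whole-file digest — which is why SHA-256 can't be
# parallelized within a single file the way BLAKE3's tree mode can.
chunk_digests = {hashlib.sha256(data[i:i + 1024]).hexdigest()
                 for i in range(0, len(data), 1024)}
assert whole not in chunk_digests
```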

The 3-stage deduplication pipeline

This is the feature I'm most proud of. Naive deduplication hashes every file completely, which is wasteful when most files are unique. pyfs-watcher uses a staged pipeline:

Stage 1: Size grouping — Files with unique sizes can't be duplicates. This eliminates the majority of files instantly with just a stat() call — no reads at all.

Stage 2: Partial hash — For files with matching sizes, hash just the first 4KB. Different headers mean different files. This catches most remaining non-duplicates with minimal I/O.

Stage 3: Full hash — Only files that match on both size and partial hash get fully hashed. By this stage, you're usually looking at actual duplicates.
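
The staging logic itself is straightforward to sketch in pure Python — here using hashlib's SHA-256 in place of BLAKE3, which hashlib doesn't ship. The real implementation runs the hashing stages in parallel in Rust, but the winnowing idea is the same:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates_py(root, min_size=1, partial_bytes=4096):
    """Staged duplicate finder: size -> 4 KB partial hash -> full hash."""

    def digest(path, limit=None):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            h.update(f.read(limit) if limit else f.read())
        return h.hexdigest()

    # Stage 1: group by size — a file with a unique size can't be a duplicate.
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_size >= min_size:
                by_size[os.stat(path).st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Stage 2: hash only the first 4 KB — different headers part ways here.
        by_partial = defaultdict(list)
        for p in paths:
            by_partial[digest(p, partial_bytes)].append(p)
        # Stage 3: full hash only for files matching on size AND partial hash.
        for candidates in by_partial.values():
            if len(candidates) < 2:
                continue
            by_full = defaultdict(list)
            for p in candidates:
                by_full[digest(p)].append(p)
            groups.extend(ps for ps in by_full.values() if len(ps) > 1)
    return groups
```

Most files fall out at stage 1 for the price of a stat(); only genuine candidates ever pay for a full read.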

from pyfs_watcher import find_duplicates

groups = find_duplicates(
    "/data/photos",
    min_size=1024,  # skip files under 1KB
    algorithm="blake3"
)

for group in groups:
    print(f"Hash: {group.hash}")
    print(f"Wasted: {group.wasted_bytes / 1024 / 1024:.1f} MB")
    for path in group.paths:
        print(f"  {path}")

On a test directory with 50,000 files (120GB), the staged approach finished in 8 seconds compared to 47 seconds for naive full-hash deduplication. The savings get more dramatic as the duplicate ratio drops.

Real-time file watching

Cross-platform file watching that works on Linux (inotify), macOS (FSEvents), and Windows (ReadDirectoryChangesW):

from pyfs_watcher import FileWatcher

# Sync interface with context manager
with FileWatcher("/project/src", debounce_ms=500) as watcher:
    for event in watcher:
        print(f"{event.kind}: {event.path}")
        if event.kind == "modified" and event.path.endswith(".py"):
            run_tests()  # user-defined hook, e.g. shell out to pytest

# Async interface for integration with asyncio
import asyncio
from pyfs_watcher import async_watch

async def watch():
    async for event in async_watch("/project/src"):
        print(f"{event.kind}: {event.path}")

asyncio.run(watch())

The debounce parameter prevents event floods when editors write to temp files before renaming. Ignore patterns let you skip __pycache__, .git, and node_modules.
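
The coalescing behind debounce_ms is simple to model: an event is emitted only once its path has been quiet for the debounce window, so a burst of writes collapses to its final event. A minimal sketch of that logic — timestamps passed in explicitly so it's easy to test; the real watcher does this inside Rust:

```python
def debounce(events, debounce_ms=500):
    """Coalesce (timestamp_ms, path) events: an event survives only if its
    path stays quiet for debounce_ms afterwards, so a burst of rapid writes
    collapses to its last event. `events` must be sorted by timestamp."""
    out = []
    for i, (ts, path) in enumerate(events):
        quiet = all(other_path != path or other_ts - ts >= debounce_ms
                    for other_ts, other_path in events[i + 1:])
        if quiet:
            out.append((ts, path))
    return out

# An editor's write-write-rename burst collapses to one event per file:
burst = [(0, "a.py"), (100, "a.py"), (200, "a.py"), (1000, "b.py")]
# debounce(burst) -> [(200, "a.py"), (1000, "b.py")]
```

The O(n²) scan is fine for a sketch; a production version keeps a per-path timer instead.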

Build and distribution

Building a Rust-Python hybrid package for multiple platforms was the hardest part of the project. I use maturin as the build backend, which compiles Rust code into a Python wheel with the correct platform tags.

The CI/CD pipeline uses GitHub Actions with a matrix strategy:

# Build wheels for Linux, macOS, and Windows
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    python-version: ['3.9', '3.10', '3.11', '3.12']

On every push, the CI runs Ruff (linting), mypy (type checking), Clippy (Rust linting), and the full test suite. On release, maturin builds platform-specific wheels and publishes to PyPI with trusted publishing — no API tokens stored in secrets.

The Rust build uses link-time optimization (LTO) and single codegen unit for release builds, which produces smaller, faster binaries at the cost of longer compile times.
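
In Cargo.toml terms, that release profile looks roughly like this (standard Cargo settings, not copied from the project):

```toml
[profile.release]
lto = true          # link-time optimization across all crates
codegen-units = 1   # one codegen unit: slower compiles, better optimization
strip = true        # drop debug symbols from the shipped binary
```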

Error handling across the language boundary

One design challenge was mapping Rust's Result<T, E> values onto Python exceptions cleanly. I created a typed exception hierarchy:

  • FsWatcherError — base exception
  • WalkError — directory traversal failures (permissions, broken symlinks)
  • HashError — hashing failures (file not found, I/O errors)
  • CopyError — copy/move failures (disk full, cross-device)
  • WatchError — watching failures (too many open files, path not found)

Each Rust error type maps to a specific Python exception via PyO3's create_exception! macro. This means you can write idiomatic Python error handling:

from pyfs_watcher import walk_collect, WalkError

try:
    entries = walk_collect("/root/secret")
except WalkError as e:
    print(f"Walk failed: {e}")

What I learned

Memory-mapped I/O matters. For hashing large files, mmap avoids copying file data from kernel space to user space. The 128MB threshold was chosen empirically — below that, sequential reads are faster due to mmap overhead.
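
The two paths are easy to compare with the stdlib mmap module — same digest either way, but the mmap path hands the hash function a view backed by the page cache instead of looping over copied buffers. This is a sketch of the idea in Python, not the project's Rust code:

```python
import hashlib
import mmap

def hash_read(path):
    """Sequential read: data is copied kernel -> user buffer -> hasher."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):  # 1 MB reads
            h.update(block)
    return h.hexdigest()

def hash_mmap(path):
    """mmap: the hasher consumes the mapping directly, no read loop.

    mmap objects support the buffer protocol, so update() accepts them.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            h.update(m)
    return h.hexdigest()
```

Note that mmap.mmap raises on empty files, and for small files the page-fault setup costs more than it saves — which is the intuition behind putting the crossover threshold well above zero.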

PyO3's GIL management is crucial. Long-running Rust operations must release the GIL with py.allow_threads() so other Python threads can run. Forgetting this makes your "fast" Rust code block the entire Python process.

Cross-platform filesystem behavior is wild. Windows UNC paths (\\?\), macOS case-insensitive filesystems, Linux's inotify watch limits — each platform has quirks that only surface in CI. The dunce crate helps normalize Windows paths, but you still need platform-specific test cases.

Try it

pip install pyfs-watcher

The source is on GitHub. Contributions welcome — especially for benchmarks against other tools and new platform-specific optimizations.

Published on February 23, 2026