2026-05-02

RAII in a Database Engine: Clean Abstraction or Hidden Footgun?

Why RAII page guards can improve correctness in a buffer pool, and where they still bite.

In a storage engine, who is responsible for marking pages as dirty, and when?

A simple approach to buffer pool design looks like this:

  • Return raw Page*
  • Let callers modify memory directly
  • Mark pages dirty when calling UnpinPage

This might seem reasonable:
Modify → Unpin → mark dirty

But in this case:
Correctness depends on the caller remembering to do the right thing.
That's not safe enough.

If a developer forgets to mark a page as dirty after modifying it:

  • The buffer pool may evict the page without flushing
  • Changes are silently lost
  • No obvious crash - just incorrect state
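
To make the failure concrete, here is a minimal sketch of the lost-update scenario. It assumes a toy pool with a single frame and a map standing in for disk; ToyPool, Fetch, Unpin, Evict, and RoundTrip are hypothetical names for illustration, not the engine's API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy model (hypothetical): one in-memory frame backed by a "disk" map.
struct ToyPool {
    std::unordered_map<int, std::string> disk;  // persistent copy of each page
    int resident_id = -1;                       // page currently held in the frame
    std::string frame;                          // in-memory contents of that page
    bool dirty = false;

    std::string* Fetch(int id) {
        if (resident_id != id) {
            Evict();
            frame = disk[id];
            resident_id = id;
        }
        return &frame;
    }
    void Unpin(bool is_dirty) { dirty = dirty || is_dirty; }
    void Evict() {
        // The frame is written back only when it was flagged dirty.
        if (resident_id != -1 && dirty) disk[resident_id] = frame;
        resident_id = -1;
        dirty = false;
    }
};

// Modifies page 1, forces its eviction, then reloads it and returns its contents.
std::string RoundTrip(bool caller_marks_dirty) {
    ToyPool pool;
    pool.disk[1] = "old";
    pool.disk[2] = "other";
    *pool.Fetch(1) = "new";
    pool.Unpin(caller_marks_dirty);
    pool.Fetch(2);          // evicts page 1 to make room
    pool.Unpin(false);
    return *pool.Fetch(1);  // reloads page 1 from "disk"
}
```

In this model, RoundTrip(true) reads back the modified contents, while RoundTrip(false) reads back the old contents with no error raised anywhere.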

A more robust design should guarantee:

  • Pages are marked dirty immediately after modification
  • Pinned pages are always eventually released (unpin is never forgotten)

And ideally:

These guarantees should be enforced by the API - not by convention.


Two Approaches

There are two primary ways to manage page lifetime and mutation in a buffer pool.

1. Manual Lifecycle Management

Example:

Page* p = bpm.FetchPage(id);
Modify(p->GetData());
// passing true as the second argument marks the page dirty
bpm.UnpinPage(id, true); // caller must remember this

Pros

  • Simple and explicit
  • Flexible control over lifecycle

Cons

  • Easy to forget UnpinPage
  • Easy to forget marking dirty
  • No enforcement of correctness
  • Bugs are silent and hard to trace
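
A forgotten UnpinPage rarely looks like an omission in practice. It usually hides behind an early return. The sketch below assumes a hypothetical TinyPool that tracks one page's pin count; the names and signatures are illustrative only.

```cpp
#include <cassert>

// Hypothetical pool that tracks a single page's pin count.
struct TinyPool {
    int pin_count = 0;
    char data[16] = {};
    char* FetchPage() { ++pin_count; return data; }
    void UnpinPage(bool /*is_dirty*/) { --pin_count; }
};

// Writes a value into the page unless it fails validation.
bool WriteIfPositive(TinyPool& bpm, int value) {
    char* p = bpm.FetchPage();
    if (value <= 0) {
        return false;  // BUG: this path skips UnpinPage, so the pin leaks
    }
    p[0] = static_cast<char>(value);
    bpm.UnpinPage(true);
    return true;
}
```

The happy path leaves the pin count at zero. The rejected path leaves it at one, and nothing in the function signature hints at the leak.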

2. RAII-Based Guards

Instead of returning raw pointers, the buffer pool returns guard objects that manage page lifetime.

Example:

{
    auto page = bpm.FetchPageWrite(id);
    Modify(page.GetData());
} // automatically: mark dirty + unpin

Two types of guards:

  • ReadPageGuard
  • WritePageGuard

Write Guard Behavior

  • Provides mutable access to page data
  • On destruction:
    • Marks the page as dirty
    • Unpins the page
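
The destructor behavior described above could be sketched roughly as follows. This is a minimal illustration, not the engine's implementation: Page, MiniPool, Pin, and WriteByte are stand-ins invented for the example, and the real guard may differ in its constructor and members.

```cpp
#include <cassert>

// Hypothetical stand-ins for the engine's page and pool types.
struct Page {
    char data[4096] = {};
    int pin_count = 0;
    bool is_dirty = false;
};

struct MiniPool {
    Page page;  // a single page is enough for the sketch
    Page* Pin() { ++page.pin_count; return &page; }
    void UnpinPage(Page* p, bool dirty) {
        p->is_dirty = p->is_dirty || dirty;
        --p->pin_count;
    }
};

class WritePageGuard {
public:
    WritePageGuard(MiniPool* pool, Page* page) : pool_(pool), page_(page) {}
    WritePageGuard(const WritePageGuard&) = delete;
    WritePageGuard& operator=(const WritePageGuard&) = delete;

    // On destruction: mark the page dirty and release the pin.
    ~WritePageGuard() {
        if (page_ != nullptr) pool_->UnpinPage(page_, /*dirty=*/true);
    }

    char* GetData() { return page_->data; }

private:
    MiniPool* pool_;
    Page* page_;
};

// Writes one byte through a guard; cleanup happens at the closing brace.
void WriteByte(MiniPool& pool, char value) {
    WritePageGuard guard(&pool, pool.Pin());
    guard.GetData()[0] = value;
}  // ~WritePageGuard runs here
```

After WriteByte returns, the page is dirty and its pin count is back to zero, even though the caller never mentioned either.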

Pros

  • Enforces cleanup automatically
  • Eliminates forgotten UnpinPage
  • Ensures dirty marking is not skipped
  • Encodes correctness into the API

Cons

  • Lifetime is tied to scope
  • Easy to accidentally hold resources too long
  • Requires more discipline in structuring code

Critical Rule: Guards Must Be Move-Only

This is non-negotiable.

If guards are copyable:

auto g2 = g1;

You now have:

  • Two objects managing the same page
  • Two destructors calling Unpin

This leads to:

  • Double unpin
  • Broken pin counts
  • Undefined behavior if a page is evicted while another guard still uses it

So guards must be:

  • Non-copyable
  • Move-only

Conceptually similar to std::unique_ptr.
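
One way to get move-only semantics is to delete the copy operations and have the move operations null out the source. The sketch below isolates that idea with a hypothetical PinGuard that tracks only a pin counter; a real guard would also carry the page and pool pointers.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical guard that only tracks a pin count, to isolate the move semantics.
class PinGuard {
public:
    explicit PinGuard(int* pin_count) : pin_count_(pin_count) { ++*pin_count_; }

    PinGuard(const PinGuard&) = delete;             // a copy would unpin twice
    PinGuard& operator=(const PinGuard&) = delete;

    PinGuard(PinGuard&& other) noexcept : pin_count_(other.pin_count_) {
        other.pin_count_ = nullptr;                 // source gives up the pin
    }
    PinGuard& operator=(PinGuard&& other) noexcept {
        if (this != &other) {
            Release();                              // drop any pin we already hold
            pin_count_ = other.pin_count_;
            other.pin_count_ = nullptr;
        }
        return *this;
    }
    ~PinGuard() { Release(); }

    bool Owns() const { return pin_count_ != nullptr; }

private:
    void Release() {
        if (pin_count_ != nullptr) {
            --*pin_count_;
            pin_count_ = nullptr;
        }
    }
    int* pin_count_;
};

static_assert(!std::is_copy_constructible<PinGuard>::value, "guards must not copy");
static_assert(std::is_move_constructible<PinGuard>::value, "guards must move");

// Moves a guard and returns the pin count once both objects are destroyed.
int PinCountAfterMove() {
    int pin_count = 0;
    {
        PinGuard g1(&pin_count);      // takes the pin
        PinGuard g2 = std::move(g1);  // ownership transfers, count unchanged
        assert(pin_count == 1);
        assert(!g1.Owns() && g2.Owns());
    }                                 // only g2 releases the pin
    return pin_count;
}
```

With this shape, auto g2 = g1; fails to compile, and a moved-from guard's destructor does nothing.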


The Real Tradeoff

RAII doesn't eliminate problems - it moves them.

With RAII, a misuse looks like this:

auto page = bpm.FetchPageWrite(id);
// ... long or complex logic ...
// page remains pinned the entire time

Effect:

  • Page cannot be evicted
  • Buffer pool capacity shrinks
  • Under pressure → pool can become fully pinned
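
A small model shows how quickly this bites. The sketch assumes a hypothetical FramePool with two frames that tracks only pin counts; Acquire and Release stand in for fetching and unpinning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool that tracks only a pin count per frame.
class FramePool {
public:
    explicit FramePool(std::size_t frames) : pins_(frames, 0) {}

    // Returns the index of an unpinned frame, or -1 if every frame is pinned.
    int Acquire() {
        for (std::size_t i = 0; i < pins_.size(); ++i) {
            if (pins_[i] == 0) {
                ++pins_[i];
                return static_cast<int>(i);
            }
        }
        return -1;  // nothing is evictable
    }
    void Release(int frame) { --pins_[static_cast<std::size_t>(frame)]; }

private:
    std::vector<int> pins_;
};

// Requests three pins from a two-frame pool and reports whether the later ones succeed.
bool LaterFetchesSucceed(bool release_first_early) {
    FramePool pool(2);
    int a = pool.Acquire();
    if (release_first_early) pool.Release(a);  // models a guard that left scope early
    int b = pool.Acquire();
    int c = pool.Acquire();                    // -1 when both frames are still pinned
    return b != -1 && c != -1;
}
```

When the first pin is held for the whole function, the third request has nowhere to go. Releasing it early is enough for all later requests to succeed.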

Design Guidelines

If you adopt RAII guards in a buffer pool:

  • Keep guard lifetimes as short as possible
  • Avoid passing guards across layers unnecessarily
  • Treat guards like locks, not just access handles
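
The first guideline usually comes down to adding an inner block around the page access. A rough sketch, using a hypothetical ScopedPin that only counts pins:

```cpp
#include <cassert>

// Hypothetical guard that increments a pin counter for its lifetime.
struct ScopedPin {
    explicit ScopedPin(int& pins) : pins_(pins) { ++pins_; }
    ~ScopedPin() { --pins_; }
    ScopedPin(const ScopedPin&) = delete;
    ScopedPin& operator=(const ScopedPin&) = delete;
    int& pins_;
};

// Guard lives for the whole function: slow work would run with the page pinned.
int PinsDuringSlowWorkWideScope(int& pins) {
    ScopedPin guard(pins);
    // ... touch the page ...
    return pins;  // pin count seen by any slow work placed here
}

// Guard lives only as long as the page access needs it.
int PinsDuringSlowWorkNarrowScope(int& pins) {
    {
        ScopedPin guard(pins);
        // ... touch the page ...
    }             // pin released here
    return pins;  // slow work placed here runs with no pin held
}
```

The two functions do the same page access. The only difference is how many pins are outstanding while the rest of the function runs.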

Conclusion

RAII-based guards are generally the stronger design for storage engines:

  • They enforce correctness by default
  • They remove reliance on human discipline for critical invariants

But they introduce a different responsibility:

Managing scope becomes just as important as managing correctness.

RAII doesn't make the system foolproof - 
it just makes the failure modes more explicit.


API Reference

Full API (including FetchPageRead, FetchPageWrite, and guard behavior):
https://github.com/beshirr/EEP-DB/blob/main/docs/storage_api.md