2026-05-02

RAII in a Database Engine: Clean Abstraction or Hidden Footgun?

Why RAII page guards can improve correctness in a buffer pool, and where they still bite.

In a storage engine,
Who is responsible for marking pages as dirty, and when?

A basic simple approach in buffer pool design would look like this:

Return raw Page*
Let callers modify memory directly
Mark pages dirty when calling UnpinPage

This might seems reasonable:
Modify → Unpin → mark dirty

But in this case:
Correctness depends on the caller remembering to do the right thing.
That's not safe enough.

If a developer forgets to mark a page as dirty after modifying it:

The buffer pool may evict the page without flushing
Changes are silently lost
No obvious crash - just incorrect state

A more robust design should guarantee:

Pages are marked dirty immediately after modification
Pinned pages are always eventually released (unpin is never forgotten)

And ideally:

These guarantees should be enforced by the API - not by convention.

Two Approaches

There are two primary ways to manage page lifetime and mutation in a buffer pool.

1. Manual Lifecycle Management

Example:

Page* p = bpm.FetchPage(id);
Modify(p->GetData());
// (true) flags dirty page
bpm.UnpinPage(id, true); // caller must remember this

Pros

Simple and explicit
Flexible control over lifecycle

Cons

Easy to forget UnpinPage
Easy to forget marking dirty
No enforcement of correctness
Bugs are silent and hard to trace

2. RAII-Based Guards

Instead of returning raw pointers, the buffer pool returns guard objects that manage page lifetime.

Example:

{
    auto page = bpm.FetchPageWrite(id);
    Modify(page.GetData());
} // automatically: mark dirty + unpin

Two types of guards:

ReadPageGuard
WritePageGuard

Write Guard Behavior

Provides mutable access to page data
On destruction:
- Marks the page as dirty
- Unpins the page

Pros

Enforces cleanup automatically
Eliminates forgotten UnpinPage
Ensures dirty marking is not skipped
Encodes correctness into the API

Cons

Lifetime is tied to scope
Easy to accidentally hold resources too long
Requires more discipline in structuring code

Critical Rule: Guards Must Be Move-Only

This is non-negotiable.

If guards are copy-able:

auto g2 = g1;

You now have:

Two objects managing the same page
Two destructors calling Unpin

This leads to:

Double unpin
Broken pin counts
Undefined behavior

So guards must be:

Non-copyable
Move-only

Conceptually similar to std::unique_ptr.

The Real Tradeoff

RAII doesn't eliminate problems - it moves them.

With RAII, a misuse looks like this:

auto page = bpm.FetchPageWrite(id);
// ... long or complex logic ...
// page remains pinned the entire time

Effect:

Page cannot be evicted
Buffer pool capacity shrinks
Under pressure → pool can become fully pinned

Design Guidelines

If you adopt RAII guards in a buffer pool:

Keep guard lifetimes as short as possible
Avoid passing guards across layers unnecessarily
Treat guards like locks, not just access handles

Conclusion

RAII-based guards are generally the stronger design for storage engines:

They enforce correctness by default
They remove reliance on human discipline for critical invariants

But they introduce a different responsibility:

Managing scope becomes just as important as managing correctness.

RAII doesn't make the system foolproof -
it just makes the failure modes more explicit.

API Reference

Full API (including FetchPageRead, FetchPageWrite, and guard behavior):
https://github.com/beshirr/EEP-DB/blob/main/docs/storage_api.md