Skip to content
Mog is in active development. The GitHub repo, SDK packages, and community channels are not yet available. Follow for launch updates
← Back to blog
·Mog Team

Why We Chose CRDTs Over OT for Spreadsheet Collaboration

engineeringcollaborationcrdt

The Problem

Real-time collaboration in spreadsheets is fundamentally harder than in text editors.

In a text document, the data structure is a sequence of characters. Two users typing in different paragraphs rarely conflict, and when they do, the resolution is straightforward: interleave the insertions at the correct offsets.

Spreadsheets are different. The data model is a two-dimensional grid with structured references between cells. A formula in B2 that says =A1+A3 doesn't just contain a value — it encodes positional dependencies. Now consider what happens when User A inserts a row above row 3 while User B edits cell A3. User A's operation shifts the meaning of "row 3." User B's operation targets a cell that has moved. The formula in B2 might need to become =A1+A4, or it might not, depending on the intent of both edits.

This is the structural change problem. Insert a row, delete a column, move a range — each of these operations reshapes the grid in ways that affect every formula reference, every named range, and every conditional format rule that touches the affected region.

Any collaboration system for spreadsheets must solve this problem correctly, or users will lose data.

OT vs. CRDTs

Operational Transformation (OT) is the classic approach, used by Google Docs and Google Sheets. The core idea: when two users produce concurrent operations, a central server applies transformation functions to adjust one operation against the other before applying both. For text, you need a handful of transform functions (insert-vs-insert, insert-vs-delete, etc.). For spreadsheets, the transformation matrix explodes. You need transforms for every combination of: cell edit, row insert, row delete, column insert, column delete, range move, merge, unmerge, formula change, format change, and more. Each pair needs a correct bidirectional transform function. Getting one wrong means silent data corruption.

OT also requires a central server. The server serializes all operations into a total order and applies transformations. Without the server, two clients cannot resolve conflicts on their own.

CRDTs (Conflict-Free Replicated Data Types) take a different approach. Instead of transforming operations after the fact, CRDTs structure the data so that concurrent operations commute by construction. Any two replicas that have seen the same set of operations converge to the same state, regardless of the order they were applied. No central authority needed.

For Mog, CRDTs were the right choice for three reasons:

1. Offline-first is a core requirement. Mog's compute engine runs client-side as a WASM binary. Users can open, edit, and save spreadsheets without ever contacting a server. Collaboration must work the same way — accumulate edits offline, merge when reconnected.

2. Peer-to-peer sync without a relay. Two Mog clients on the same network can sync directly. OT cannot do this without a server to serialize the operation log.

3. The server is optional. Mog offers a collaboration server for relay and persistence, but it is not required for correctness. The CRDT guarantees convergence regardless of network topology.

The Cell Identity Model

Choosing CRDTs doesn't automatically solve the structural change problem. If cells are keyed by grid position (row, col), a row insertion still scrambles every reference. The CRDT would faithfully replicate the scrambled state to every peer.

The key design decision in Mog is the Cell Identity Model: every cell gets a stable UUID, and that UUID — not the grid position — is the CRDT key.

rust
struct Cell {
id: Uuid, // Stable identity, never changes
value: CellValue, // The cell's content
formula: Option<Formula>,
format: FormatId,
}
struct Sheet {
// Position mapping: grid coordinates -> cell identity
row_order: Vec<Uuid>, // ordered row IDs
col_order: Vec<Uuid>, // ordered column IDs
cells: YMap<Uuid, Cell>, // Yrs CRDT map, keyed by UUID
}

When a user inserts a row, the operation modifies row_order — it splices a new row UUID into the sequence. It does not touch any existing cell. When a user edits a cell, the operation modifies that cell's entry in cells by its UUID. These two operations target different CRDT keys, so they commute naturally.

Formulas store references as UUIDs internally, not as A1-style addresses:

rust
// What the user sees: =A1+A3
// What the CRDT stores:
Formula {
display: "=A1+A3",
refs: vec![
CellRef { id: uuid!("a1b2c3..."), display: "A1" },
CellRef { id: uuid!("d4e5f6..."), display: "A3" },
],
}

When User A inserts a row above row 3, the cell formerly at A3 is now visually at A4, but its UUID hasn't changed. The formula still references the correct cell. The display string "A3" gets updated to "A4" on the next recalculation pass — a local, deterministic operation that doesn't require coordination.

How Yrs Fits In

Mog uses Yrs (the Rust port of Yjs) as the CRDT engine. Yrs provides CRDT primitives — YMap, YArray, YText — that handle concurrent updates, garbage collection of tombstones, and efficient binary encoding for sync.

Each cell's value, formula, and format are stored as entries in a YMap keyed by the cell's UUID. Structural operations (insert row, delete column, move range) are decomposed into CRDT-compatible primitives:

  • Insert row: Insert a new UUID into the YArray for row_order at the target index. No cell data is touched.
  • Delete column: Remove the column UUID from col_order. Mark affected cells as deleted in the YMap. Tombstone cleanup is handled by Yrs.
  • Move range: Reorder entries in row_order and col_order. Cell identities stay the same.

All operations go through Yrs transactions, which ensures atomic application and correct CRDT merge semantics:

rust
doc.transact_mut(|txn| {
// Insert a new row at index 5
let new_row_id = Uuid::new_v4();
row_order.insert(txn, 5, new_row_id.to_string());
// Initialize empty cells for the new row
for col_id in col_order.iter(txn) {
let cell_id = Uuid::new_v4();
let cell_key = format!("{}/{}", new_row_id, col_id);
cells.insert(txn, cell_key, YMap::new());
}
});

Sync between peers is handled by exchanging Yrs state vectors and update diffs — a well-understood protocol that works over WebSocket, WebRTC, or any other transport.

Trade-Offs We Accepted

This design is not free. We accepted several costs:

Memory overhead. A UUID is 16 bytes. A (row, col) pair is 8 bytes. For a sheet with 1 million cells, that's an extra 8MB just for identity storage, before accounting for the CRDT metadata (vector clocks, tombstones, operation log). In practice, Mog's memory footprint for collaboration metadata is roughly 2-3x what a flat grid model would require.

Complexity in reference resolution. Every formula evaluation must resolve UUIDs to current grid positions for display. This adds an indirection layer — essentially a hash map lookup per reference. We mitigate this with a cached position index that is rebuilt incrementally when the grid structure changes, keeping the overhead to single-digit microseconds per lookup.

Document size. A Yrs document is larger than a flat JSON or binary representation of the same data. CRDT metadata — vector clocks, client IDs, operation history for undo — adds to the wire and storage size. For a typical spreadsheet, the CRDT document is 1.5-2x the size of a plain serialization.

Worth it. These costs are measurable but manageable. What we get in return is correct merges without a central server, offline editing that just works, and structural operations that compose without a combinatorial explosion of transformation functions.

Practical Result

Consider three users editing concurrently:

  • User A inserts a row between rows 2 and 3.
  • User B edits cell C3, changing its value to 42.
  • User C moves the range A1:D2 down by one row.

With OT, you'd need pairwise transforms for insert-row vs. edit-cell, insert-row vs. move-range, and edit-cell vs. move-range — six transform functions, each of which must handle edge cases around overlapping regions.

With Mog's Cell Identity Model:

1. User A's insert splices a new row UUID into row_order. No cell data is modified. 2. User B's edit writes 42 to the YMap entry for cell C3's UUID. The UUID hasn't changed, regardless of inserted rows. 3. User C's move reorders entries in row_order. The cells keep their UUIDs.

All three operations target different CRDT keys. They merge in any order and produce the same result: the new row appears in the correct position, the edited cell contains 42 at its new visual location, and the moved range is where User C put it.

No data loss. No manual conflict resolution. No central server required.

That is the payoff of choosing CRDTs with stable cell identities. The upfront design cost pays for itself every time two users edit the same spreadsheet without thinking about it.