Skip to content
Mog is in active development. The GitHub repo, SDK packages, and community channels are not yet available. Follow for launch updates
← Back to blog
·Mog Team

Inside Mog's Formula Engine: Parsing and Evaluating 582 Functions

engineeringformulasrustparser

Excel Formulas Are Harder Than You Think

Most people think of spreadsheet formulas as simple expressions like =A1+B1. The reality is that Excel's formula language is a context-sensitive, dynamically-typed expression language with decades of accumulated complexity.

Consider what a formula engine actually needs to handle:

  • Cell references with four addressing modes: A1 (relative), $A$1 (absolute), A$1 (mixed column-relative), $A1 (mixed row-relative). Each mode has different behavior when a formula is copied or filled.
  • Structured table references like Table1[Revenue] or [@Cost], which resolve against named table regions and use implicit row scoping.
  • Dynamic arrays introduced in modern Excel: FILTER, SORT, UNIQUE, and SEQUENCE return multi-cell results that spill into adjacent cells.
  • The implicit intersection operator (@), which silently reduces a multi-cell reference to a single value based on the formula cell's position.
  • Cross-sheet references (Sheet2!A1:B10), named ranges, array formulas (both legacy CSE and modern dynamic), and error propagation through seven distinct error types (#REF!, #VALUE!, #N/A, #DIV/0!, #NULL!, #NUM!, #NAME?).

This is the language Mog's parser crate must fully understand.

A Hand-Written Recursive Descent Parser

We chose to write the formula parser by hand rather than use a parser generator (LALR, PEG, or otherwise). The reason is straightforward: the Excel formula grammar has context-sensitive ambiguities that generator tools handle poorly.

Take A1:B2. Is that a range reference or a comparison between A1 and B2? It depends on the syntactic context -- inside a function argument it's a range, but after certain operators it could be parsed differently. The colon operator, the space operator (intersection), and the comma (union) all have context-dependent meanings. A hand-written parser gives us full control over these decisions without fighting a grammar specification.

The parser also gives us byte-level position tracking for every AST node. When a formula has an error, we can underline the exact character range in the editor. This is hard to achieve with generated parsers that operate on token streams.

Here is a simplified view of how the parser handles a formula like =SUM(A1:A10) + IF(B1>0, B1, 0):

rust
fn parse_expression(&mut self) -> Result<Expr> {
let left = self.parse_term()?;
while self.peek() == Some(Token::Plus) || self.peek() == Some(Token::Minus) {
let op = self.advance();
let right = self.parse_term()?;
left = Expr::Binary { op, left: Box::new(left), right: Box::new(right) };
}
Ok(left)
}
fn parse_primary(&mut self) -> Result<Expr> {
match self.peek() {
Some(Token::Ident(name)) if self.is_function_name(&name) => {
self.advance();
self.expect(Token::LParen)?;
let args = self.parse_argument_list()?;
self.expect(Token::RParen)?;
Ok(Expr::FunctionCall { name, args, span: self.span() })
}
Some(Token::CellRef(r)) => {
self.advance();
if self.peek() == Some(Token::Colon) {
self.advance();
let end = self.expect_cell_ref()?;
Ok(Expr::Range { start: r, end, span: self.span() })
} else {
Ok(Expr::CellRef(r))
}
}
Some(Token::Number(n)) => { self.advance(); Ok(Expr::Literal(Value::Number(n))) }
_ => Err(self.error("expected expression")),
}
}

The parser produces a typed AST with variants for every expression kind: binary operations, unary operations, function calls, cell references, range references, structured references, array literals, and error literals. Each variant carries a byte span for diagnostics.

The Dependency Graph

Every formula establishes dependencies. =A1+B1 depends on cells A1 and B1. =SUM(A1:A100) depends on 100 cells. The graph crate maintains a directed acyclic graph of these relationships.

The graph is built incrementally. When a user edits a formula, the parser produces an AST, the engine extracts all referenced cells and ranges, removes old edges for that cell, and inserts new ones. Range dependencies are stored as range objects rather than expanded into individual cell edges -- A1:A100 is one edge, not 100.

Cross-sheet references become edges between sheet subgraphs. Named ranges are resolved at graph-build time to their underlying references.

Cycle detection happens at edit time, not evaluation time. When a user types a formula that would create a circular reference, the engine detects it immediately and returns #REF! before any evaluation occurs. This is implemented as a DFS cycle check on the subgraph affected by the edit.

Incremental Recalculation

When a cell value changes, the engine does not re-evaluate the entire workbook. Instead:

1. Mark dirty. The edited cell is marked dirty. 2. Propagate. Walk the dependency graph forward, marking all transitive dependents dirty. 3. Topological sort. Sort the dirty subgraph in dependency order. 4. Evaluate. Walk the sorted list, evaluating each cell. By the time a cell is evaluated, all its dependencies have already been recalculated.

For a typical single-cell edit in a sheet with 10,000 formulas, the dirty subgraph usually contains fewer than 100 cells. The engine recalculates only those cells, leaving the other 9,900+ untouched.

A full recalculation mode is available for cases that need it -- after XLSX import, after structural changes that invalidate large portions of the graph, or when explicitly requested by the user.

Function Dispatch: 582 Functions

Each of Mog's 582 Excel-compatible functions is a Rust function registered in a dispatch table. The #[mog_function] proc-macro handles registration, argument coercion, and documentation generation:

rust
#[mog_function(category = "lookup")]
fn VLOOKUP(
lookup_value: Value,
table: Range,
col_index: i32,
approximate: Option<bool>,
) -> Result<Value> {
let approx = approximate.unwrap_or(true);
let col = (col_index - 1) as usize;
if col >= table.width() {
return Err(Error::Ref);
}
let row = if approx {
table.col(0).binary_search_approximate(&lookup_value)?
} else {
table.col(0).exact_match(&lookup_value).ok_or(Error::NA)?
};
Ok(table.get(row, col))
}

The macro expands to register the function in a global dispatch table, generate argument count validation, handle optional parameters, and coerce argument types (e.g., automatically converting a cell reference to its value). At evaluation time, the engine looks up the function name in the dispatch table and calls the corresponding Rust function with the evaluated arguments.

The 582 functions span these categories: math and trigonometry (80+), statistical (40+), text (40+), financial (50+), date and time (30+), lookup and reference (25+), logical, information, engineering, database, and web functions.

Dynamic Arrays and Spill Ranges

Traditional Excel formulas return a single value to a single cell. Dynamic array functions -- FILTER, SORT, UNIQUE, SEQUENCE, SORTBY, RANDARRAY -- return multi-cell results.

When a dynamic array formula evaluates, the engine:

1. Computes the result array. The function returns a 2D array of values. 2. Checks for spill conflicts. If any cell in the target spill range already contains data, the formula returns a #SPILL! error instead. 3. Reserves the spill range. The engine writes values into the spill range and records the spill extent metadata on the formula cell. 4. Updates the dependency graph. Every cell in the spill range becomes a dependent of the formula cell. If another formula references a spill target (e.g., =D3 where D3 is part of a spill range), the graph records a dependency on the originating formula cell.

When the source data changes and the spill range shrinks or grows, the engine clears the old range, writes the new range, and updates the graph edges accordingly. This bookkeeping is handled by the graph crate in coordination with the document model.

What This Adds Up To

The formula engine is roughly 40,000 lines of Rust across the parser, functions, and graph crates. It compiles to the same WebAssembly binary that runs in the browser, the same native module that runs in Node.js, and the same Python extension that runs in Jupyter notebooks. One engine, every platform, full Excel compatibility.

For more on Mog's overall architecture -- the binary wire protocol, CRDT collaboration, canvas rendering -- see our Architecture Deep Dive post.