Reference Manual

The lezer system consists of three modules, each distributed as a separate package on npm.

lezer module


interface ParseOptions

Options that can be passed to control parsing.

cache⁠?: Tree

Passing a cached tree is used for incremental parsing. This should be a tree whose content is aligned with the current document (through a call to Tree.applyChanges) if any changes were made since it was produced. The parser will try to reuse nodes from this tree in the new parse, greatly speeding up the parse when it can reuse nodes for most of the document.

strict⁠?: boolean

When true, the parser will raise an exception, rather than run its error-recovery strategies, when the input doesn't match the grammar.

bufferLength⁠?: number

The maximum length of the TreeBuffers generated in the output tree. Defaults to 1024.

top⁠?: string

The name of the @top declaration to parse from. If not specified, the first @top declaration is used.

dialect⁠?: string

A space-separated string of dialects to enable.
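The cache option is what drives incremental parsing. A minimal sketch of a re-parse helper, assuming a parser and an old tree are available (the names `parser`, `oldTree`, and `changes` here are placeholders, not library exports):

```javascript
// Sketch of an incremental re-parse using the cache option.
function reparse(parser, oldTree, newDoc, changes) {
  // Align the old tree with the edited document, dropping touched nodes...
  let cache = oldTree.applyChanges(changes)
  // ...and let the parser reuse whatever nodes still match.
  return parser.parse(newDoc, {cache})
}
```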

class Parser

A parser holds the parse tables for a given grammar, as generated by lezer-generator.

readonly group: NodeGroup

A node group with the node types used by this parser.

readonly topRules: Object<[number, number]>

Maps top rule names to [state ID, top term ID] pairs.

parse(input: InputStream | string, options?: ParseOptions) → Tree

Parse a given string or stream.

startParse(input: InputStream | string, options?: ParseOptions) → ParseContext

Create a ParseContext.

withNested(spec: Object<NestedGrammar>) → Parser

Create a new Parser instance with different values for (some of) the nested grammars. This can be used to, for example, swap in a different language for a nested grammar or fill in a nested grammar that was left blank by the original grammar.

withProps(...props: NodePropSource[]) → Parser

Create a new Parser instance whose node types have the given props added. You should use NodeProp.add to create the arguments to this method.

withTokenizer(from: ExternalTokenizer, to: ExternalTokenizer) → Parser

Replace the given external tokenizer with another one, returning a new parser object.

getName(term: number) → string

Returns the name associated with a given term. This will only work for all terms when the parser was generated with the --names option. By default, only the names of tagged terms are stored.

readonly hasNested: boolean

Tells you whether this grammar has any nested grammars.

readonly topType: NodeType

The node type produced by the default top rule.

class ParseContext

A parse context can be used for step-by-step parsing. After creating it, you repeatedly call .advance() until it returns a tree to indicate it has reached the end of the parse.

pos: number
advance() → null | Tree

Move the parser forward. This will process all parse stacks at this.pos and try to advance them to a further position. If no stack for such a position is found, it'll start error-recovery.

When the parse is finished, this will return a syntax tree. When not, it returns null.
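The advance loop, combined with badness, allows a driver to bail out of hopeless parses. A sketch of such a loop (`parser` stands in for an object produced by lezer-generator; the loop shape is what matters here):

```javascript
// Step-by-step parsing with a bail-out when error recovery dominates.
function runParse(parser, input, maxBadness = 5) {
  let cx = parser.startParse(input), tree
  while (!(tree = cx.advance())) {
    // Abandon hopeless input rather than letting error recovery churn on.
    if (cx.badness > maxBadness) return cx.forceFinish()
  }
  return tree
}
```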

forceFinish() → Tree

Force the parse to finish, generating a tree containing the nodes parsed so far.

readonly badness: number

A value that indicates how successful the parse is so far, as the number of error-recovery steps taken divided by the number of tokens parsed. Could be used to decide to abort a parse when the input doesn't appear to match the grammar at all.

type NestedGrammar = null | Parser | fn(input: InputStream, stack: Stack) → NestedGrammarSpec

Nested grammar values are associated with nesting positions in the grammar. If they are null, the nested region is simply skipped over. If they hold a parser object, that parser is used to parse the region. To implement dynamic behavior, the value may also be a function which returns a description of the way the region should be parsed.

interface NestedGrammarSpec

An object indicating how to proceed with a nested parse.

parser⁠?: Parser

When given, this is used to provide a parser that should be used to parse the content.

top⁠?: string

When parser is given, this can be used to configure which top rule to parse with it.

dialect⁠?: string

When parser is given, this can be used to configure a dialect.

stay⁠?: boolean

When true, the outer grammar should parse the content itself, using the fallback expression provided for the nesting.

parseNode?: fn(input: InputStream, start: number) → Tree

Alternatively, parseNode may hold a function which will be made responsible for parsing the region.

wrapType⁠?: number

An optional extra type to tag the resulting tree with.

filterEnd?: fn(endToken: string) → boolean

When a filterEnd property is present, that should hold a function that determines whether a given end token (which matches the end token specified in the grammar) should be used (true) or ignored (false). This is mostly useful for implementing things like XML closing tag matching.

class Stack

A parse stack. These are used internally by the parser to track parsing progress. They also provide some properties and methods that external code such as a tokenizer can use to get information about the parse state.

pos: number

The input position up to which this stack has parsed.

canShift(term: number) → boolean

Check if the given term would be able to be shifted (optionally after some reductions) on this stack. This can be useful for external tokenizers that want to make sure they only provide a given token when it applies.

readonly ruleStart: number

Find the start position of the rule that is currently being parsed.

startOf(types: readonly number[]) → number

Find the start position of the innermost instance of any of the given term types, or return -1 when none of them are found.

Note: this is only reliable when there is at least some state that unambiguously matches the given rule on the stack. I.e. if you have a grammar like this, where the difference between a and b is only apparent at the third token:

a { b | c }
b { "x" "y" "x" }
c { "x" "y" "z" }

Then a parse state after "x" will not reliably tell you that b is on the stack. You can pass [b, c] to reliably check for either of those two rules (assuming that a isn't part of some rule that includes other things starting with "x").

readonly parser: Parser

Get the parser used by this stack.

dialectEnabled(dialectID: number) → boolean

Test whether a given dialect (by numeric ID, as exported from the terms file) is enabled.


interface InputStream

This is the interface the parser uses to access the document. It exposes a sequence of UTF-16 code units. Most access will be sequential, so implementations can optimize for that.

length: number

The end of the stream.

get(pos: number) → number

Get the code unit at the given position. Will return -1 when asked for a point below 0 or beyond the end of the stream.

read(from: number, to: number) → string

Read part of the stream as a string.

clip(at: number) → InputStream

Return a new InputStream over the same data, but with a lower length. Used, for example, when nesting grammars to give the inner grammar a narrower view of the input.
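A minimal string-backed implementation of this interface might look like the following (a sketch; the library ships its own input stream types):

```javascript
// A string-backed InputStream: sequential UTF-16 code unit access.
class StringStream {
  constructor(string, length = string.length) {
    this.string = string
    this.length = length // the end of the stream
  }
  // Code unit at pos, or -1 outside [0, length)
  get(pos) {
    return pos < 0 || pos >= this.length ? -1 : this.string.charCodeAt(pos)
  }
  // Part of the stream as a string
  read(from, to) {
    return this.string.slice(from, Math.min(to, this.length))
  }
  // Same data, but with a narrower view (e.g. for nested grammars)
  clip(at) {
    return new StringStream(this.string, at)
  }
}
```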

class Token

Tokenizers write the tokens they read into instances of this class.

start: number

The start of the token. This is set by the parser, and should not be mutated by the tokenizer.

value: number

This starts at -1, and should be updated to a term id when a matching token is found.

end: number

When setting .value, you should also set .end to the end position of the token. (You'll usually want to use the accept method.)

accept(value: number, end: number)

Accept a token, setting value and end to the given values.

class ExternalTokenizer

Exports that are used for @external tokens in the grammar should export an instance of this class.

new ExternalTokenizer(token: fn(input: InputStream, token: Token, stack: Stack), options?: Object = {})

Create a tokenizer. The first argument is the function that, given an input stream and a token object, fills the token object if it recognizes a token. token.start should be used as the start position to scan from.

contextual⁠?: boolean

When set to true, mark this tokenizer as depending on the current parse stack, which prevents its result from being cached between parser actions at the same positions.

fallback⁠?: boolean

By default, when a tokenizer returns a token, that prevents tokenizers with lower precedence from even running. When fallback is true, this tokenizer is allowed to run when a previous tokenizer returned a token that didn't match any of the current state's actions.

extend⁠?: boolean

When set to true, tokenizing will not stop after this tokenizer has produced a token. (But it will still fail to reach this one if a higher-precedence tokenizer produced a token.)
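A sketch of the token function you would pass to `new ExternalTokenizer(...)`, recognizing "--" line comments. `LINE_COMMENT` stands in for a term id that would normally come from the generated terms file:

```javascript
const LINE_COMMENT = 1 // placeholder for a generated term id

function lineCommentToken(input, token) {
  const DASH = 45, NEWLINE = 10
  // Scan from token.start, as the parser requires.
  if (input.get(token.start) == DASH && input.get(token.start + 1) == DASH) {
    let pos = token.start + 2
    while (pos < input.length && input.get(pos) != NEWLINE) pos++
    token.accept(LINE_COMMENT, pos) // sets token.value and token.end
  }
}
```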


These come from lezer-tree, but are exported from this module as well for convenience.

re-export Tree
re-export Subtree
re-export NodeType
re-export NodeGroup
re-export NodeProp

tree module


class Tree extends Subtree

A piece of syntax tree. There are two ways to approach these trees: the way they are actually stored in memory, and the convenient way.

Syntax trees are stored as a tree of Tree and TreeBuffer objects. By packing detail information into TreeBuffer leaf nodes, the representation is made a lot more memory-efficient.

However, when you want to actually work with tree nodes, this representation is very awkward, so most client code will want to use the Subtree interface instead, which provides a view on some part of this data structure, and can be used (through resolve, for example) to zoom in on any single node.

new Tree(type: NodeType, children: readonly (Tree | TreeBuffer)[], positions: readonly number[], length: number)

Construct a new tree. You usually want to go through Tree.build instead.

readonly children: readonly (Tree | TreeBuffer)[]

The tree's child nodes. Children small enough to fit in a TreeBuffer will be represented as such, other children can be further Tree instances with their own internal structure.

readonly positions: readonly number[]

The positions (offsets relative to the start of this tree) of the children.

readonly length: number

The total length of this tree

applyChanges(changes: readonly ChangedRange[]) → Tree

Apply a set of edits to a tree, removing all nodes that were touched by the edits, and moving remaining nodes so that their positions are updated for insertions/deletions before them. This is likely to destroy a lot of the structure of the tree, and mostly useful for extracting the nodes that can be reused in a subsequent incremental re-parse.

cut(at: number) → Tree

Take the part of the tree up to the given position.

append(other: Tree) → Tree

Append another tree to this tree. other must have empty space big enough to fit this tree at its start.

balance(maxBufferLength⁠?: number = DefaultBufferLength) → Tree

Balance the direct children of this tree.

static empty: Tree

The empty tree

static build(data: BuildData) → Tree

Build a tree from a postfix-ordered buffer of node information, or a cursor over such a buffer.

type BuildData

Options passed to Tree.build.

buffer: BufferCursor | readonly number[]

The buffer or buffer cursor to read the node data from.

When this is an array, it should contain four values for every node in the tree.

  • The first holds the node's type, as a node ID pointing into the given NodeGroup.
  • The second holds the node's start offset.
  • The third the end offset.
  • The fourth the amount of space taken up in the array by this node and its children. Since there's four values per node, this is the total number of nodes inside this node (children and transitive children) plus one for the node itself, times four.

Parent nodes should appear after child nodes in the array. As an example, a node of type 10 spanning positions 0 to 4, with two children, of type 11 and 12, might look like this:

[11, 0, 1, 4, 12, 2, 4, 4, 10, 0, 4, 12]
group: NodeGroup

The node types to use.

topID⁠?: number

The id of the top node type, if any.

maxBufferLength⁠?: number

The maximum buffer length to use. Defaults to DefaultBufferLength.

reused⁠?: (Tree | TreeBuffer)[]

An optional set of reused nodes that the buffer can refer to.

minRepeatType⁠?: number

The first node type that indicates repeat constructs in this grammar.

abstract class Subtree

A subtree is a representation of part of the syntax tree. It may either be the tree root, or a tagged node.

abstract parent: Subtree | null

The subtree's parent. Will be null for the root node

abstract type: NodeType

The node's type

readonly name: string
abstract start: number

The start source offset of this subtree

abstract end: number

The end source offset

readonly depth: number

The depth (number of parent nodes) of this subtree

readonly root: Tree

The root of the tree that this subtree is part of

abstract iterate<T = any>(args: Object) → undefined | T

Iterate over all nodes in this subtree. Will iterate through the tree in order, calling args.enter for each node it enters and, if given, args.leave when it leaves a node.

enter(type: NodeType, start: number, end: number) → undefined | false | T

The function called when entering a node. It is given a node's type, start position, and end position, and can return...

  • undefined to proceed iterating as normal.

  • false to not further iterate this node, but continue iterating nodes after it.

  • Any other value to immediately stop iteration and return that value from the iterate method.

leave?: fn(type: NodeType, start: number, end: number)

The function to be called when leaving a node.

from⁠?: number

The position in the tree to start iterating. All nodes that overlap with this position (including those that start/end directly at it) are included in the iteration. Defaults to the start of the subtree.

to⁠?: number

The position in the tree to iterate towards. May be less than from to perform a reverse iteration. Defaults to the end of the subtree.
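The iterate interface described above lends itself to small traversal helpers. For example, one might gather the ranges of all nodes matching a predicate (a sketch; `tree` is any Subtree):

```javascript
// Collect (start, end) ranges of nodes matching a predicate.
function collectNodes(tree, pred) {
  let found = []
  tree.iterate({
    enter(type, start, end) {
      if (pred(type)) found.push([start, end])
      // Returning undefined keeps the iteration going normally.
    }
  })
  return found
}
```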

resolve(pos: number, side?: -1 | 0 | 1 = 0) → Subtree

Find the node at a given position. By default, this will return the lowest-depth subtree that covers the position from both sides, meaning that nodes starting or ending at the position aren't entered. You can pass a side of -1 to enter nodes that end at the position, or 1 to enter nodes that start there.

abstract childBefore(pos: number) → null | Subtree

Find the child tree before the given position, if any.

abstract childAfter(pos: number) → null | Subtree

Find the child tree after the given position, if any.

readonly firstChild: null | Subtree

Get the first child of this subtree.

readonly lastChild: null | Subtree

Find the last child of this subtree.

interface ChangedRange

The applyChanges method expects changed ranges in this format.

fromA: number

The start of the change in the start document

toA: number

The end of the change in the start document

fromB: number

The start of the replacement in the new document

toB: number

The end of the replacement in the new document
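For a single replacement that substitutes some inserted text for the span between `from` and `to`, the corresponding ChangedRange can be computed like this (a sketch, not a library function):

```javascript
// Build a ChangedRange for one replacement edit.
function changedRange(from, to, insert) {
  return {fromA: from, toA: to, fromB: from, toB: from + insert.length}
}
```
For instance, replacing two characters at position 5 with "abc" yields fromA: 5, toA: 7, fromB: 5, toB: 8.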

Node types

class NodeType

Each node in a syntax tree has a node type associated with it.

readonly name: string

The name of the node type. Not necessarily unique, but if the grammar was written properly, different node types with the same name within a node group should play the same semantic role.

readonly id: number

The id of this node in its group. Corresponds to the term ids used in the parser.

prop<T>(prop: NodeProp<T>) → undefined | T

Retrieves a node prop for this type. Will return undefined if the prop isn't present on this node.

static none: NodeType

An empty dummy node type to use when no actual type is available.

static match<T>(map: Object<T>) → fn(node: NodeType) → undefined | T

Create a function from node types to arbitrary values by specifying an object whose property names are node names. Often useful with NodeProp.add. You can put multiple node names, separated by spaces, in a single property name to map multiple node names to a single value.
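The semantics can be sketched without the library: expand space-separated property names into a per-name lookup table (this mirrors what NodeType.match does, but is not the library's implementation):

```javascript
// Expand space-separated keys and return a lookup function over node types.
function matchNodeNames(map) {
  let direct = Object.create(null)
  for (let prop in map)
    for (let name of prop.split(" ")) direct[name] = map[prop]
  return nodeType => direct[nodeType.name]
}
```
So matchNodeNames({"Number String": "literal"}) maps node types named Number or String to "literal", and anything else to undefined.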

class NodeGroup

A node group holds a collection of node types. It is used to compactly represent trees by storing their type ids, rather than a full pointer to the type object, in a number array. Each parser has a node group, and tree buffers can only store collections of nodes from the same group. A group can have a maximum of 2**16 (65536) node types in it, so that the ids fit into 16-bit typed array slots.

new NodeGroup(types: readonly NodeType[])

Create a group with the given types. The id property of each type should correspond to its position within the array.

readonly types: readonly NodeType[]

The node types in this group, by id.

extend(...props: NodePropSource[]) → NodeGroup

Create a copy of this group with some node properties added. The arguments to this method should be created with NodeProp.add.

class NodeProp<T>

Each node type can have metadata associated with it in props. Instances of this class represent prop names.

new NodeProp({deserialize?: fn(str: string) → T} = {})

Create a new node prop type. You can optionally pass a deserialize function.

deserialize(str: string) → T

A method that deserializes a value of this prop from a string. Can be used to allow a prop to be directly written in a grammar file. Defaults to raising an error.

set(propObj: any[], value: T) → Object

Store a value for this prop in the given object. This can be useful when building up a prop object to pass to the NodeType constructor. Returns its first argument.

add(match: Object<T> | fn(type: NodeType) → undefined | T) → NodePropSource

This is meant to be used with NodeGroup.extend or Parser.withProps to compute prop values for each node type in the group. Takes a match object or function that returns undefined if the node type doesn't get this prop, and the prop's value if it does.

static string() → NodeProp<string>

Create a string-valued node prop whose deserialize function is the identity function.

static number() → NodeProp<number>

Create a number-valued node prop whose deserialize function is just Number.

static flag() → NodeProp<boolean>

Create a boolean-valued node prop whose deserialize function returns true for any input.

static error: NodeProp<boolean>

The special node type that the parser uses to represent parse errors has this flag set. (You shouldn't use it for custom nodes that represent erroneous content.)

static skipped: NodeProp<boolean>

Nodes that were produced by skipped expressions (such as comments) have this prop set to true.

static closedBy: NodeProp<readonly string[]>

Prop that is used to describe matching delimiters. For opening delimiters, this holds an array of node names (written as a space-separated string when declaring this prop in a grammar) for the node types of closing delimiters that match it.

static openedBy: NodeProp<readonly string[]>

The inverse of openedBy. This is attached to closing delimiters, holding an array of node names of types of matching opening delimiters.

static top: NodeProp<boolean>

Indicates that this node type represents a top-level document.

class NodePropSource

Type returned by NodeProp.add. Describes the way a prop should be added to each node type in a node group.


class TreeBuffer

Tree buffers contain (type, start, end, endIndex) quads for each node. In such a buffer, nodes are stored in prefix order (parents before children, with the endIndex of the parent indicating which children belong to it)

readonly length: number
readonly type: NodeType
iterate<T = any>(args: Object) → undefined | T
enter(type: NodeType, start: number, end: number) → undefined | false | T

The function called when entering a node. It is given a node's type, start position, and end position, and can return...

  • undefined to proceed iterating as normal.

  • false to not further iterate this node, but continue iterating nodes after it.

  • Any other value to immediately stop iteration and return that value from the iterate method.

leave?: fn(type: NodeType, start: number, end: number)

The function to be called when leaving a node.

from⁠?: number

The position in the tree to start iterating. All nodes that overlap with this position (including those that start/end directly at it) are included in the iteration. Defaults to the start of the subtree.

to⁠?: number

The position in the tree to iterate towards. May be less than from to perform a reverse iteration. Defaults to the end of the subtree.

DefaultBufferLength: 1024

The default maximum length of a TreeBuffer node.

interface BufferCursor

This is used by Tree.build as an abstraction for iterating over a tree buffer. A cursor initially points at the very last element in the buffer. Every time next() is called it moves on to the previous one.

pos: number

The current buffer position (four times the number of nodes remaining).

id: number

The node ID of the next node in the buffer.

start: number

The start position of the next node in the buffer.

end: number

The end position of the next node.

size: number

The size of the next node (the number of nodes inside, counting the node itself, times 4).


next()

Moves this.pos down by 4.

fork() → BufferCursor

Create a copy of this cursor.
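An array-backed cursor over the flat node format described under BuildData (id, start, end, size quads, with parents after their children) can be sketched as:

```javascript
// A BufferCursor over a plain array of (id, start, end, size) quads.
class ArrayBufferCursor {
  constructor(buffer, pos = buffer.length) {
    this.buffer = buffer
    this.pos = pos // four times the number of nodes remaining
  }
  get id() { return this.buffer[this.pos - 4] }
  get start() { return this.buffer[this.pos - 3] }
  get end() { return this.buffer[this.pos - 2] }
  get size() { return this.buffer[this.pos - 1] }
  next() { this.pos -= 4 } // move to the previous node
  fork() { return new ArrayBufferCursor(this.buffer, this.pos) }
}
```
Running it over the example buffer from the BuildData section, the cursor first reports the parent node (type 10, spanning 0 to 4, size 12), then its children in reverse order.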

generator module

type BuildOptions

fileName⁠?: string

The name of the grammar file

warn?: fn(message: string)

A function that should be called with warnings. The default is to call console.warn.

includeNames⁠?: boolean

Whether to include term names in the output file. Defaults to false.

moduleStyle⁠?: string

Determines the module system used by the output file. Can be either "cjs" (CommonJS) or "es" (ES2015 module), defaults to "es".

exportName⁠?: string

The name of the export that holds the parser in the output file. Defaults to "parser".

externalTokenizer?: fn(name: string, terms: Object<number>) → ExternalTokenizer

When calling buildParser, this can be used to provide placeholders for external tokenizers.

externalSpecializer?: fn(name: string, terms: Object<number>) → fn(value: string, stack: Stack) → number

Provide placeholders for external specializers when using buildParser.

nestedGrammar?: fn(name: string, terms: Object<number>) → NestedGrammar

Only relevant when using buildParser. Provides placeholders for nested grammars.

externalProp?: fn(name: string) → NodeProp<any>

If given, will be used to initialize external props in the parser returned by buildParser.

buildParserFile(text: string, options?: BuildOptions = {}) → {parser: string, terms: string}

Build the code that represents the parser tables for a given grammar description. The parser property of the return value holds the main file, which exports the Parser instance. The terms property holds a declaration file that defines constants for all of the named terms in the grammar, with their ids as values. This is useful when external code, such as a tokenizer, needs to use these ids. It is recommended to run a tree-shaking bundler when importing this file, since you usually only need a handful of the many terms in your code.

buildParser(text: string, options?: BuildOptions = {}) → Parser

Build an in-memory parser instance for a given grammar. This is mostly useful for testing. If your grammar uses external tokenizers or nested grammars, you'll have to provide the externalTokenizer and/or nestedGrammar options for the returned parser to be able to parse anything.

class GenError extends Error

The type of error raised when the parser generator finds an issue.