Reference Manual

The lezer system consists of three modules, each distributed as a separate package on npm.

lezer module

Parsing

class Parser

A parser holds the parse tables for a given grammar, as generated by lezer-generator.

nodeSet: NodeSet

A node set with the node types used by this parser.

topRules: Object<[number, number]>

Maps top rule names to [state ID, top term ID] pairs.

parse(
input: Input | string,
startPos⁠?: number = 0,
context⁠?: ParseContext
) → Tree

Parse a given string or stream.

startParse(
input: Input | string,
startPos⁠?: number = 0,
context⁠?: ParseContext
) → PartialParse

Start an incremental parse.

configure(config: ParserConfig) → Parser

Configure the parser. Returns a new parser instance that has the given settings modified. Settings not provided in config are kept from the original parser.
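For example, a configuration object can be built up and passed to configure. The Config interface below is a local stand-in mirroring the ParserConfig fields documented further down, and the "Script" top rule and "ts jsx" dialects are hypothetical names, not part of any real grammar.

```typescript
// Local stand-in mirroring the documented ParserConfig shape.
interface Config {
  top?: string
  dialect?: string
  strict?: boolean
  bufferLength?: number
}

const config: Config = {
  top: "Script",     // hypothetical @top rule name
  dialect: "ts jsx", // space-separated dialect names
  strict: false,     // keep error recovery enabled
}

// With a real parser generated by lezer-generator (sketch):
// const tsParser = parser.configure(config)  // returns a new Parser instance
```

Since configure returns a fresh parser, the original parser can keep being used with its old settings.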

getName(term: number) → string

Returns the name associated with a given term. This will only work for all terms when the parser was generated with the --names option. By default, only the names of tagged terms are stored.

hasNested: boolean

Tells you whether this grammar has any nested grammars.

interface ParserConfig

Configuration options to pass to a parser.

props⁠?: readonly NodePropSource[]

Node props to add to the parser's node set.

top⁠?: string

The name of the @top declaration to parse from. If not specified, the first @top declaration is used.

dialect⁠?: string

A space-separated string of dialects to enable.

nested⁠?: Object<NestedParser>

The nested grammars to use. This can be used to, for example, swap in a different language for a nested grammar or fill in a nested grammar that was left blank by the original grammar.

tokenizers⁠?: {from: ExternalTokenizer, to: ExternalTokenizer}[]

Replace the given external tokenizers with new ones.

strict⁠?: boolean

When true, the parser will raise an exception, rather than run its error-recovery strategies, when the input doesn't match the grammar.

bufferLength⁠?: number

The maximum length of the TreeBuffers generated in the output tree. Defaults to 1024.

type NestedParser = NestedParserSpec | fn(input: Input, stack: Stack) → NestedParserSpec | null

This type is used to specify a nested parser. It may directly be a nested parse spec, or a function that, given an input document and a stack, returns such a spec or null to indicate that the nested parse should not happen (and the grammar's fallback expression should be used).

type NestedParserSpec

Used to configure a nested parse.

startParse⁠?: fn(input: Input, startPos: number, context: ParseContext) → PartialParse

The inner parser. Will be passed the input, clipped to the size of the parseable region, the start position of the inner region as startPos, and an optional array of tree fragments from a previous parse that can be reused.

When this property isn't given, the inner region is simply skipped over instead of parsed.

wrapType⁠?: NodeType | number

When given, an additional node will be wrapped around the part of the tree produced by this inner parse.

filterEnd⁠?: fn(endToken: string) → boolean

When given, this will be called with the token that ends the inner region. It can return false to cause a given end token to be ignored.
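A NestedParser function of the type described above decides, per occurrence, whether to nest. The sketch below factors out just the decision logic: in real code the function would receive the input and stack and read something like a type="..." attribute from the document, and the wrapType term id 1 is a made-up placeholder.

```typescript
// Minimal stand-in for a NestedParserSpec (all its properties are optional).
type Spec = {wrapType?: number}

// Decide whether to nest based on a (hypothetical) script type attribute.
// Returning null means "don't nest": the grammar's fallback expression is used.
function chooseScriptParser(scriptType: string): Spec | null {
  if (scriptType == "" || scriptType == "text/javascript")
    return {wrapType: 1} // hypothetical node type id to wrap the inner tree in
  return null
}
```

The same shape works when swapping in a different language for a nested grammar via the nested option on ParserConfig.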

class Stack

A parse stack. These are used internally by the parser to track parsing progress. They also provide some properties and methods that external code such as a tokenizer can use to get information about the parse state.

pos: number

The input position up to which this stack has parsed.

context: any

The stack's current context value, if any. Its type will depend on the context tracker's type parameter, or it will be null if there is no context tracker.

canShift(term: number) → boolean

Check if the given term would be able to be shifted (optionally after some reductions) on this stack. This can be useful for external tokenizers that want to make sure they only provide a given token when it applies.

ruleStart: number

Find the start position of the rule that is currently being parsed.

startOf(types: readonly number[], before⁠?: number) → number | null

Find the start position of an instance of any of the given term types, or return null when none of them are found.

Note: this is only reliable when there is at least some state that unambiguously matches the given rule on the stack. I.e. if you have a grammar like this, where the difference between a and b is only apparent at the third token:

a { b | c }
b { "x" "y" "x" }
c { "x" "y" "z" }

Then a parse state after "x" will not reliably tell you that b is on the stack. You can pass [b, c] to reliably check for either of those two rules (assuming that a isn't part of some rule that includes other things starting with "x").

When before is given, this keeps scanning up the stack until it finds a match that starts before that position.

Note that you have to be careful when using this in tokenizers, since it's relatively easy to introduce data dependencies that break incremental parsing by using this method.

parser: Parser

Get the parser used by this stack.

dialectEnabled(dialectID: number) → boolean

Test whether a given dialect (by numeric ID, as exported from the terms file) is enabled.

Tokenizers

class Token

Tokenizers write the tokens they read into instances of this class.

start: number

The start of the token. This is set by the parser, and should not be mutated by the tokenizer.

value: number

This starts at -1, and should be updated to a term id when a matching token is found.

end: number

When setting .value, you should also set .end to the end position of the token. (You'll usually want to use the accept method.)

accept(value: number, end: number)

Accept a token, setting value and end to the given values.

class ExternalTokenizer

Exports that are used for @external tokens in the grammar should export an instance of this class.

new ExternalTokenizer(
token: fn(input: Input, token: Token, stack: Stack),
options⁠?: Object = {}
)

Create a tokenizer. The first argument is the function that, given an input stream and a token object, fills the token object if it recognizes a token. token.start should be used as the start position to scan from.

options
contextual⁠?: boolean

When set to true, mark this tokenizer as depending on the current parse stack, which prevents its result from being cached between parser actions at the same positions.

fallback⁠?: boolean

By default, when a tokenizer returns a token, that prevents tokenizers with lower precedence from even running. When fallback is true, the tokenizer is allowed to run when a previous tokenizer returned a token that didn't match any of the current state's actions.

extend⁠?: boolean

When set to true, tokenizing will not stop after this tokenizer has produced a token. (But it will still fail to reach this one if a higher-precedence tokenizer produced a token.)
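The token-reading function passed to the ExternalTokenizer constructor can be written as plain logic over the input stream. The sketch below reads a run of ASCII digits; DIGITS stands in for a term id that a real tokenizer would import from the generated terms file, and the InputLike/TokenLike interfaces are local stand-ins for the documented Input and Token shapes.

```typescript
const DIGITS = 1 // hypothetical term id, normally imported from the terms file

interface InputLike { get(pos: number): number }
interface TokenLike { start: number; accept(value: number, end: number): void }

// Scan forward from token.start; input.get returns -1 past the end of the
// stream, which fails the digit test and stops the loop.
function readDigits(input: InputLike, token: TokenLike) {
  let pos = token.start
  while (input.get(pos) >= 48 && input.get(pos) <= 57) pos++
  if (pos > token.start) token.accept(DIGITS, pos)
}

// Real wiring (sketch): export const digits = new ExternalTokenizer(readDigits)
```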

class ContextTracker<T>

Context trackers are used to track stateful context (such as indentation in the Python grammar, or parent elements in the XML grammar) needed by external tokenizers. You declare them in a grammar file as @context exportName from "module".

Context values should be immutable, and can be updated (replaced) on shift or reduce actions.

new ContextTracker(spec: Object)

The export used in a @context declaration should be of this type.

spec
start: T

The initial value of the context.

shift⁠?: fn() → T

Update the context when the parser executes a shift action.

reduce⁠?: fn() → T

Update the context when the parser executes a reduce action.

reuse⁠?: fn() → T

Update the context when the parser reuses a node from a tree fragment.

hash(context: T) → number

Reduce a context value to a number (for cheap storage and comparison).

strict⁠?: boolean

By default, nodes can only be reused during incremental parsing if they were created in the same context as the one in which they are reused. Set this to false to disable that check.
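As an illustration of the kind of context value and hash function a spec needs, here is a hypothetical tag-stack context such as an XML-like grammar might track. The wiring into new ContextTracker and a @context declaration is omitted; the point is that updates return fresh values (context must stay immutable) and that hash reduces a context to a cheap number.

```typescript
// Hypothetical context: the stack of currently open tag names.
type TagContext = readonly string[]

const startContext: TagContext = []

// Updates return new arrays rather than mutating, keeping contexts immutable.
const openTag = (cx: TagContext, name: string): TagContext => [...cx, name]
const closeTag = (cx: TagContext): TagContext => cx.slice(0, -1)

// Reduce a context to a number for cheap storage and comparison.
function hashContext(cx: TagContext): number {
  let h = 0
  for (const ch of cx.join("/")) h = ((h << 5) - h + ch.charCodeAt(0)) | 0
  return h
}
```

Equal contexts must hash equally, since the hash is what incremental parsing compares when deciding whether a node can be reused.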

Re-exports

These come from lezer-tree, but are exported from this module as well for convenience.

re-export Tree
re-export Input
re-export SyntaxNode
re-export TreeCursor
re-export NodeType
re-export NodeSet
re-export NodeProp

tree module

Trees

class Tree

A piece of syntax tree. There are two ways to approach these trees: the way they are actually stored in memory, and the convenient way.

Syntax trees are stored as a tree of Tree and TreeBuffer objects. By packing detail information into TreeBuffer leaf nodes, the representation is made a lot more memory-efficient.

However, when you want to actually work with tree nodes, this representation is very awkward, so most client code will want to use the TreeCursor interface instead, which provides a view on some part of this data structure, and can be used to move around to adjacent nodes.

new Tree(
type: NodeType,
children: readonly (Tree | TreeBuffer)[],
positions: readonly number[],
length: number
)

Construct a new tree. You usually want to go through Tree.build instead.

type: NodeType
children: readonly (Tree | TreeBuffer)[]

The tree's child nodes. Children small enough to fit in a TreeBuffer will be represented as such, other children can be further Tree instances with their own internal structure.

positions: readonly number[]

The positions (offsets relative to the start of this tree) of the children.

length: number

The total length of this tree.

cursor(pos⁠?: number, side⁠?: -1 | 0 | 1 = 0) → TreeCursor

Get a tree cursor rooted at this tree. When pos is given, the cursor is moved to the given position and side.

fullCursor() → TreeCursor

Get a tree cursor that, unlike regular cursors, doesn't skip anonymous nodes.

topNode: SyntaxNode

Get a syntax node object for the top of the tree.

resolve(pos: number, side⁠?: -1 | 0 | 1 = 0) → SyntaxNode

Get the syntax node at the given position. If side is -1, this will move into nodes that end at the position. If 1, it'll move into nodes that start at the position. With 0, it'll only enter nodes that cover the position from both sides.

iterate(
spec: {
enter: fn(type: NodeType, from: number, to: number) → false | undefined,
leave⁠?: fn(type: NodeType, from: number, to: number),
from⁠?: number,
to⁠?: number
}
)

Iterate over the tree and its children, calling enter for any node that touches the from/to region (if given) before running over such a node's children, and leave (if given) when leaving the node. When enter returns false, the given node will not have its children iterated over (or leave called).
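The callback protocol can be demonstrated without a real parse by using a stand-in object that calls enter/leave over a hard-coded structure (the node names here are fabricated). Note how returning false from enter prunes a node's children, and suppresses its leave call.

```typescript
interface TypeLike { name: string }
interface IterSpec {
  enter(type: TypeLike, from: number, to: number): false | undefined
  leave?(type: TypeLike, from: number, to: number): void
}

// Stand-in for a real Tree: drives enter/leave over a fixed two-node shape.
const fakeTree = {
  iterate(spec: IterSpec) {
    if (spec.enter({name: "Script"}, 0, 10) === false) return
    if (spec.enter({name: "String"}, 0, 5) !== false) {
      // children of the String node would be visited here
      spec.leave?.({name: "String"}, 0, 5)
    }
    spec.leave?.({name: "Script"}, 0, 10)
  }
}

// Collect node names, pruning everything inside String nodes.
function names(tree: {iterate(spec: IterSpec): void}): string[] {
  const out: string[] = []
  tree.iterate({
    enter(type) {
      out.push(type.name)
      return type.name == "String" ? false : undefined
    }
  })
  return out
}
```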

balance(
maxBufferLength⁠?: number = DefaultBufferLength
) → Tree

Balance the direct children of this tree.

static empty: Tree

The empty tree.

static build(data: BuildData) → Tree

Build a tree from a postfix-ordered buffer of node information, or a cursor over such a buffer.

interface SyntaxNode

A syntax node provides an immutable pointer at a given node in a tree. When iterating over large amounts of nodes, you may want to use a mutable cursor instead, which is more efficient.

type: NodeType

The type of the node.

name: string

The name of the node (.type.name).

from: number

The start position of the node.

to: number

The end position of the node.

parent: SyntaxNode | null

The node's parent node, if any.

firstChild: SyntaxNode | null

The first child, if the node has children.

lastChild: SyntaxNode | null

The node's last child, if available.

childAfter(pos: number) → SyntaxNode | null

The first child that starts at or after pos.

childBefore(pos: number) → SyntaxNode | null

The last child that ends at or before pos.

nextSibling: SyntaxNode | null

This node's next sibling, if any.

prevSibling: SyntaxNode | null

This node's previous sibling.

cursor: TreeCursor

A tree cursor starting at this node.

resolve(pos: number, side⁠?: -1 | 0 | 1) → SyntaxNode

Find the node around, before (if side is -1), or after (side is 1) the given position. Will look in parent nodes if the position is outside this node.

getChild(
type: string | number,
before⁠?: string | null = null,
after⁠?: string | null = null
) → SyntaxNode | null

Get the first child of the given type (which may be a node name or a group name). If before is non-null, only return children that occur somewhere after a node with that name or group. If after is non-null, only return children that occur somewhere before a node with that name or group.

getChildren(
type: string | number,
before⁠?: string | null = null,
after⁠?: string | null = null
) → SyntaxNode[]

Like getChild, but return all matching children, not just the first.

class TreeCursor

A tree cursor object focuses on a given node in a syntax tree, and allows you to move to adjacent nodes.

type: NodeType

The node's type.

name: string

Shorthand for .type.name.

from: number

The start source offset of this node.

to: number

The end source offset.

full: boolean
firstChild() → boolean

Move the cursor to this node's first child. When this returns false, the node has no child, and the cursor has not been moved.

lastChild() → boolean

Move the cursor to this node's last child.

childAfter(pos: number) → boolean

Move the cursor to the first child that starts at or after pos.

childBefore(pos: number) → boolean

Move to the last child that ends at or before pos.

parent() → boolean

Move the cursor to this node's parent node, if this isn't the top node.

nextSibling() → boolean

Move to this node's next sibling, if any.

prevSibling() → boolean

Move to this node's previous sibling, if any.

next() → boolean

Move to the next node in a pre-order traversal, going from a node to its first child or, if the current node is empty, its next sibling or the next sibling of the first parent node that has one.

prev() → boolean

Move to the next node in a last-to-first pre-order traversal. A node is followed by its last child or, if it has none, its previous sibling or the previous sibling of the first parent node that has one.

moveTo(pos: number, side⁠?: -1 | 0 | 1 = 0) → TreeCursor

Move the cursor to the innermost node that covers pos. If side is -1, it will enter nodes that end at pos. If it is 1, it will enter nodes that start at pos.

node: SyntaxNode

Get a syntax node at the cursor's current position.

tree: Tree | null

Get the tree that represents the current node, if any. Will return null when the node is in a tree buffer.
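The usual idiom for walking an entire tree with a cursor is a do/while loop over next(). To keep the sketch self-contained, a stand-in cursor over a fixed node list is used below; with lezer itself you would obtain the cursor from tree.cursor() instead of the makeCursor helper, which is purely illustrative.

```typescript
interface CursorLike { name: string; from: number; to: number; next(): boolean }

// Illustrative stand-in: a "cursor" that steps through a flat node list.
function makeCursor(nodes: {name: string, from: number, to: number}[]): CursorLike {
  let i = 0
  return {
    get name() { return nodes[i].name },
    get from() { return nodes[i].from },
    get to() { return nodes[i].to },
    next() { if (i + 1 < nodes.length) { i++; return true } return false }
  }
}

// The traversal idiom: visit the current node, then keep calling next()
// until it reports that the traversal is done.
function listNodes(cursor: CursorLike): string[] {
  const out: string[] = []
  do { out.push(`${cursor.name} ${cursor.from}-${cursor.to}`) }
  while (cursor.next())
  return out
}
```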

Node types

class NodeType

Each node in a syntax tree has a node type associated with it.

name: string

The name of the node type. Not necessarily unique, but if the grammar was written properly, different node types with the same name within a node set should play the same semantic role.

id: number

The id of this node in its set. Corresponds to the term ids used in the parser.

prop<T>(prop: NodeProp<T>) → T | undefined

Retrieves a node prop for this type. Will return undefined if the prop isn't present on this node.

isTop: boolean

True when this is the top node of a grammar.

isSkipped: boolean

True when this node is produced by a skip rule.

isError: boolean

Indicates whether this is an error node.

isAnonymous: boolean

When true, this node type doesn't correspond to a user-declared named node, for example because it is used to cache repetition.

is(name: string | number) → boolean

Returns true when this node's name or one of its groups matches the given string.

static define(spec: Object) → NodeType
spec
id: number

The ID of the node type. When this type is used in a set, the ID must correspond to its index in the type array.

name⁠?: string

The name of the node type. Leave empty to define an anonymous node.

props⁠?: readonly (NodePropSource | [NodeProp<any>, any])[]

Node props to assign to the type. The value given for any given prop should correspond to the prop's type.

top⁠?: boolean

Whether this is a top node.

error⁠?: boolean

Whether this node counts as an error node.

skipped⁠?: boolean

Whether this node is a skipped node.

static none: NodeType

An empty dummy node type to use when no actual type is available.

static match<T>(map: Object<T>) → fn(node: NodeType) → T | undefined

Create a function from node types to arbitrary values by specifying an object whose property names are node or group names. Often useful with NodeProp.add. You can put multiple names, separated by spaces, in a single property name to map multiple node names to a single value.
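The following sketch mirrors the name lookup described above: property names may hold several space-separated node names, each mapped to the property's value. (The real method also matches group names via is(); that part is left out here, and matchNames is a local illustration, not lezer API.)

```typescript
// Build a name → value lookup from a map whose keys may contain several
// space-separated names.
function matchNames<T>(map: {[names: string]: T}): (name: string) => T | undefined {
  const table: {[name: string]: T} = Object.create(null)
  for (const prop in map)
    for (const name of prop.split(" ")) table[name] = map[prop]
  return name => table[name]
}

const kind = matchNames({"VarName PropertyName": "identifier", String: "literal"})
```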

class NodeSet

A node set holds a collection of node types. It is used to compactly represent trees by storing their type ids, rather than a full pointer to the type object, in a number array. Each parser has a node set, and tree buffers can only store collections of nodes from the same set. A set can have a maximum of 2**16 (65536) node types in it, so that the ids fit into 16-bit typed array slots.

new NodeSet(types: readonly NodeType[])

Create a set with the given types. The id property of each type should correspond to its position within the array.

types: readonly NodeType[]

The node types in this set, by id.

extend(...props: NodePropSource[]) → NodeSet

Create a copy of this set with some node properties added. The arguments to this method should be created with NodeProp.add.

class NodeProp<T>

Each node type can have metadata associated with it in props. Instances of this class represent prop names.

new NodeProp(
{deserialize⁠?: fn(str: string) → T} = {}
)

Create a new node prop type. You can optionally pass a deserialize function.

deserialize(str: string) → T

A method that deserializes a value of this prop from a string. Can be used to allow a prop to be directly written in a grammar file. Defaults to raising an error.

set(propObj: any[], value: T) → any[]

Store a value for this prop in the given object. This can be useful when building up a prop object to pass to the NodeType constructor. Returns its first argument.

add(
match: Object<T> | fn(type: NodeType) → T | undefined
) → NodePropSource

This is meant to be used with NodeSet.extend or the props option to Parser.configure to compute prop values for each node type in the set. Takes a match object or function that returns undefined if the node type doesn't get this prop, and the prop's value if it does.

static string() → NodeProp<string>

Create a string-valued node prop whose deserialize function is the identity function.

static number() → NodeProp<number>

Create a number-valued node prop whose deserialize function is just Number.

static flag() → NodeProp<boolean>

Creates a boolean-valued node prop whose deserialize function returns true for any input.

static closedBy: NodeProp<readonly string[]>

Prop that is used to describe matching delimiters. For opening delimiters, this holds an array of node names (written as a space-separated string when declaring this prop in a grammar) for the node types of closing delimiters that match it.

static openedBy: NodeProp<readonly string[]>

The inverse of closedBy. This is attached to closing delimiters, holding an array of node names of types of matching opening delimiters.

static group: NodeProp<readonly string[]>

Used to assign node types to groups (for example, all node types that represent an expression could be tagged with an "Expression" group).

type NodePropSource = fn(type: NodeType) → [NodeProp<any>, any] | null

Type returned by NodeProp.add. Describes the way a prop should be added to each node type in a node set.

Buffers

class TreeBuffer

Tree buffers contain (type, start, end, endIndex) quads for each node. In such a buffer, nodes are stored in prefix order (parents before children, with the endIndex of the parent indicating which children belong to it).

length: number
type: NodeType
DefaultBufferLength: 1024

The default maximum length of a TreeBuffer node.

interface BufferCursor

This is used by Tree.build as an abstraction for iterating over a tree buffer. A cursor initially points at the very last element in the buffer. Every time next() is called it moves on to the previous one.

pos: number

The current buffer position (four times the number of nodes remaining).

id: number

The node ID of the next node in the buffer.

start: number

The start position of the next node in the buffer.

end: number

The end position of the next node.

size: number

The size of the next node (the number of nodes inside, counting the node itself, times 4).

next()

Moves this.pos down by 4.

fork() → BufferCursor

Create a copy of this cursor.

Incremental parsing

interface ChangedRange

The TreeFragment.applyChanges method expects changed ranges in this format.

fromA: number

The start of the change in the start document.

toA: number

The end of the change in the start document.

fromB: number

The start of the replacement in the new document.

toB: number

The end of the replacement in the new document.

class TreeFragment

Tree fragments are used during incremental parsing to track parts of old trees that can be reused in a new parse. An array of fragments is used to track regions of an old tree whose nodes might be reused in new parses. Use the static applyChanges method to update fragments for document changes.

new TreeFragment()
from: number

The start of the unchanged range pointed to by this fragment. This refers to an offset in the updated document (as opposed to the original tree).

to: number

The end of the unchanged range.

tree: Tree

The tree that this fragment is based on.

offset: number

The offset between the fragment's tree and the document that this fragment can be used against. Add this when going from document to tree positions, subtract it to go from tree to document positions.
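The direction of this conversion is easy to get backwards, so here it is spelled out as two tiny helpers (purely illustrative, not lezer API):

```typescript
// Map a position in the document to the corresponding position in the
// fragment's tree, and back again.
const toTreePos = (docPos: number, offset: number) => docPos + offset
const toDocPos = (treePos: number, offset: number) => treePos - offset
```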

openStart: boolean
openEnd: boolean
static applyChanges(
fragments: readonly TreeFragment[],
changes: readonly ChangedRange[],
minGap⁠?: number = 128
) → readonly TreeFragment[]

Apply a set of edits to an array of fragments, removing or splitting fragments as necessary to remove edited ranges, and adjusting offsets for fragments that moved.

static addTree(
tree: Tree,
fragments⁠?: readonly TreeFragment[] = [],
partial⁠?: boolean = false
) → TreeFragment[]

Create a set of fragments from a freshly parsed tree, or update an existing set of fragments by replacing the ones that overlap with a tree with content from the new tree. When partial is true, the parse is treated as incomplete, and the token at its end is not included in safeTo.

interface PartialParse

Interface used to represent an in-progress parse, which can be moved forward piece-by-piece.

advance() → Tree | null

Advance the parse state by some amount. Returns the finished syntax tree when the parse completes, or null when there is more work to do.

pos: number

The current parse position.

forceFinish() → Tree

Get the currently parsed content as a tree, even though the parse hasn't finished yet.

interface Input

This is the interface the parser uses to access the document. It exposes a sequence of UTF16 code units. Most (but not all) access, especially through get, will be sequential, so implementations can optimize for that.

length: number

The end of the stream.

get(pos: number) → number

Get the code unit at the given position. Will return -1 when asked for a point below 0 or beyond the end of the stream.

lineAfter(pos: number) → string

Returns the string between pos and the next newline character or the end of the document. Not used by the built-in tokenizers, but can be useful in custom tokenizers or completely custom parsers.

read(from: number, to: number) → string

Read part of the stream as a string.

clip(at: number) → Input

Return a new Input over the same data, but with a lower length. Used, for example, when nesting grammars to give the inner grammar a narrower view of the input.
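Because the interface is small, a string-backed implementation (roughly what stringInput provides) makes a useful reference for writing a custom Input over some other data structure. The SimpleInput interface below is a local mirror of the documented one.

```typescript
interface SimpleInput {
  length: number
  get(pos: number): number
  lineAfter(pos: number): string
  read(from: number, to: number): string
  clip(at: number): SimpleInput
}

// A string-backed Input; clip reuses the same string with a shorter length.
function fromString(doc: string, length = doc.length): SimpleInput {
  return {
    length,
    // -1 signals out-of-range, per the contract of get.
    get: pos => pos < 0 || pos >= length ? -1 : doc.charCodeAt(pos),
    lineAfter(pos) {
      const newline = doc.indexOf("\n", pos)
      return doc.slice(pos, newline < 0 || newline > length ? length : newline)
    },
    read: (from, to) => doc.slice(from, Math.min(to, length)),
    clip: at => fromString(doc, at)
  }
}
```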

stringInput(input: string) → Input

Create an Input for the given string.

interface ParseContext

A parse context is an object providing additional information to the parser. It is passed through to nested parsers.

fragments⁠?: readonly TreeFragment[]

A set of fragments from a previous parse to be used for incremental parsing. These should be aligned with the current document (through a call to TreeFragment.applyChanges) if any changes were made since they were produced. The parser will try to reuse nodes from the fragments in the new parse, greatly speeding up the parse when it can do so for most of the document.

generator module

re-export BuildOptions
re-export buildParserFile
re-export buildParser
re-export GenError