Reference Manual

The lezer system consists of three modules, each distributed as a separate package on npm.

lezer module

Parsing

interface ParseOptions

Options that can be passed to control parsing.

cache?: Tree

A cached tree, used for incremental parsing. This should be a tree whose content is aligned with the current document (through a call to Tree.applyChanges, if any changes were made since it was produced). The parser will try to reuse nodes from this tree in the new parse, greatly speeding up the parse when it can reuse nodes for most of the document.

strict?: boolean

When true, the parser will raise an exception, rather than run its error-recovery strategies, when the input doesn't match the grammar.

bufferLength?: number

The maximum length of the TreeBuffers generated in the output tree. Defaults to 1024.

class Parser

A parser holds the parse tables for a given grammar, as generated by lezer-generator.

readonly group: NodeGroup

The node group holding the node types used by this grammar.

parse(input: InputStream | string, options?: ParseOptions) → Tree

Parse a given string or stream.

startParse(input: InputStream, options?: ParseOptions) → ParseContext

Create a ParseContext.

withNested(spec: Object<NestedGrammar>) → Parser

Create a new Parser instance with different values for (some of) the nested grammars. This can be used to, for example, swap in a different language for a nested grammar or fill in a nested grammar that was left blank by the original grammar.

withProps(...props: NodePropSource[]) → Parser

Create a new Parser instance whose node types have the given props added. You should use NodeProp.add to create the arguments to this method.

getName(term: number) → string

Returns the name associated with a given term. Names for all terms are only available when the parser was generated with the --names option; by default, only the names of tagged terms are stored.

class ParseContext

A parse context can be used for step-by-step parsing. After creating it, you repeatedly call .advance() until it returns a tree to indicate it has reached the end of the parse.

advance() → null | Tree

Execute one parse step. This picks the parse stack that's currently the least far along, and does the next thing that can be done with it. This may be:

- Add a cached node, if a matching one is found.
- Enter a nested grammar.
- Perform all shift or reduce actions that match the current token (if there is more than one, this will split the stack).
- Finish the parse.

When the parse is finished, this will return a syntax tree. When not, it returns null.

readonly pos: number

The position to which the parse has advanced.

forceFinish() → Tree

Force the parse to finish, generating a tree containing the nodes parsed so far.
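The advance/forceFinish pair lends itself to time-sliced parsing, for example in an editor that should not block for long. The sketch below is hypothetical: StepContext only mirrors the documented ParseContext members so the code runs without lezer, parseWithBudget is an illustrative helper, and ToyContext stands in for a real parse.

```typescript
// Minimal stand-in for the documented ParseContext members.
interface StepContext<T> {
  readonly pos: number;
  advance(): T | null;
  forceFinish(): T;
}

// Call advance() at most maxSteps times. If no tree has been produced by
// then, fall back to forceFinish(), which yields the nodes parsed so far.
function parseWithBudget<T>(
  ctx: StepContext<T>,
  maxSteps: number,
): { tree: T; finished: boolean } {
  for (let step = 0; step < maxSteps; step++) {
    const tree = ctx.advance();
    if (tree !== null) return { tree, finished: true };
  }
  return { tree: ctx.forceFinish(), finished: false };
}

// Hypothetical toy context that "finishes" after a fixed number of steps.
class ToyContext implements StepContext<string> {
  pos = 0;
  private readonly totalSteps: number;

  constructor(totalSteps: number) {
    this.totalSteps = totalSteps;
  }

  advance(): string | null {
    this.pos++;
    return this.pos >= this.totalSteps ? `tree(${this.pos})` : null;
  }

  forceFinish(): string {
    return `partial(${this.pos})`;
  }
}
```

Whether to force-finish or keep the context around for the next time slice is a policy choice for the caller; this sketch simply force-finishes once the budget runs out.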

type NestedGrammar = null | Parser | fn(input: InputStream, stack: Stack) → NestedGrammarSpec

Nested grammar values are associated with nesting positions in the grammar. If they are null, the nested region is simply skipped over. If they hold a parser object, that parser is used to parse the region. To implement dynamic behavior, the value may also be a function which returns a description of the way the region should be parsed.

interface NestedGrammarSpec

An object indicating how to proceed with a nested parse.

parser?: Parser

When given, this is used to provide a parser that should be used to parse the content.

stay?: boolean

When true, the outer grammar should parse the content itself, using the fallback expression provided for the nesting.

parseNode?: fn(input: InputStream, start: number) → Tree

Alternatively, parseNode may hold a function which will be made responsible for parsing the region.

wrapType?: number

An optional extra type to tag the resulting tree with.

filterEnd?: fn(endToken: string) → boolean

When a filterEnd property is present, that should hold a function that determines whether a given end token (which matches the end token specified in the grammar) should be used (true) or ignored (false). This is mostly useful for implementing things like XML closing tag matching.
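As a sketch of the XML use case, the hypothetical helper below builds a filterEnd-style predicate for an element whose name is already known. It assumes the end token arrives as a string of the form `</name>`; how the outer parse obtains the opening tag name is outside the scope of this sketch.

```typescript
// Hypothetical helper: returns a predicate in the shape of filterEnd that
// accepts an end token only if it closes the element named openName.
// Assumption: end tokens look like "</name>", optionally with whitespace.
function closingTagFilter(openName: string): (endToken: string) => boolean {
  return (endToken: string): boolean => {
    const match = /^<\/\s*([^\s>]+)\s*>$/.exec(endToken);
    // true: use this end token; false: ignore it and keep parsing the region.
    return match !== null && match[1] === openName;
  };
}
```

The comparison is case-sensitive, as XML tag names are; an HTML-oriented variant would probably lowercase both names first.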

class Stack

A parse stack. These are used internally by the parser to track parsing progress. They also provide some properties and methods that external code such as a tokenizer can use to get information about the parse state.

pos: number
canShift(term: number) → boolean

Check if the given term would be able to be shifted (optionally after some reductions) on this stack. This can be useful for external tokenizers that want to make sure they only provide a given token when it applies.

readonly ruleStart: number

Find the start position of the rule that is currently being parsed.

startOf(types: readonly number[]) → number

Find the start position of the innermost instance of any of the given term types, or return -1 when none of them are found.

Note: this is only reliable when there is at least some state that unambiguously matches the given rule on the stack. For example, given a grammar like this, where the difference between b and c is only apparent at the third token:

    a { b | c }
    b { "x" "y" "x" }
    c { "x" "y" "z" }

a parse state after "x" will not reliably tell you that b is on the stack. You can pass [b, c] to reliably check for either of those two rules (assuming that a isn't part of some rule that includes other things starting with "x").

Tokenizers

interface InputStream

This is the interface the parser uses to access the document. It exposes a sequence of UTF-16 code units. Most access will be sequential, so implementations can optimize for that.

length: number

The end of the stream.

get(pos: number) → number

Get the UTF-16 code unit at the given position. Will return -1 when asked for a position below 0 or beyond the end of the stream.

read(from: number, to: number) → string

Read part of the stream as a string.

clip(at: number) → InputStream

Return a new InputStream over the same data, but with a shorter length. Used, for example, when nesting grammars to give the inner grammar a narrower view of the input.
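A string-backed implementation of this interface can be quite small. The sketch below redeclares the interface locally so it runs on its own; StringStream is a hypothetical name, not necessarily the class lezer ships. Because it uses charCodeAt, a character outside the Basic Multilingual Plane (such as an emoji) occupies two positions.

```typescript
// Local copy of the documented InputStream interface.
interface InputStream {
  length: number;
  get(pos: number): number;
  read(from: number, to: number): string;
  clip(at: number): InputStream;
}

// Hypothetical string-backed implementation.
class StringStream implements InputStream {
  readonly text: string;
  length: number;

  constructor(text: string, length: number = text.length) {
    this.text = text;
    this.length = length;
  }

  // UTF-16 code unit at pos, or -1 outside [0, length).
  get(pos: number): number {
    return pos < 0 || pos >= this.length ? -1 : this.text.charCodeAt(pos);
  }

  // Text in [from, to), never reading past the (possibly clipped) length.
  read(from: number, to: number): string {
    return this.text.slice(from, Math.min(to, this.length));
  }

  // Same underlying string, but the new stream ends at `at`.
  clip(at: number): InputStream {
    return new StringStream(this.text, Math.min(at, this.length));
  }
}
```

Clipping shares the underlying string instead of copying it, which keeps nested parses cheap.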

class Token

Tokenizers write the tokens they read into instances of this class.

start: number

The start of the token. This is set by the parser, and should not be mutated by the tokenizer.

value: number

This starts at -1, and should be updated to a term id when a matching token is found.

end: number

When setting .value, you should also set .end to the end position of the token. (You'll usually want to use the accept method.)

accept(value: number, end: number)

Accept a token, setting value and end to the given values.

class ExternalTokenizer

new ExternalTokenizer(token: fn(input: InputStream, token: Token, stack: Stack), options?: {contextual?: boolean} = {})
contextual: boolean
readonly token(input: InputStream, token: Token, stack: Stack)
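The tokenizer function combines the pieces above: it reads from the input starting at token.start, consults Stack.canShift, and reports a match through Token.accept. In this sketch the interfaces are local stand-ins mirroring the documented members, and AsyncKeyword (term id 7) and asyncKeyword are hypothetical; real term ids would come from the generated terms file, and the function would be passed to the ExternalTokenizer constructor.

```typescript
// Local stand-ins for the documented InputStream and Stack members used here.
interface Input {
  get(pos: number): number;
}
interface StackLike {
  canShift(term: number): boolean;
}

// Mirrors the documented Token class.
class Token {
  start = 0;
  value = -1;
  end = -1;
  accept(value: number, end: number): void {
    this.value = value;
    this.end = end;
  }
}

// Hypothetical term id.
const AsyncKeyword = 7;

function isWordChar(code: number): boolean {
  return (
    (code >= 48 && code <= 57) || // 0-9
    (code >= 65 && code <= 90) || // A-Z
    (code >= 97 && code <= 122) || // a-z
    code === 95 // _
  );
}

// Only emit the keyword when the parse state can shift it, so "async"
// remains an ordinary word everywhere else.
function asyncKeyword(input: Input, token: Token, stack: StackLike): void {
  const word = "async";
  for (let i = 0; i < word.length; i++) {
    if (input.get(token.start + i) !== word.charCodeAt(i)) return;
  }
  // Reject prefixes of longer words such as "asyncIterator".
  if (isWordChar(input.get(token.start + word.length))) return;
  // Leaving token.value at -1 signals "no token here".
  if (!stack.canShift(AsyncKeyword)) return;
  token.accept(AsyncKeyword, token.start + word.length);
}
```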

Re-exports

These come from lezer-tree, but are exported from this module as well for convenience.

re-export Tree
re-export Subtree
re-export NodeType
re-export NodeGroup
re-export NodeProp

tree module

Trees

class Tree extends Subtree

A piece of syntax tree. There are two ways to approach these trees: the way they are actually stored in memory, and the convenient way. Syntax trees are stored as a tree of Tree and TreeBuffer objects. By packing detail information into TreeBuffer leaf nodes, the representation is made a lot more memory-efficient. However, when you want to actually work with tree nodes, this representation is very awkward, so most client code will want to use the Subtree interface instead, which provides a view on some part of this data structure, and can be used (through resolve, for example) to zoom in on any single node.

static empty: Tree

The empty tree.

static build(buffer: BufferCursor | readonly number[], group: NodeGroup, topID?: number = 0, maxBufferLength?: number = DefaultBufferLength, reused?: Tree[] = []) → Tree

Build a tree from a postfix-ordered buffer of node information, or a cursor over such a buffer.

readonly children: readonly (Tree | TreeBuffer)[]

The tree's child nodes. Children small enough to fit in a TreeBuffer will be represented as such; other children can be further Tree instances with their own internal structure.

readonly positions: readonly number[]

The positions (offsets relative to the start of this tree) of the children.

applyChanges(changes: readonly ChangedRange[]) → Tree

Apply a set of edits to a tree, removing all nodes that were touched by the edits, and moving remaining nodes so that their positions are updated for insertions/deletions before them. This is likely to destroy a lot of the structure of the tree, and mostly useful for extracting the nodes that can be reused in a subsequent incremental re-parse.
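The position arithmetic involved can be sketched for a single node range, independent of the tree structure. Assumptions, labeled loudly: mapNodeRange is a hypothetical helper, changes are sorted by fromA and do not overlap, and a change that only abuts a node (ending at its start or starting at its end) is treated as not touching it. The boundary rule applyChanges actually uses may differ.

```typescript
// Same shape as the documented ChangedRange interface.
interface ChangedRange {
  fromA: number;
  toA: number;
  fromB: number;
  toB: number;
}

// Map a node's [start, end) range from the old document to the new one.
// Returns null when a change overlaps the node (such nodes are dropped);
// otherwise shifts the node by the size delta of every change before it.
function mapNodeRange(
  start: number,
  end: number,
  changes: readonly ChangedRange[],
): { start: number; end: number } | null {
  let offset = 0;
  for (const ch of changes) {
    if (ch.toA <= start) {
      // Entirely before the node: new length minus old length.
      offset += (ch.toB - ch.fromB) - (ch.toA - ch.fromA);
    } else if (ch.fromA >= end) {
      break; // this and all later changes come after the node
    } else {
      return null; // the change touches the node
    }
  }
  return { start: start + offset, end: end + offset };
}
```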

cut(at: number) → Tree

Take the part of the tree up to the given position.

append(other: Tree) → Tree

Append another tree to this tree. The other tree must have empty space at its start that is big enough to fit this tree.

balance(maxBufferLength?: number = DefaultBufferLength) → Tree

Balance the direct children of this tree. Should only be used on non-tagged trees.

abstract class Subtree

A subtree is a representation of part of the syntax tree. It may either be the tree root, or a tagged node.

abstract parent: Subtree | null

The subtree's parent. Will be null for the root node.

abstract type: NodeType

The node's type.

readonly name: string
abstract start: number

The start source offset of this subtree.

abstract end: number

The end source offset of this subtree.

readonly depth: number

The depth (number of parent nodes) of this subtree.

readonly root: Tree

The root of the tree that this subtree is part of.

abstract iterate<T = any>(from: number, to: number, enter: EnterFunc<T>, leave?: LeaveFunc) → undefined | T

Iterate over all nodes in this subtree. Will iterate through the tree, calling enter for each node it enters and, if given, leave when it leaves a node.

resolve(pos: number, side?: -1 | 0 | 1 = 0) → Subtree

Find the node at a given position. By default, this will return the innermost subtree that covers the position from both sides, meaning that nodes starting or ending at the position aren't entered. You can pass a side of -1 to enter nodes that end at the position, or 1 to enter nodes that start there.
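To make the side semantics concrete, here is a hypothetical model of resolve over plain nested ranges. ToyNode, covers, and resolveToy are illustrative names, not lezer APIs; the sketch keeps descending into whichever child covers the position until no child does.

```typescript
// Hypothetical node with absolute positions, standing in for a Subtree.
interface ToyNode {
  name: string;
  start: number;
  end: number;
  children: ToyNode[];
}

// side 0: node must extend on both sides of pos.
// side -1: nodes that end at pos also count.
// side 1: nodes that start at pos also count.
function covers(node: ToyNode, pos: number, side: -1 | 0 | 1): boolean {
  const afterStart = side === 1 ? node.start <= pos : node.start < pos;
  const beforeEnd = side === -1 ? node.end >= pos : node.end > pos;
  return afterStart && beforeEnd;
}

// Descend from the root as deep as the covering rule allows.
function resolveToy(root: ToyNode, pos: number, side: -1 | 0 | 1 = 0): ToyNode {
  let node = root;
  while (true) {
    const next = node.children.find((child) => covers(child, pos, side));
    if (next === undefined) return node;
    node = next;
  }
}
```

With children A (0 to 5) and B (5 to 10) under a root, position 5 resolves to the root with side 0, to A with side -1, and to B with side 1.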

abstract childBefore(pos: number) → null | Subtree

Find the child tree before the given position, if any.

abstract childAfter(pos: number) → null | Subtree

Find the child tree after the given position, if any.

readonly firstChild: null | Subtree

Get the first child of this subtree.

readonly lastChild: null | Subtree

Find the last child of this subtree.

interface ChangedRange

The applyChanges method expects changed ranges in this format.

fromA: number

The start of the change in the original document.

toA: number

The end of the change in the original document.

fromB: number

The start of the replacement in the new document.

toB: number

The end of the replacement in the new document.

type EnterFunc<T> = fn<T>(type: NodeType, start: number, end: number) → undefined | false | T

Signature of the enter function passed to Subtree.iterate. It is given a node's type, start position, and end position for every node, and can return:

* undefined to proceed iterating as normal.
* false to skip this node's children, but continue iterating the nodes after it.
* Any other value to immediately stop iteration and return the value from the iterate method.

type LeaveFunc = fn(type: NodeType, start: number, end: number)

Signature of the leave function passed to Subtree.iterate.
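The return-value protocol can be illustrated with a small depth-first walk over a toy tree. This is a hypothetical sketch (ToyNode and iterateToy are illustrative names): it passes node names instead of NodeType values, ignores the from/to range, and assumes leave is not called for a node whose enter returned false, which is a guess rather than documented behavior.

```typescript
// Hypothetical toy node; real trees pass a NodeType rather than a name.
interface ToyNode {
  name: string;
  start: number;
  end: number;
  children: ToyNode[];
}

type EnterFunc<T> = (name: string, start: number, end: number) => undefined | false | T;
type LeaveFunc = (name: string, start: number, end: number) => void;

function iterateToy<T>(node: ToyNode, enter: EnterFunc<T>, leave?: LeaveFunc): undefined | T {
  const result = enter(node.name, node.start, node.end);
  // false: skip this node's children and continue with the nodes after it.
  // Assumption: leave is not called for a node skipped this way.
  if (result === false) return undefined;
  // Any other non-undefined value stops the whole iteration.
  if (result !== undefined) return result;
  for (const child of node.children) {
    const found = iterateToy(child, enter, leave);
    if (found !== undefined) return found;
  }
  if (leave) leave(node.name, node.start, node.end);
  return undefined;
}
```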

Node types

class NodeType

Each node in a syntax tree has a node type associated with it.

static none: NodeType

An empty dummy node type to use when no actual type is available.

static match<T>(map: Object<T>) → fn(node: NodeType) → undefined | T

Create a function from node types to arbitrary values by specifying an object whose property names are node names. Often useful with NodeProp.add. You can put multiple node names, separated by spaces, in a single property name to map multiple node names to a single value.
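The matching behavior described above can be approximated in a few lines. matchByName is a hypothetical re-implementation that works on anything with a name property (standing in for NodeType); in real code, the function returned by NodeType.match would typically be handed to a prop's add method. If a name appears in more than one property, this sketch lets the later property win, which may not match the library.

```typescript
// Hypothetical re-implementation of the behavior described for NodeType.match.
function matchByName<T>(
  map: { [selector: string]: T },
): (node: { name: string }) => T | undefined {
  const byName = new Map<string, T>();
  for (const [selector, value] of Object.entries(map)) {
    // One property may list several node names, separated by spaces.
    for (const name of selector.split(/\s+/)) {
      if (name !== "") byName.set(name, value);
    }
  }
  return (node) => byName.get(node.name);
}
```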

readonly name: string

The name of the node type. Not necessarily unique, but if the grammar was written properly, different node types with the same name within a node group should play the same semantic role.

readonly id: number

The id of this node in its group. Corresponds to the term ids used in the parser.

prop<T>(prop: NodeProp<T>) → undefined | T

Retrieves a node prop for this type. Will return undefined if the prop isn't present on this node.

class NodeGroup

A node group holds a collection of node types. It is used to compactly represent trees by storing their type ids, rather than a full pointer to the type object, in a number array. Each parser has a node group, and tree buffers can only store collections of nodes from the same group. A group can have a maximum of 2**16 (65536) node types in it, so that the ids fit into 16-bit typed array slots.

new NodeGroup(types: readonly NodeType[])

Create a group with the given types. The id property of each type should correspond to its position within the array.

readonly types: readonly NodeType[]
extend(...props: NodePropSource[]) → NodeGroup

Create a copy of this group with some node properties added. The arguments to this method should be created with NodeProp.add.

class NodeProp<T>

Each node type can have metadata associated with it in props. Instances of this class represent prop names.

new NodeProp({deserialize?: fn(str: string) → T} = {})

Create a new node prop type. You can optionally pass a deserialize function.

static string() → NodeProp<string>

Create a string-valued node prop whose deserialize function is the identity function.

static flag() → NodeProp<boolean>

Creates a boolean-valued node prop whose deserialize function returns true for any input.

static error: NodeProp<boolean>

The special node type that the parser uses to represent parse errors has this flag set. (You shouldn't use it for custom nodes that represent erroneous content.)

static skipped: NodeProp<boolean>

Nodes that were produced by skipped expressions (such as comments) have this prop set to true.

static delim: NodeProp<string>

Prop that is used to describe a rule's delimiters. For example, a parenthesized expression node would set this to the string "( )" (the open and close strings separated by a space). This is added by the parser generator's @detectDelim feature, but you can also manually add them.

static lang: NodeProp<string>

The top node for a grammar usually has a lang prop set to a string identifying the grammar, to provide context for the nodes inside of it.

static repeated: NodeProp<boolean>

A prop that indicates whether a node represents a repeated expression. Abstractions like Subtree hide such nodes, so you usually won't see them, but if you directly rummage through a tree's children, you'll find repeat nodes that wrap repeated content into balanced trees.

deserialize(str: string) → T

A method that deserializes a value of this prop from a string. Can be used to allow a prop to be directly written in a grammar file. Defaults to raising an error.

set(propObj: any[], value: T) → any[]

Store a value for this prop in the given object. This can be useful when building up a prop object to pass to the NodeType constructor. Returns its first argument.

add(f: fn(type: NodeType) → undefined | T) → NodePropSource

This is meant to be used with NodeGroup.extend or Parser.withProps to compute prop values for each node type in the group. Takes a function that returns undefined if the node type doesn't get this prop, or the prop's value if it does.

class NodePropSource

Type returned by NodeProp.add. Describes the way a prop should be added to each node type in a node group.

Buffers

class TreeBuffer

Tree buffers contain (type, start, end, endIndex) quads for each node. In such a buffer, nodes are stored in prefix order (parents before children, with the endIndex of the parent indicating which children belong to it).
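Walking such a buffer means reading one quad and then jumping to its endIndex to skip the node's descendants. The sketch below assumes that endIndex is the array index just past the node's last descendant, so a node at index i has its children in the range from i + 4 up to that endIndex; childrenOf and BufferNode are hypothetical names.

```typescript
// One decoded (type, start, end, endIndex) quad, plus its buffer position.
interface BufferNode {
  type: number;
  start: number;
  end: number;
  index: number; // position of the quad in the buffer
}

// List the nodes stored directly in buffer[from..to), skipping descendants.
// Assumption: buffer[i + 3] is the index just past the node's last descendant.
function childrenOf(buffer: Uint16Array, from: number, to: number): BufferNode[] {
  const nodes: BufferNode[] = [];
  let i = from;
  while (i < to) {
    const endIndex = buffer[i + 3];
    if (endIndex <= i) throw new Error(`malformed buffer at ${i}`);
    nodes.push({ type: buffer[i], start: buffer[i + 1], end: buffer[i + 2], index: i });
    i = endIndex;
  }
  return nodes;
}
```

For a node found at index i, calling childrenOf(buffer, i + 4, buffer[i + 3]) lists its direct children under the same assumption.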

readonly buffer: Uint16Array
readonly length: number
readonly group: NodeGroup
DefaultBufferLength: 1024

The default maximum length of a TreeBuffer node.

generator module

type BuildOptions

fileName?: string

The name of the grammar file.

warn?: fn(message: string)

A function that should be called with warnings. The default is to call console.warn.

includeNames?: boolean

Whether to include term names in the output file. Defaults to false.

moduleStyle?: string

Determines the module system used by the output file. Can be either "cjs" (CommonJS) or "es" (ES2015 module). Defaults to "es".

exportName?: string

The name of the export that holds the parser in the output file. Defaults to "parser".

externalTokenizer?: fn(name: string, terms: Object<number>) → ExternalTokenizer

When calling buildParser, this can be used to provide placeholders for external tokenizers.

nestedGrammar?: fn(name: string, terms: Object<number>) → NestedGrammar

Only relevant when using buildParser. Provides placeholders for nested grammars.

externalProp?: fn(name: string) → NodeProp<any>

If given, will be used to initialize external props in the parser returned by buildParser.

buildParserFile(text: string, options?: BuildOptions = {}) → {parser: string, terms: string}

Build the code that represents the parser tables for a given grammar description. The parser property in the return value holds the main file that exports the Parser instance. The terms property holds a declaration file that defines constants for all of the named terms in the grammar, holding their ids as values. This is useful when external code, such as a tokenizer, needs to be able to use these ids. It is recommended to run a tree-shaking bundler when importing this file, since you usually only need a handful of the many terms in your code.

buildParser(text: string, options?: BuildOptions = {}) → Parser

Build an in-memory parser instance for a given grammar. This is mostly useful for testing. If your grammar uses external tokenizers or nested grammars, you'll have to provide the externalTokenizer and/or nestedGrammar options for the returned parser to be able to parse anything.