Reference Manual

The lezer system consists of three modules, each distributed as a separate package on npm.

lezer module

Parsing

interface ParseOptions

Options that can be passed to control parsing.

cache?: Tree

A cached tree, used for incremental parsing. This should be a tree whose content is aligned with the current document (through a call to Tree.applyChanges, if any changes were made since it was produced). The parser will try to reuse nodes from this tree in the new parse, greatly speeding up the parse when it can reuse nodes for most of the document.

strict?: boolean

When true, the parser will raise an exception, rather than run its error-recovery strategies, when the input doesn't match the grammar.

bufferLength?: number

The maximum length of the TreeBuffers generated in the output tree. Defaults to 1024.

class Parser

A parser holds the parse tables for a given grammar, as generated by lezer-generator.

readonly tags: readonly Tag[]

The tags associated with the node types in this grammar, indexed by node type.

parse(input: InputStream | string, options?: ParseOptions) → Tree

Parse a given string or stream.

startParse(input: InputStream, options?: ParseOptions) → ParseContext

Create a ParseContext.

isSkipped(term: number) → boolean

Tells you whether a given term is part of the skip rules for the grammar.

withNested(spec: Object<NestedGrammar>) → Parser

Create a new Parser instance with different values for (some of) the nested grammars. This can be used to, for example, swap in a different language for a nested grammar or fill in a nested grammar that was left blank by the original grammar.

getName(term: number) → string

Returns the name associated with a given term. Names are only available for every term when the parser was generated with the --names option; by default, only the names of tagged terms are stored.

class ParseContext

A parse context can be used for step-by-step parsing. After creating it, you repeatedly call .advance() until it returns a tree to indicate it has reached the end of the parse.

advance() → null | Tree

Execute one parse step. This picks the parse stack that's currently the least far along, and does the next thing that can be done with it. This may be:

- Adding a cached node, if a matching one is found.
- Entering a nested grammar.
- Performing all shift or reduce actions that match the current token (if there is more than one, this will split the stack).
- Finishing the parse.

When the parse is finished, this will return a syntax tree. When not, it returns null.

readonly pos: number

The position to which the parse has advanced.

forceFinish() → Tree

Force the parse to finish, generating a tree containing the nodes parsed so far.
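Because each call to advance() does a bounded amount of work, a caller can interleave parsing with other tasks, or give up after a budget and take a partial tree. The following sketch assumes only the three members documented above; StepContext, parseWithBudget, and FakeContext are hypothetical names, and FakeContext merely stands in for a real ParseContext so that the example runs on its own.

```typescript
// Hypothetical structural type covering the ParseContext members above.
interface StepContext<T> {
  readonly pos: number;
  advance(): T | null;
  forceFinish(): T;
}

// Step the parse until it completes, or force it to finish after maxSteps.
function parseWithBudget<T>(ctx: StepContext<T>, maxSteps: number): { tree: T; complete: boolean } {
  for (let step = 0; step < maxSteps; step++) {
    const tree = ctx.advance();
    if (tree !== null) return { tree, complete: true };
  }
  return { tree: ctx.forceFinish(), complete: false };
}

// Stand-in for a real ParseContext: every step moves pos forward by one,
// and the "tree" is a string recording how far the parse got.
class FakeContext implements StepContext<string> {
  pos = 0;
  constructor(private readonly docLength: number) {}
  advance(): string | null {
    this.pos++;
    return this.pos >= this.docLength ? `full(${this.pos})` : null;
  }
  forceFinish(): string {
    return `partial(${this.pos})`;
  }
}

const done = parseWithBudget(new FakeContext(3), 10);      // finishes on its own
const cutShort = parseWithBudget(new FakeContext(100), 5); // forced to finish
```

In an editor, one would more likely keep the context around and resume calling advance() later, using forceFinish only when a tree is needed immediately.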

type NestedGrammar = null | Parser | fn(input: InputStream, stack: Stack) → NestedGrammarSpec

Nested grammar values are associated with nesting positions in the grammar. If they are null, the nested region is simply skipped over. If they hold a parser object, that parser is used to parse the region. To implement dynamic behavior, the value may also be a function which returns a description of the way the region should be parsed.

interface NestedGrammarSpec

An object indicating how to proceed with a nested parse.

parser?: Parser

When given, this is used to provide a parser that should be used to parse the content.

stay?: boolean

This being true means that the outer grammar should use the fallback expression provided for the nesting to parse the content.

parseNode?: fn(input: InputStream) → Tree

Alternatively, parseNode may hold a function which will be made responsible for parsing the region.

filterEnd?: fn(endToken: string) → boolean

When a filterEnd property is present, that should hold a function that determines whether a given end token (which matches the end token specified in the grammar) should be used (true) or ignored (false). This is mostly useful for implementing things like XML closing tag matching.
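The sketch below shows what a filterEnd function for XML-style closing tag matching could look like. It assumes that the end token string passed in is the literal text of the closing tag (for example </script>), and it only models filterEnd; NestedSpecSketch and specForElement are hypothetical names, not part of the lezer API.

```typescript
// Hypothetical subset of NestedGrammarSpec: only filterEnd is modelled.
interface NestedSpecSketch {
  filterEnd?: (endToken: string) => boolean;
}

// Accept an end token only if it closes the element named openName.
// Assumes endToken is the raw closing-tag text, e.g. "</script>".
function specForElement(openName: string): NestedSpecSketch {
  const wanted = openName.toLowerCase();
  return {
    filterEnd(endToken: string): boolean {
      const match = /^<\/\s*([\w-]+)\s*>$/.exec(endToken);
      return match?.[1]?.toLowerCase() === wanted;
    },
  };
}

// A region opened by <script> should only be closed by </script>.
const scriptSpec = specForElement("script");
```

Ignoring non-matching end tokens lets content such as "</b>" inside a script region stay part of the nested region instead of terminating it.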

class Stack

A parse stack. These are used internally by the parser to track parsing progress. They also provide some properties and methods that external code such as a tokenizer can use to get information about the parse state.

pos: number
canShift(term: number) → boolean

Check if the given term would be able to be shifted (optionally after some reductions) on this stack. This can be useful for external tokenizers that want to make sure they only provide a given token when it applies.

readonly ruleStart: number

Find the start position of the rule that is currently being parsed.

Tokenizers

interface InputStream

This is the interface the parser uses to access the document. It exposes a sequence of UTF-16 code units. Most access will be sequential, so implementations can optimize for that.

length: number

The end of the stream.

get(pos: number) → number

Get the code unit at the given position. Will return -1 when asked for a position below 0 or beyond the end of the stream.

read(from: number, to: number) → string

Read part of the stream as a string.

clip(at: number) → InputStream

Return a new InputStream over the same data, but with a lower length. Used, for example, when nesting grammars to give the inner grammar a narrower view of the input.
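A string-backed implementation of this interface could look like the sketch below. The interface is copied locally so the example is self-contained; StringInput is a hypothetical name, and while lezer presumably wraps plain strings passed to Parser.parse in a similar way, this is not the library's code.

```typescript
// Local copy of the InputStream interface documented above.
interface InputStream {
  length: number;
  get(pos: number): number;
  read(from: number, to: number): string;
  clip(at: number): InputStream;
}

// String-backed stream. charCodeAt returns UTF-16 code units, and
// positions outside [0, length) yield -1.
class StringInput implements InputStream {
  constructor(private readonly text: string, public length: number = text.length) {}

  get(pos: number): number {
    return pos < 0 || pos >= this.length ? -1 : this.text.charCodeAt(pos);
  }

  read(from: number, to: number): string {
    // Never read past the (possibly clipped) end of the stream.
    return this.text.slice(from, Math.min(to, this.length));
  }

  clip(at: number): InputStream {
    // Same underlying string, shorter visible length.
    return new StringInput(this.text, Math.min(at, this.length));
  }
}

const doc = new StringInput("hello world");
const firstWord = doc.clip(5); // a narrower view, as a nested grammar would get
```

Sharing the underlying string in clip keeps narrowing cheap, which matters if nested regions are clipped often.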

class Token

Tokenizers write the tokens they read into instances of this class.

start: number

The start of the token. This is set by the parser, and should not be mutated by the tokenizer.

value: number

This starts at -1, and should be updated to a term id when a matching token is found.

end: number

When setting .value, you should also set .end to the end position of the token. (You'll usually want to use the accept method.)

accept(value: number, end: number)

Accept a token, setting value and end to the given values.
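To illustrate the division of labour described above (the parser sets start, the tokenizer reports value and end through accept), here is a sketch with a stand-in class. TokenSketch, readNumber, and the term id NumberTerm are hypothetical; real term ids come from the terms file produced by buildParserFile.

```typescript
// Hypothetical stand-in mirroring the Token fields documented above.
class TokenSketch {
  start = 0;  // set by the parser
  value = -1; // -1 means "no token found yet"
  end = -1;
  accept(value: number, end: number): void {
    this.value = value;
    this.end = end;
  }
}

// Hypothetical term id for a Number token.
const NumberTerm = 7;

// Read a run of ASCII digits starting at token.start.
function readNumber(text: string, token: TokenSketch): void {
  let pos = token.start;
  while (pos < text.length) {
    const ch = text.charCodeAt(pos);
    if (ch < 48 || ch > 57) break; // outside "0".."9"
    pos++;
  }
  // Only accept when at least one digit was consumed.
  if (pos > token.start) token.accept(NumberTerm, pos);
}

const tok = new TokenSketch();
tok.start = 4;
readNumber("abc 123+4", tok); // tok.value === NumberTerm, tok.end === 7
```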

class ExternalTokenizer

new ExternalTokenizer(token: fn(input: InputStream, token: Token, stack: Stack), options?: {contextual?: boolean} = {})
contextual: boolean
readonly token(input: InputStream, token: Token, stack: Stack)
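An external tokenizer function receives the input, the token to fill in, and the current parse stack. The sketch below defines minimal structural types of its own and only produces a newline token when Stack.canShift reports that the grammar can use one at this point. The term id Newline and the function name are hypothetical; such a function would be passed as the first argument to new ExternalTokenizer.

```typescript
// Minimal structural types for the three arguments the function receives.
interface InputLike { get(pos: number): number; }
interface TokenLike { start: number; accept(value: number, end: number): void; }
interface StackLike { canShift(term: number): boolean; }

// Hypothetical term id; in practice it would come from the generated terms file.
const Newline = 3;

// Emit a Newline token for "\n" only when the current parse state can
// shift it; otherwise accept nothing and leave the input to other tokenizers.
function newlineTokenizer(input: InputLike, token: TokenLike, stack: StackLike): void {
  if (input.get(token.start) === 10 && stack.canShift(Newline)) {
    token.accept(Newline, token.start + 1);
  }
}

const accepted: Array<[number, number]> = [];
const recorder: TokenLike = { start: 2, accept: (value, end) => { accepted.push([value, end]); } };
const sourceText = "ab\ncd";
const sourceInput: InputLike = {
  get: (pos) => (pos >= 0 && pos < sourceText.length ? sourceText.charCodeAt(pos) : -1),
};
newlineTokenizer(sourceInput, recorder, { canShift: () => true });  // accepted is [[3, 3]]
newlineTokenizer(sourceInput, recorder, { canShift: () => false }); // nothing added
```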

Re-exports

These come from lezer-tree, but are exported from this module as well for convenience.

class Tree extends Subtree

new Tree(children: (Tree | TreeBuffer)[], positions: number[], length: number, tags: readonly Tag[], type?: number)
static empty: Tree
static build(buffer: BufferCursor | readonly number[], tags: readonly Tag[], maxBufferLength?: number, reused?: Tree[]) → Tree
readonly children: (Tree | TreeBuffer)[]
readonly positions: number[]
readonly length: number
readonly tags: readonly Tag[]
readonly type: number
parent: null
readonly start: number
readonly end: number
readonly tag: Tag
isPartOf(tags: readonly Tag[]) → boolean
toString() → string
applyChanges(changes: readonly ChangedRange[]) → Tree
cut(at: number) → Tree
iterate<T = any>(from: number, to: number, enter: fn(tag: Tag, start: number, end: number) → undefined | false | T, leave?: fn(tag: Tag, start: number, end: number)) → undefined | T
iterInner<T>(from: number, to: number, offset: number, iter: Iteration<T>)
resolveAt(pos: number, side?: -1 | 0 | 1) → Subtree
childBefore(pos: number) → null | Subtree
childAfter(pos: number) → null | Subtree
findChild(pos: number, side: number, start: number, parent: Subtree) → null | Subtree
resolveInner(pos: number, start: number, parent: Subtree) → Subtree
append(other: Tree) → Tree
balance(maxBufferLength?: number) → Tree

abstract class Subtree

abstract parent: Subtree | null
abstract tag: Tag
abstract start: number
abstract end: number
readonly depth: number
readonly root: Tree
abstract toString() → string
abstract iterate<T = any>(from: number, to: number, enter: fn(tag: Tag, start: number, end: number) → undefined | false | T, leave?: fn(tag: Tag, start: number, end: number)) → undefined | T
resolve(pos: number, side?: -1 | 0 | 1) → Subtree
abstract resolveAt(pos: number) → Subtree
abstract childBefore(pos: number) → null | Subtree
abstract childAfter(pos: number) → null | Subtree
readonly firstChild: Subtree | null
readonly lastChild: Subtree | null

class Tag

new Tag(tag: string)
static empty: Tag
readonly tag: string
has(name: string, value?: string) → boolean
matches(tag: Tag) → number

tree module

Trees

class Tree extends Subtree

A piece of syntax tree. There are two ways to approach these trees: the way they are actually stored in memory, and the convenient way. Syntax trees are stored as a tree of Tree and TreeBuffer objects. By packing detail information into TreeBuffer leaf nodes, the representation is made a lot more memory-efficient. However, when you want to actually work with tree nodes, this representation is very awkward, so most client code will want to use the Subtree interface instead, which provides a view on some part of this data structure, and can be used (through resolve, for example) to zoom in on any single node.

static empty: Tree

The empty tree.

static build(buffer: BufferCursor | readonly number[], tags: readonly Tag[], maxBufferLength?: number = DefaultBufferLength, reused?: Tree[] = []) → Tree

Build a tree from a postfix-ordered buffer of node information, or a cursor over such a buffer.
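The exact layout of that postfix buffer is not spelled out here. Judging by the BufferCursor fields (type, start, end, size), a plausible reading is one (type, start, end, size) quad per node, children before their parent, with size counting the buffer entries taken up by the node and all its descendants. The sketch below is written under that assumption only; PlainNode, writePostfix, and topLevelTypes are hypothetical helpers.

```typescript
// A plain nested node, used only for this illustration.
interface PlainNode {
  type: number;
  start: number;
  end: number;
  children: PlainNode[];
}

// Append node and its descendants to out in postfix order (children
// first, then the parent) as (type, start, end, size) quads. size is
// assumed to count the entries of the node plus its descendants, 4 per node.
function writePostfix(node: PlainNode, out: number[]): void {
  const before = out.length;
  for (const child of node.children) writePostfix(child, out);
  out.push(node.type, node.start, node.end, out.length - before + 4);
}

// Read the top-level nodes from the end of the buffer backwards, using
// size to jump over each node's descendants.
function topLevelTypes(buf: readonly number[]): number[] {
  const types: number[] = [];
  for (let pos = buf.length; pos > 0; pos -= buf[pos - 1]) types.push(buf[pos - 4]);
  return types;
}

const example: PlainNode = {
  type: 1, start: 0, end: 10, children: [
    { type: 2, start: 0, end: 3, children: [] },
    { type: 3, start: 4, end: 10, children: [{ type: 4, start: 5, end: 7, children: [] }] },
  ],
};
const whole: number[] = [];
writePostfix(example, whole); // [2,0,3,4, 4,5,7,4, 3,4,10,8, 1,0,10,16]
const forest: number[] = [];
for (const child of example.children) writePostfix(child, forest);
```

With a postfix layout the parent's quad comes last, so a reader naturally walks backwards: topLevelTypes(forest) yields [3, 2], last node first.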

readonly children: (Tree | TreeBuffer)[]

The tree's child nodes. Children small enough to fit in a TreeBuffer will be represented as such, other children can be further Tree instances with their own internal structure.

readonly positions: number[]

The positions (offsets relative to the start of this tree) of the children.

readonly length: number

The total length of this tree.

isPartOf(tags: readonly Tag[]) → boolean

Check whether this tree's tag belongs to a given set of tags. Can be used to determine that a node belongs to the grammar defined by a specific parser.

applyChanges(changes: readonly ChangedRange[]) → Tree

Apply a set of edits to a tree, removing all nodes that were touched by the edits, and moving remaining nodes so that their positions are updated for insertions/deletions before them. This is likely to destroy a lot of the structure of the tree, and mostly useful for extracting the nodes that can be reused in a subsequent incremental re-parse.

cut(at: number) → Tree

Take the part of the tree up to the given position.

append(other: Tree) → Tree

Append another tree to this tree. The other tree must have empty space at its start that is big enough to fit this tree.

balance(maxBufferLength?: number = DefaultBufferLength) → Tree

Balance the direct children of this tree. Should only be used on non-tagged trees.

abstract class Subtree

A subtree is a representation of part of the syntax tree. It may either be the tree root, or a tagged node.

abstract parent: Subtree | null

The subtree's parent. Will be null for the root node.

abstract tag: Tag

The node's tag. Will be Tag.empty for the root.

abstract start: number

The start source offset of this subtree.

abstract end: number

The end source offset of this subtree.

readonly depth: number

The depth (number of parent nodes) of this subtree.

readonly root: Tree

The root of the tree that this subtree is part of.

abstract iterate<T = any>(from: number, to: number, enter: EnterFunc<T>, leave?: LeaveFunc) → undefined | T

Iterate over all nodes in this subtree. Will iterate through the tree in document order, calling enter for each node it enters and, if given, leave when it leaves a node.

resolve(pos: number, side?: -1 | 0 | 1 = 0) → Subtree

Find the node at a given position. By default, this will return the innermost (deepest) subtree that covers the position from both sides, meaning that nodes starting or ending at the position aren't entered. You can pass a side of -1 to enter nodes that end at the position, or 1 to enter nodes that start there.
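The side rules can be expressed as a single coverage test per child. The sketch below applies them to a hypothetical plain-object tree (Span) for the text "foo(bar)", assuming children are sorted and do not overlap; enters and resolveSketch are illustrative helpers, not the library's implementation.

```typescript
interface Span {
  name: string;
  start: number;
  end: number;
  children: Span[];
}

// Whether resolution may descend into node at pos, following the side rules.
function enters(node: Span, pos: number, side: -1 | 0 | 1): boolean {
  if (side < 0) return node.start < pos && node.end >= pos; // may end at pos
  if (side > 0) return node.start <= pos && node.end > pos; // may start at pos
  return node.start < pos && node.end > pos;                // must strictly contain pos
}

// Descend from root into the innermost node that covers pos.
function resolveSketch(root: Span, pos: number, side: -1 | 0 | 1 = 0): Span {
  let node = root;
  for (;;) {
    const next = node.children.find((c) => enters(c, pos, side));
    if (!next) return node;
    node = next;
  }
}

// "foo(bar)": Name covers "foo", Args covers "(bar)", Var covers "bar".
const callTree: Span = {
  name: "Doc", start: 0, end: 8, children: [
    { name: "Call", start: 0, end: 8, children: [
      { name: "Name", start: 0, end: 3, children: [] },
      { name: "Args", start: 3, end: 8, children: [{ name: "Var", start: 4, end: 7, children: [] }] },
    ] },
  ],
};
```

At position 3, between "foo" and "(", the default side stops at Call, side -1 enters Name (which ends there), and side 1 enters Args (which starts there).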

abstract childBefore(pos: number) → null | Subtree

Find the child tree before the given position, if any.

abstract childAfter(pos: number) → null | Subtree

Find the child tree after the given position, if any.

readonly firstChild: null | Subtree

Get the first child of this subtree.

readonly lastChild: null | Subtree

Find the last child of this subtree.

interface ChangedRange

The Tree.applyChanges method expects changed ranges in this format.

fromA: number

The start of the change in the old document.

toA: number

The end of the change in the old document.

fromB: number

The start of the replacement in the new document.

toB: number

The end of the replacement in the new document.
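To make the relation between the A (old) and B (new) coordinates concrete, here is a sketch that maps a position in the old document to the new one. The mapPos helper is hypothetical and not part of lezer-tree; it assumes the ranges are sorted and do not overlap, and it maps positions strictly inside replaced text to the end of the replacement, which is an arbitrary choice.

```typescript
interface ChangedRange {
  fromA: number;
  toA: number;
  fromB: number;
  toB: number;
}

// Map a position in the old document (A) to the new document (B).
function mapPos(pos: number, changes: readonly ChangedRange[]): number {
  let offset = 0; // new minus old position, as of the last change passed
  for (const ch of changes) {
    if (pos <= ch.fromA) break;        // at or before this change
    if (pos < ch.toA) return ch.toB;   // strictly inside replaced text
    offset = ch.toB - ch.toA;          // at or after this change's end
  }
  return pos + offset;
}

// "hello world" -> "> hello there!": insert "> " at 0, replace "world".
const edits: ChangedRange[] = [
  { fromA: 0, toA: 0, fromB: 0, toB: 2 },
  { fromA: 6, toA: 11, fromB: 8, toB: 14 },
];
```

Because fromB and toB are absolute positions in the new document, the offset after a change is simply toB - toA; there is no need to sum the size differences of earlier changes.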

type EnterFunc<T> = fn<T>(tag: Tag, start: number, end: number) → undefined | false | T

Signature of the enter function passed to Subtree.iterate. It is given a node's tag, start position, and end position for every node, and can return:

- undefined to proceed iterating as normal.
- false to skip this node's children, but continue iterating the nodes after it.
- Any other value to immediately stop iteration and return that value from the iterate method.

type LeaveFunc = fn(tag: Tag, start: number, end: number)

Signature of the leave function passed to Subtree.iterate.
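The return-value protocol of the enter function can be shown with a small pre-order walk over a hypothetical plain-object tree, using node names in place of tags. Two details are assumptions of this sketch rather than documented behavior: nodes that merely touch the from/to range are visited, and leave is not called for a node whose enter returned false.

```typescript
interface SketchNode {
  name: string;
  start: number;
  end: number;
  children: SketchNode[];
}

type EnterFunc<T> = (name: string, start: number, end: number) => undefined | false | T;
type LeaveFunc = (name: string, start: number, end: number) => void;

// Pre-order walk over nodes overlapping [from, to].
function iterateSketch<T>(node: SketchNode, from: number, to: number,
                          enter: EnterFunc<T>, leave?: LeaveFunc): undefined | T {
  if (node.end < from || node.start > to) return undefined; // outside the range
  const result = enter(node.name, node.start, node.end);
  if (result === false) return undefined;  // skip children, keep going
  if (result !== undefined) return result; // stop and hand back the value
  for (const child of node.children) {
    const inner = iterateSketch(child, from, to, enter, leave);
    if (inner !== undefined) return inner;
  }
  if (leave) leave(node.name, node.start, node.end);
  return undefined;
}

const sample: SketchNode = { name: "Doc", start: 0, end: 10, children: [
  { name: "Stmt", start: 0, end: 5, children: [{ name: "Num", start: 1, end: 2, children: [] }] },
  { name: "Stmt", start: 6, end: 10, children: [{ name: "Str", start: 7, end: 9, children: [] }] },
] };

const entered: string[] = [];
const leftNodes: string[] = [];
iterateSketch(sample, 0, 10, (n) => { entered.push(n); return undefined; }, (n) => { leftNodes.push(n); });

const skipped: string[] = [];
iterateSketch(sample, 0, 10, (n) => { skipped.push(n); return n === "Stmt" ? false : undefined; });

const firstStr = iterateSketch(sample, 0, 10, (n, start) => (n === "Str" ? start : undefined));
```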

Node tags

class Tag

Tags represent information about nodes. They are an ordered collection of parts (more specific ones first) written in the form boolean.literal.expression (where boolean further specifies literal, which in turn further specifies expression). A part may also have a value, written after an = sign, as in tag.selector.lang=css. Part names and values may be double quoted (using JSON string notation) when they contain non-word characters. This wrapper object pre-parses the tag for easy querying.

new Tag(tag: string)

Create a tag object from a string.

static empty: Tag

The empty tag, returned for nodes that don't have a meaningful tag.

readonly tag: string

The string that the tag is based on.

has(name: string, value?: string) → boolean

Check whether this tag has a part by the given name. If value is given, this will only return true when that part also has that specific value.

matches(tag: Tag) → number

See whether this tag contains all the parts present in the argument tag, and, if the part has a value in the query tag, the same value in this tag. Returns a specificity score: 0 means there was no match, and a higher score means the query matched more specific parts of the tag.
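The tag syntax described above is simple enough to parse by hand. The sketch below splits a tag string into name/value parts and implements a has-style check on them; parseTag, readWord, and hasPart are hypothetical helpers, word characters are assumed to be those matched by \w, and the specificity score returned by matches is deliberately not reproduced.

```typescript
interface TagPart {
  name: string;
  value: string | null;
}

// Read a name or value at pos: a JSON string ("...") or a run of word chars.
function readWord(src: string, pos: number): [string, number] {
  if (src[pos] === '"') {
    let end = pos + 1;
    while (end < src.length && src[end] !== '"') end += src[end] === "\\" ? 2 : 1;
    return [JSON.parse(src.slice(pos, end + 1)), end + 1]; // throws if unterminated
  }
  const m = /^\w+/.exec(src.slice(pos));
  if (!m) throw new SyntaxError(`Invalid tag at ${pos}: ${src}`);
  return [m[0], pos + m[0].length];
}

// Split a tag string into its parts, most specific first.
function parseTag(src: string): TagPart[] {
  const parts: TagPart[] = [];
  let pos = 0;
  while (pos < src.length) {
    const [partName, afterName] = readWord(src, pos);
    pos = afterName;
    let value: string | null = null;
    if (src[pos] === "=") {
      const [val, afterVal] = readWord(src, pos + 1);
      value = val;
      pos = afterVal;
    }
    parts.push({ name: partName, value });
    if (pos < src.length) {
      if (src[pos] !== ".") throw new SyntaxError(`Expected '.' at ${pos}: ${src}`);
      pos++;
      if (pos === src.length) throw new SyntaxError(`Trailing '.' in tag: ${src}`);
    }
  }
  return parts;
}

// Like the documented has(name, value?): value, when given, must match too.
function hasPart(parts: readonly TagPart[], partName: string, value?: string): boolean {
  return parts.some((p) => p.name === partName && (value === undefined || p.value === value));
}
```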

Buffers

class TreeBuffer

Tree buffers contain (type, start, end, endIndex) quads for each node. In such a buffer, nodes are stored in prefix order (parents before children, with the endIndex of the parent indicating which children belong to it).

new TreeBuffer(buffer: Uint16Array, length: number, tags: readonly Tag[])

Create a tree buffer.

readonly buffer: Uint16Array
readonly length: number
readonly tags: readonly Tag[]
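Because each node's endIndex points past its last descendant, the children of a node can be enumerated by hopping from one child's endIndex to the next. The sketch below assumes endIndex is an index into the same Uint16Array (the position just after the node's subtree) and uses made-up type numbers; childIndices and render are hypothetical helpers.

```typescript
// Indices of the direct children of the node whose quad starts at index.
// Quads are (type, start, end, endIndex); endIndex is assumed to be the
// buffer index just past the node's last descendant.
function childIndices(buf: Uint16Array, index: number): number[] {
  const result: number[] = [];
  const end = buf[index + 3];
  // Each child's endIndex is where its next sibling (if any) begins.
  for (let i = index + 4; i < end; i = buf[i + 3]) result.push(i);
  return result;
}

// Render the subtree at index as "type(child,child,...)".
function render(buf: Uint16Array, index: number): string {
  const kids = childIndices(buf, index).map((i) => render(buf, i));
  return kids.length > 0 ? `${buf[index]}(${kids.join(",")})` : String(buf[index]);
}

// "foo(bar)" with made-up types: 1 = Call, 2 = Name, 3 = Args, 4 = Var.
const quads = new Uint16Array([
  1, 0, 8, 16, // Call [0, 8], subtree ends at index 16
  2, 0, 3, 8,  // Name [0, 3], no children
  3, 3, 8, 16, // Args [3, 8]
  4, 4, 7, 16, // Var  [4, 7]
]);
```

Storing endIndex instead of a child list keeps every node at a fixed four entries, which is what makes the flat Uint16Array representation compact.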
DefaultBufferLength: 1024

The default maximum length of a TreeBuffer node.

interface BufferCursor

This is used by Tree.build as an abstraction for iterating over a tree buffer. You probably won't need it.

pos: number
type: number
start: number
end: number
size: number
next()
fork() → BufferCursor

generator module

type BuildOptions

fileName?: string

The name of the grammar file.

warn?: fn(message: string)

A function that should be called with warnings. The default is to call console.warn.

includeNames?: boolean

Whether to include term names in the output file. Defaults to false.

moduleStyle?: string

Determines the module system used by the output file. Can be either "cjs" (CommonJS) or "es" (ES2015 module); defaults to "es".

exportName?: string

The name of the export that holds the parser in the output file. Defaults to "parser".

externalTokenizer?: fn(name: string, terms: Object<number>) → ExternalTokenizer

When calling buildParser, this can be used to provide placeholders for external tokenizers.

nestedGrammar?: fn(name: string, terms: Object<number>) → NestedGrammar

Only relevant when using buildParser. Provides placeholders for nested grammars.

buildParserFile(text: string, options?: BuildOptions = {}) → {parser: string, terms: string}

Build the code that represents the parser tables for a given grammar description. The parser property in the return value holds the main file that exports the Parser instance. The terms property holds a declaration file that defines constants for all of the named terms in the grammar, holding their ids as values. This is useful when external code, such as a tokenizer, needs to be able to use these ids. It is recommended to run a tree-shaking bundler when importing this file, since you usually only need a handful of the many terms in your code.

buildParser(text: string, options?: BuildOptions = {}) → Parser

Build an in-memory parser instance for a given grammar. This is mostly useful for testing. If your grammar uses external tokenizers or nested grammars, you'll have to provide the externalTokenizer and/or nestedGrammar options for the returned parser to be able to parse anything.