Reference Manual

The lezer system consists of multiple modules, each distributed as a separate package on npm.

@lezer/common module

This package provides the common data structures used by Lezer-style parsers—those related to syntax trees and the generic parser interface. Their main users are the LR parsers generated by the parser generator, but other parsers, such as the Markdown parser, implement different parsing algorithms on top of the same interfaces.

Trees

Lezer syntax trees are not abstract: they just tell you which nodes were parsed where, without providing additional information about their role or relations (beyond parent-child relations). This makes them rather unsuited for some purposes, but quick to construct and cheap to store.

class Tree

A piece of syntax tree. There are two ways to approach these trees: the way they are actually stored in memory, and the convenient way.

Syntax trees are stored as a tree of Tree and TreeBuffer objects. By packing detail information into TreeBuffer leaf nodes, the representation is made a lot more memory-efficient.

However, when you want to actually work with tree nodes, this representation is very awkward, so most client code will want to use the TreeCursor or SyntaxNode interface instead, which provides a view on some part of this data structure, and can be used to move around to adjacent nodes.

new Tree(
type: NodeType,
children: readonly (Tree | TreeBuffer)[],
positions: readonly number[],
length: number,
props?: readonly [number | NodeProp<any>, any][]
)

Construct a new tree. See also Tree.build.

props

Per-node node props to associate with this node.

type: NodeType

The type of the top node.

children: readonly (Tree | TreeBuffer)[]

This node's child nodes.

positions: readonly number[]

The positions (offsets relative to the start of this tree) of the children.

length: number

The total length of this tree

cursor(mode⁠?: IterMode = 0 as IterMode) → TreeCursor

Get a tree cursor positioned at the top of the tree. Mode can be used to control which nodes the cursor visits.

cursorAt(
pos: number,
side?: -1 | 0 | 1 = 0,
mode?: IterMode = 0 as IterMode
) → TreeCursor

Get a tree cursor pointing into this tree at the given position and side (see moveTo).

topNode: SyntaxNode

Get a syntax node object for the top of the tree.

resolve(pos: number, side?: -1 | 0 | 1 = 0) → SyntaxNode

Get the syntax node at the given position. If side is -1, this will move into nodes that end at the position. If 1, it'll move into nodes that start at the position. With 0, it'll only enter nodes that cover the position from both sides.

Note that this will not enter overlays, and you often want resolveInner instead.

resolveInner(pos: number, side?: -1 | 0 | 1 = 0) → SyntaxNode

Like resolve, but will enter overlaid nodes, producing a syntax node pointing into the innermost overlaid tree at the given position (with parent links going through all parent structure, including the host trees).

resolveStack(pos: number, side?: -1 | 0 | 1 = 0) → NodeIterator

In some situations, it can be useful to iterate through all nodes around a position, including those in overlays that don't directly cover the position. This method gives you an iterator that will produce all nodes, from small to big, around the given position.

iterate(
spec: {
enter: fn(node: SyntaxNodeRef) → boolean | undefined,
leave?: fn(node: SyntaxNodeRef),
from?: number,
to?: number,
mode?: IterMode
}
)

Iterate over the tree and its children, calling enter for any node that touches the from/to region (if given) before running over such a node's children, and leave (if given) when leaving the node. When enter returns false, that node will not have its children iterated over (or leave called).
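
For example, a minimal sketch of listing every node a tree contains, where tree stands for any Tree you already have:

// Collect the name and extent of every node the iteration visits.
let found = []
tree.iterate({
  enter(node) {
    found.push(`${node.name} ${node.from}-${node.to}`)
  }
})
console.log(found.join("\n"))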

prop<T>(prop: NodeProp<T>) → T | undefined

Get the value of the given node prop for this node. Works with both per-node and per-type props.

propValues: readonly [number | NodeProp<any>, any][]

Returns the node's per-node props in a format that can be passed to the Tree constructor.

balance(config⁠?: Object = {}) → Tree

Balance the direct children of this tree, producing a copy in which children may be grouped into subtrees with type NodeType.none.

config
makeTree?: fn(
children: readonly (Tree | TreeBuffer)[],
positions: readonly number[],
length: number
) → Tree

Function to create the newly balanced subtrees.

static empty: Tree

The empty tree

static build(data: Object) → Tree

Build a tree from a postfix-ordered buffer of node information, or a cursor over such a buffer.

data
buffer: BufferCursor | readonly number[]

The buffer or buffer cursor to read the node data from.

When this is an array, it should contain four values for every node in the tree.

  • The first holds the node's type, as a node ID pointing into the given NodeSet.
  • The second holds the node's start offset.
  • The third the end offset.
  • The fourth the amount of space taken up in the array by this node and its children. Since there's four values per node, this is the total number of nodes inside this node (children and transitive children) plus one for the node itself, times four.

Parent nodes should appear after child nodes in the array. As an example, a node of type 10 spanning positions 0 to 4, with two children, of type 11 and 12, might look like this:

[11, 0, 1, 4, 12, 2, 4, 4, 10, 0, 4, 12]
nodeSet: NodeSet

The node types to use.

topID: number

The id of the top node type.

start⁠?: number

The position the tree should start at. Defaults to 0.

bufferStart⁠?: number

The position in the buffer where the function should stop reading. Defaults to 0.

length⁠?: number

The length of the wrapping node. The end offset of the last child is used when not provided.

maxBufferLength⁠?: number

The maximum buffer length to use. Defaults to DefaultBufferLength.

reused⁠?: readonly Tree[]

An optional array holding reused nodes that the buffer can refer to.

minRepeatType⁠?: number

The first node type that indicates repeat constructs in this grammar.
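
As a minimal sketch of how these pieces fit together, the following builds a tiny two-child tree from a postfix buffer. The node set and its type names are made up for the example:

import {Tree, NodeSet, NodeType} from "@lezer/common"

// A hypothetical node set; each type's id matches its index in the array.
const nodes = new NodeSet([
  NodeType.define({id: 0, name: "Doc", top: true}),
  NodeType.define({id: 1, name: "Word"}),
  NodeType.define({id: 2, name: "Number"})
])

const tree = Tree.build({
  // Two leaf nodes: Word at 0-3 and Number at 4-6, four values each.
  buffer: [1, 0, 3, 4, 2, 4, 6, 4],
  nodeSet: nodes,
  topID: 0 // wrap them in a Doc node
})
console.log(tree.toString()) // e.g. Doc(Word,Number)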

interface SyntaxNodeRef

The set of properties provided by both SyntaxNode and TreeCursor. Note that, if you need an object that is guaranteed to stay stable in the future, you need to use the node accessor.

from: number

The start position of the node.

to: number

The end position of the node.

type: NodeType

The type of the node.

name: string

The name of the node (.type.name).

tree: Tree | null

Get the tree that represents the current node, if any. Will return null when the node is in a tree buffer.

node: SyntaxNode

Retrieve a stable syntax node at this position.

matchContext(context: readonly string[]) → boolean

Test whether the node matches a given context—a sequence of direct parent nodes. Empty strings in the context array act as wildcards, other strings must match the ancestor node's name.

interface SyntaxNode extends SyntaxNodeRef

A syntax node provides an immutable pointer to a given node in a tree. When iterating over large amounts of nodes, you may want to use a mutable cursor instead, which is more efficient.

parent: SyntaxNode | null

The node's parent node, if any.

firstChild: SyntaxNode | null

The first child, if the node has children.

lastChild: SyntaxNode | null

The node's last child, if available.

childAfter(pos: number) → SyntaxNode | null

The first child that ends after pos.

childBefore(pos: number) → SyntaxNode | null

The last child that starts before pos.

enter(
pos: number,
side: -1 | 0 | 1,
overlays?: boolean = true,
buffers?: boolean = true
) → SyntaxNode | null

Enter the child at the given position. If side is -1 the child may end at that position, when 1 it may start there.

This will by default enter overlaid mounted trees. You can set overlays to false to disable that.

Similarly, when buffers is false this will not enter buffers, only nodes (which is mostly useful when looking for props, which cannot exist on buffer-allocated nodes).

nextSibling: SyntaxNode | null

This node's next sibling, if any.

prevSibling: SyntaxNode | null

This node's previous sibling.

cursor(mode⁠?: IterMode) → TreeCursor

A tree cursor starting at this node.

resolve(pos: number, side?: -1 | 0 | 1) → SyntaxNode

Find the node around, before (if side is -1), or after (side is 1) the given position. Will look in parent nodes if the position is outside this node.

resolveInner(pos: number, side?: -1 | 0 | 1) → SyntaxNode

Similar to resolve, but enter overlaid nodes.

enterUnfinishedNodesBefore(pos: number) → SyntaxNode

Move the position to the innermost node before pos that looks like it is unfinished (meaning it ends in an error node or has a child ending in an error node right at its end).

toTree() → Tree

Get a tree for this node. Will allocate one if it points into a buffer.

getChild(
type: string | number,
before?: string | number | null,
after?: string | number | null
) → SyntaxNode | null

Get the first child of the given type (which may be a node name or a group name). If before is non-null, only return children that occur somewhere after a node with that name or group. If after is non-null, only return children that occur somewhere before a node with that name or group.

getChildren(
type: string | number,
before?: string | number | null,
after?: string | number | null
) → SyntaxNode[]

Like getChild, but return all matching children, not just the first.
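
For instance, with a grammar that produces CallExpression, VariableName, and Expression nodes (hypothetical names), you might gather a call's pieces like this (a minimal sketch):

let call = tree.topNode.getChild("CallExpression")
if (call) {
  // The callee, and the arguments between the "(" and ")" child nodes.
  let callee = call.getChild("VariableName")
  let args = call.getChildren("Expression", "(", ")")
}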

type NodeIterator

Represents a sequence of nodes.

node: SyntaxNode
next: NodeIterator | null

class TreeCursor implements SyntaxNodeRef

A tree cursor object focuses on a given node in a syntax tree, and allows you to move to adjacent nodes.

type: NodeType

The node's type.

name: string

Shorthand for .type.name.

from: number

The start source offset of this node.

to: number

The end source offset.

firstChild() → boolean

Move the cursor to this node's first child. When this returns false, the node has no child, and the cursor has not been moved.

lastChild() → boolean

Move the cursor to this node's last child.

childAfter(pos: number) → boolean

Move the cursor to the first child that ends after pos.

childBefore(pos: number) → boolean

Move to the last child that starts before pos.

enter(
pos: number,
side: -1 | 0 | 1,
mode?: IterMode = this.mode
) → boolean

Move the cursor to the child around pos. If side is -1 the child may end at that position, when 1 it may start there. This will also enter overlaid mounted trees unless overlays is set to false.

parent() → boolean

Move to the node's parent node, if this isn't the top node.

nextSibling() → boolean

Move to this node's next sibling, if any.

prevSibling() → boolean

Move to this node's previous sibling, if any.

next(enter⁠?: boolean = true) → boolean

Move to the next node in a pre-order traversal, going from a node to its first child or, if the current node is empty or enter is false, its next sibling or the next sibling of the first parent node that has one.
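
A full pre-order walk of a tree is usually written as a cursor loop, for example (a minimal sketch):

let cursor = tree.cursor()
do {
  // Visits every node in the tree, parents before children.
  console.log(`${cursor.name} ${cursor.from}-${cursor.to}`)
} while (cursor.next())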

prev(enter⁠?: boolean = true) → boolean

Move to the next node in a last-to-first pre-order traversal. A node is followed by its last child or, if it has none, its previous sibling or the previous sibling of the first parent node that has one.

moveTo(pos: number, side?: -1 | 0 | 1 = 0) → TreeCursor

Move the cursor to the innermost node that covers pos. If side is -1, it will enter nodes that end at pos. If it is 1, it will enter nodes that start at pos.

node: SyntaxNode

Get a syntax node at the cursor's current position.

tree: Tree | null

Get the tree that represents the current node, if any. Will return null when the node is in a tree buffer.

iterate(
enter: fn(node: SyntaxNodeRef) → boolean | undefined,
leave?: fn(node: SyntaxNodeRef)
)

Iterate over the current node and all its descendants, calling enter when entering a node and leave, if given, when leaving one. When enter returns false, any children of that node are skipped, and leave isn't called for it.

matchContext(context: readonly string[]) → boolean

Test whether the current node matches a given context—a sequence of direct parent node names. Empty strings in the context array are treated as wildcards.

enum IterMode

Options that control iteration. Can be combined with the | operator to enable multiple ones.

ExcludeBuffers

When enabled, iteration will only visit Tree objects, not nodes packed into TreeBuffers.

IncludeAnonymous

Enable this to make iteration include anonymous nodes (such as the nodes that wrap repeated grammar constructs into a balanced tree).

IgnoreMounts

By default, regular mounted nodes replace their base node in iteration. Enable this to ignore them instead.

IgnoreOverlays

This option only applies in enter-style methods. It tells the library to not enter mounted overlays if one covers the given position.

class NodeWeakMap<T>

Provides a way to associate values with pieces of trees. As long as that part of the tree is reused, the associated values can be retrieved from an updated tree.

set(node: SyntaxNode, value: T)

Set the value for this syntax node.

get(node: SyntaxNode) → T | undefined

Retrieve value for this syntax node, if it exists in the map.

cursorSet(cursor: TreeCursor, value: T)

Set the value for the node that a cursor currently points to.

cursorGet(cursor: TreeCursor) → T | undefined

Retrieve the value for the node that a cursor currently points to.
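
A minimal sketch of using such a map as a per-node cache (the computed value is just an example):

import {NodeWeakMap} from "@lezer/common"

const childCounts = new NodeWeakMap()

function childCount(node) { // node: SyntaxNode
  let cached = childCounts.get(node)
  if (cached !== undefined) return cached
  let count = 0
  for (let child = node.firstChild; child; child = child.nextSibling) count++
  childCounts.set(node, count)
  return count
}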

Node types

class NodeType

Each node in a syntax tree has a node type associated with it.

name: string

The name of the node type. Not necessarily unique, but if the grammar was written properly, different node types with the same name within a node set should play the same semantic role.

id: number

The id of this node in its set. Corresponds to the term ids used in the parser.

prop<T>(prop: NodeProp<T>) → T | undefined

Retrieves a node prop for this type. Will return undefined if the prop isn't present on this node.

isTop: boolean

True when this is the top node of a grammar.

isSkipped: boolean

True when this node is produced by a skip rule.

isError: boolean

Indicates whether this is an error node.

isAnonymous: boolean

When true, this node type doesn't correspond to a user-declared named node, for example because it is used to cache repetition.

is(name: string | number) → boolean

Returns true when this node's name or one of its groups matches the given string.

static define(spec: Object) → NodeType

Define a node type.

spec
id: number

The ID of the node type. When this type is used in a set, the ID must correspond to its index in the type array.

name⁠?: string

The name of the node type. Leave empty to define an anonymous node.

props⁠?: readonly (NodePropSource | [NodeProp<any>, any])[]

Node props to assign to the type. The value given for any given prop should correspond to the prop's type.

top⁠?: boolean

Whether this is a top node.

error⁠?: boolean

Whether this node counts as an error node.

skipped⁠?: boolean

Whether this node is a skipped node.

static none: NodeType

An empty dummy node type to use when no actual type is available.

static match<T>(map: Object<T>) → fn(node: NodeType) → T | undefined

Create a function from node types to arbitrary values by specifying an object whose property names are node or group names. Often useful with NodeProp.add. You can put multiple names, separated by spaces, in a single property name to map multiple node names to a single value.
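
For example, this builds a lookup function from (hypothetical) node names to bracket kinds; note how one property maps two names at once:

import {NodeType} from "@lezer/common"

const bracketKind = NodeType.match({
  "( )": "paren",
  "[ ]": "square",
  "{ }": "brace"
})
// bracketKind(type) returns "paren", "square", "brace", or undefined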

class NodeSet

A node set holds a collection of node types. It is used to compactly represent trees by storing their type ids, rather than a full pointer to the type object, in a numeric array. Each parser has a node set, and tree buffers can only store collections of nodes from the same set. A set can have a maximum of 2**16 (65536) node types in it, so that the ids fit into 16-bit typed array slots.

new NodeSet(types: readonly NodeType[])

Create a set with the given types. The id property of each type should correspond to its position within the array.

types: readonly NodeType[]

The node types in this set, by id.

extend(...props: NodePropSource[]) → NodeSet

Create a copy of this set with some node properties added. The arguments to this method can be created with NodeProp.add.

class NodeProp<T>

Each node type or individual tree can have metadata associated with it in props. Instances of this class represent prop names.

new NodeProp(config⁠?: Object = {})

Create a new node prop type.

config
deserialize?: fn(str: string) → T

The deserialize function to use for this prop, used for example when directly providing the prop from a grammar file. Defaults to a function that raises an error.

perNode⁠?: boolean

By default, node props are stored in the node type. It can sometimes be useful to directly store information (usually related to the parsing algorithm) in nodes themselves. Set this to true to enable that for this prop.

perNode: boolean

Indicates whether this prop is stored per node type or per tree node.

deserialize(str: string) → T

A method that deserializes a value of this prop from a string. Can be used to allow a prop to be directly written in a grammar file.

add(
match: Object<T> | fn(type: NodeType) → T | undefined
) → NodePropSource

This is meant to be used with NodeSet.extend or LRParser.configure to compute prop values for each node type in the set. Takes a match object or function that returns undefined if the node type doesn't get this prop, and the prop's value if it does.
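
A minimal sketch of defining a custom prop and attaching values to some (hypothetical) node types in an existing node set:

import {NodeProp} from "@lezer/common"

const indentUnit = new NodeProp({deserialize: Number})

// `someNodeSet` stands for a NodeSet you already have.
const extended = someNodeSet.extend(indentUnit.add({
  Block: 2,
  Document: 0
}))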

static closedBy: NodeProp<readonly string[]>

Prop that is used to describe matching delimiters. For opening delimiters, this holds an array of node names (written as a space-separated string when declaring this prop in a grammar) for the node types of closing delimiters that match it.

static openedBy: NodeProp<readonly string[]>

The inverse of closedBy. This is attached to closing delimiters, holding an array of node names of types of matching opening delimiters.

static group: NodeProp<readonly string[]>

Used to assign node types to groups (for example, all node types that represent an expression could be tagged with an "Expression" group).

static isolate: NodeProp<"rtl" | "ltr" | "auto">

Attached to nodes to indicate these should be displayed in a bidirectional text isolate, so that direction-neutral characters on their sides don't incorrectly get associated with surrounding text. You'll generally want to set this for nodes that contain arbitrary text, like strings and comments, and for nodes that appear inside arbitrary text, like HTML tags. When not given a value, in a grammar declaration, defaults to "auto".

static contextHash: NodeProp<number>

The hash of the context that the node was parsed in, if any. Used to limit reuse of contextual nodes.

static lookAhead: NodeProp<number>

The distance beyond the end of the node that the tokenizer looked ahead for any of the tokens inside the node. (The LR parser only stores this when it is larger than 25, for efficiency reasons.)

static mounted: NodeProp<MountedTree>

This per-node prop is used to replace a given node, or part of a node, with another tree. This is useful to include trees from different languages in mixed-language parsers.

type NodePropSource = fn(type: NodeType) → [NodeProp<any>, any] | null

Type returned by NodeProp.add. Describes whether a prop should be added to a given node type in a node set, and what value it should have.

Buffers

Buffers are an optimization in the way Lezer trees are stored.

class TreeBuffer

Tree buffers contain (type, start, end, endIndex) quads for each node. In such a buffer, nodes are stored in prefix order (parents before children, with the endIndex of the parent indicating which children belong to it).

new TreeBuffer(buffer: Uint16Array, length: number, set: NodeSet)

Create a tree buffer.

buffer: Uint16Array

The buffer's content.

length: number

The total length of the group of nodes in the buffer.

set: NodeSet

The node set used in this buffer.

DefaultBufferLength: 1024

The default maximum length of a TreeBuffer node.

interface BufferCursor

This is used by Tree.build as an abstraction for iterating over a tree buffer. A cursor initially points at the very last element in the buffer. Every time next() is called it moves on to the previous one.

pos: number

The current buffer position (four times the number of nodes remaining).

id: number

The node ID of the next node in the buffer.

start: number

The start position of the next node in the buffer.

end: number

The end position of the next node.

size: number

The size of the next node (the number of nodes inside, counting the node itself, times 4).

next()

Moves this.pos down by 4.

fork() → BufferCursor

Create a copy of this cursor.

Parsing

abstract class Parser

A superclass that parsers should extend.

abstract createParse(
input: Input,
fragments: readonly TreeFragment[],
ranges: readonly {from: number, to: number}[]
) → PartialParse

Start a parse for a single tree. This is the method concrete parser implementations must implement. Called by startParse, with the optional arguments resolved.

startParse(
input: Input | string,
fragments?: readonly TreeFragment[],
ranges?: readonly {from: number, to: number}[]
) → PartialParse

Start a parse, returning a partial parse object. fragments can be passed in to make the parse incremental.

By default, the entire input is parsed. You can pass ranges, which should be a sorted array of non-empty, non-overlapping ranges, to parse only those ranges. The tree returned in that case will start at ranges[0].from.

parse(
input: Input | string,
fragments?: readonly TreeFragment[],
ranges?: readonly {from: number, to: number}[]
) → Tree

Run a full parse, returning the resulting tree.

interface Input

This is the interface parsers use to access the document. To run Lezer directly on your own document data structure, you have to write an implementation of it.

length: number

The length of the document.

chunk(from: number) → string

Get the chunk after the given position. The returned string should start at from and, if that isn't the end of the document, may be of any length greater than zero.

lineChunks: boolean

Indicates whether the chunks already end at line breaks, so that client code that wants to work by-line can avoid re-scanning them for line breaks. When this is true, the result of chunk() should either be a single line break, or the content between from and the next line break.

read(from: number, to: number) → string

Read the part of the document between the given positions.
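
A minimal sketch of this interface implemented over a plain string (in practice a plain string can be passed to parse directly, so this is mostly a template for custom document types):

class StringInput {
  constructor(doc) {
    this.doc = doc
    this.lineChunks = false
  }
  get length() { return this.doc.length }
  chunk(from) { return this.doc.slice(from, from + 1024) }
  read(from, to) { return this.doc.slice(from, to) }
}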

interface PartialParse

Interface used to represent an in-progress parse, which can be moved forward piece-by-piece.

advance() → Tree | null

Advance the parse state by some amount. Will return the finished syntax tree when the parse completes.

parsedPos: number

The position up to which the document has been parsed. Note that, in multi-pass parsers, this will stay back until the last pass has moved past a given position.

stopAt(posnumber)

Tell the parse to not advance beyond the given position. advance will return a tree when the parse has reached the position. Note that, depending on the parser algorithm and the state of the parse when stopAt was called, that tree may contain nodes beyond the position. It is an error to call stopAt with a higher position than its current value.

stoppedAt: number | null

Reports whether stopAt has been called on this parse.
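
A minimal sketch of driving a parse step by step and stopping it early at a made-up position; parser stands for any concrete Parser:

const parse = parser.startParse("one two three")
parse.stopAt(7) // don't advance beyond position 7
let tree = null
while (!(tree = parse.advance())) {}
// `tree` is the (possibly partial) result; parse.stoppedAt is 7 here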

type ParseWrapper = fn(
inner: PartialParse,
input: Input,
fragments: readonly TreeFragment[],
ranges: readonly {from: number, to: number}[]
) → PartialParse

Parse wrapper functions are supported by some parsers to inject additional parsing logic.

Incremental Parsing

Efficient reparsing happens by reusing parts of the original parsed structure.

class TreeFragment

Tree fragments are used during incremental parsing to track parts of old trees that can be reused in a new parse. An array of fragments is used to track regions of an old tree whose nodes might be reused in new parses. Use the static applyChanges method to update fragments for document changes.

new TreeFragment(
from: number,
to: number,
tree: Tree,
offset: number,
openStart?: boolean = false,
openEnd?: boolean = false
)

Construct a tree fragment. You'll usually want to use addTree and applyChanges instead of calling this directly.

from: number

The start of the unchanged range pointed to by this fragment. This refers to an offset in the updated document (as opposed to the original tree).

to: number

The end of the unchanged range.

tree: Tree

The tree that this fragment is based on.

offset: number

The offset between the fragment's tree and the document that this fragment can be used against. Add this when going from document to tree positions, subtract it to go from tree to document positions.

openStart: boolean

Whether the start of the fragment represents the start of a parse, or the end of a change. (In the second case, it may not be safe to reuse some nodes at the start, depending on the parsing algorithm.)

openEnd: boolean

Whether the end of the fragment represents the end of a full-document parse, or the start of a change.

static addTree(
tree: Tree,
fragments?: readonly TreeFragment[] = [],
partial?: boolean = false
) → readonly TreeFragment[]

Create a set of fragments from a freshly parsed tree, or update an existing set of fragments by replacing the ones that overlap with a tree with content from the new tree. When partial is true, the parse is treated as incomplete, and the resulting fragment has openEnd set to true.

static applyChanges(
fragments: readonly TreeFragment[],
changes: readonly ChangedRange[],
minGap?: number = 128
) → readonly TreeFragment[]

Apply a set of edits to an array of fragments, removing or splitting fragments as necessary to remove edited ranges, and adjusting offsets for fragments that moved.

interface ChangedRange

The TreeFragment.applyChanges method expects changed ranges in this format.

fromA: number

The start of the change in the start document

toA: number

The end of the change in the start document

fromB: number

The start of the replacement in the new document

toB: number

The end of the replacement in the new document
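
Putting these pieces together, a minimal sketch of an incremental reparse after a single edit (the offsets are made up for the example, and parser stands for any Parser):

import {TreeFragment} from "@lezer/common"

let doc = "let x = 10"
let tree = parser.parse(doc)
let fragments = TreeFragment.addTree(tree)

// Replace "10" (offsets 8-10) with "100" (offsets 8-11 in the new document).
doc = "let x = 100"
fragments = TreeFragment.applyChanges(fragments, [
  {fromA: 8, toA: 10, fromB: 8, toB: 11}
])
let newTree = parser.parse(doc, fragments)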

Mixed Parsing

parseMixed(
nest: fn(node: SyntaxNodeRef, input: Input) → NestedParse | null
) → ParseWrapper

Create a parse wrapper that, after the inner parse completes, scans its tree for mixed language regions with the nest function, runs the resulting inner parses, and then mounts their results onto the tree.

interface NestedParse

Objects returned by the function passed to parseMixed should conform to this interface.

parser: Parser

The parser to use for the inner region.

overlay?: readonly {from: number, to: number}[] |
fn(node: SyntaxNodeRef) → boolean | {from: number, to: number}

When this property is not given, the entire node is parsed with this parser, and it is mounted as a non-overlay node, replacing its host node in tree iteration.

When an array of ranges is given, only those ranges are parsed, and the tree is mounted as an overlay.

When a function is given, that function will be called for descendant nodes of the target node, not including child nodes that are covered by another nested parse, to determine the overlay ranges. When it returns true, the entire descendant is included, otherwise just the range given. The mixed parser will optimize range-finding in reused nodes, which means it's a good idea to use a function here when the target node is expected to have a large, deep structure.
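
A minimal sketch of wiring this up through an LR parser's wrap option; the node name "ScriptText" and the two parser variables are placeholders:

import {parseMixed} from "@lezer/common"

const mixed = hostParser.configure({
  wrap: parseMixed(node => {
    // Parse the content of ScriptText nodes with the inner parser,
    // mounting the result as a regular (non-overlay) tree.
    return node.name == "ScriptText" ? {parser: innerParser} : null
  })
})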

class MountedTree

A mounted tree, which can be stored on a tree node to indicate that parts of its content are represented by another tree.

new MountedTree(
tree: Tree,
overlay: readonly {from: number, to: number}[] | null,
parser: Parser
)
tree: Tree

The inner tree.

overlay: readonly {from: number, to: number}[] | null

If this is null, this tree replaces the entire node (it will be included in the regular iteration instead of its host node). If not, only the given ranges are considered to be covered by this tree. This is used for trees that are mixed in a way that isn't strictly hierarchical. Such mounted trees are only entered by resolveInner and enter.

parser: Parser

The parser used to create this subtree.

@lezer/lr module

class LRParser extends Parser

Holds the parse tables for a given grammar, as generated by lezer-generator, and provides methods to parse content with.

nodeSet: NodeSet

The nodes used in the trees emitted by this parser.

configure(config: ParserConfig) → LRParser

Configure the parser. Returns a new parser instance that has the given settings modified. Settings not provided in config are kept from the original parser.

hasWrappers() → boolean

Tells you whether any parse wrappers are registered for this parser.

getName(term: number) → string

Returns the name associated with a given term. This will only work for all terms when the parser was generated with the --names option. By default, only the names of tagged terms are stored.

topNode: NodeType

The type of top node produced by the parser.

interface ParserConfig

Configuration options when reconfiguring a parser.

props⁠?: readonly NodePropSource[]

Node prop values to add to the parser's node set.

top⁠?: string

The name of the @top declaration to parse from. If not specified, the first top rule declaration in the grammar is used.

dialect⁠?: string

A space-separated string of dialects to enable.

tokenizers?: {from: ExternalTokenizer, to: ExternalTokenizer}[]

Replace the given external tokenizers with new ones.

specializers?: {
from: fn(value: string, stack: Stack) → number,
to: fn(value: string, stack: Stack) → number
}[]

Replace external specializers with new ones.

contextTracker⁠?: ContextTracker<any>

Replace the context tracker with a new one.

strict⁠?: boolean

When true, the parser will raise an exception, rather than run its error-recovery strategies, when the input doesn't match the grammar.

wrap⁠?: ParseWrapper

Add a wrapper, which can extend parses created by this parser with additional logic (usually used to add mixed-language parsing).

bufferLength⁠?: number

The maximum length of the TreeBuffers generated in the output tree. Defaults to 1024.
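
For example, a minimal sketch of deriving a reconfigured parser (the dialect name is hypothetical and depends on the grammar):

const configured = parser.configure({
  strict: true,     // throw instead of error-correcting
  dialect: "ts",    // enable a dialect declared by the grammar
  bufferLength: 256 // emit smaller tree buffers
})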

class ContextTracker<T>

Context trackers are used to track stateful context (such as indentation in the Python grammar, or parent elements in the XML grammar) needed by external tokenizers. You declare them in a grammar file as @context exportName from "module".

Context values should be immutable, and can be updated (replaced) on shift or reduce actions.

The export used in a @context declaration should be of this type.

new ContextTracker(specObject)

Define a context tracker.

spec
start: T

The initial value of the context at the start of the parse.

shift?: fn(context: T, term: number, stack: Stack, input: InputStream) → T

Update the context when the parser executes a shift action.

reduce?: fn(context: T, term: number, stack: Stack, input: InputStream) → T

Update the context when the parser executes a reduce action.

reuse?: fn(context: T, node: Tree | TreeBuffer, stack: Stack, input: InputStream) → T

Update the context when the parser reuses a node from a tree fragment.

hash?: fn(context: T) → number

Reduce a context value to a number (for cheap storage and comparison). Only needed for strict contexts.

strict⁠?: boolean

By default, nodes can only be reused during incremental parsing if they were created in the same context as the one in which they are reused. Set this to false to disable that check (and the overhead of storing the hashes).
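
A minimal sketch of a context tracker that counts the nesting depth of parentheses; the term ids are placeholders for values exported by a grammar's generated terms file:

import {ContextTracker} from "@lezer/lr"

// Hypothetical term ids for the "(" and ")" tokens.
const openParen = 1, closeParen = 2

export const parenDepth = new ContextTracker({
  start: 0,
  shift(context, term) {
    if (term == openParen) return context + 1
    if (term == closeParen) return Math.max(0, context - 1)
    return context
  }
})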

class InputStream

Tokenizers interact with the input through this interface. It presents the input as a stream of characters, tracking lookahead and hiding the complexity of ranges from tokenizer code.

next: number

The character code of the next code unit in the input, or -1 when the stream is at the end of the input.

pos: number

The current position of the stream. Note that, due to parses being able to cover non-contiguous ranges, advancing the stream does not always mean its position moves a single unit.

peek(offsetnumber) → number

Look at a code unit near the stream position. .peek(0) equals .next, .peek(-1) gives you the previous character, and so on.

Note that looking around during tokenizing creates dependencies on potentially far-away content, which may reduce the effectiveness of incremental parsing—when looking forward—or even cause invalid reparses when looking backward more than 25 code units, since the library does not track lookbehind.

acceptToken(token: number, endOffset?: number = 0)

Accept a token. By default, the end of the token is set to the current stream position, but you can pass an offset (relative to the stream position) to change that.

acceptTokenTo(token: number, endPos: number)

Accept a token ending at a specific given position.

advance(n⁠?: number = 1) → number

Move the stream forward N (defaults to 1) code units. Returns the new value of next.

class ExternalTokenizer

@external tokens declarations in the grammar should resolve to an instance of this class.

new ExternalTokenizer(
token: fn(input: InputStream, stack: Stack),
options?: Object = {}
)

Create a tokenizer. The first argument is the function that, given an input stream, scans for the types of tokens it recognizes at the stream's position, and calls acceptToken when it finds one.

options
contextual⁠?: boolean

When set to true, mark this tokenizer as depending on the current parse stack, which prevents its result from being cached between parser actions at the same positions.

fallback⁠?: boolean

By default, when a tokenizer returns a token, that prevents tokenizers with lower precedence from even running. When fallback is true, the tokenizer is allowed to run when a previous tokenizer returned a token that didn't match any of the current state's actions.

extend⁠?: boolean

When set to true, tokenizing will not stop after this tokenizer has produced a token. (But it will still fail to reach this one if a higher-precedence tokenizer produced a token.)
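
A minimal sketch of an external tokenizer that reads a run of lowercase ASCII letters; the term id is a placeholder for a value from the generated terms file:

import {ExternalTokenizer} from "@lezer/lr"

const wordTerm = 1 // hypothetical term id

export const word = new ExternalTokenizer(input => {
  let length = 0
  while (input.next >= 97 && input.next <= 122) { // 'a'..'z'
    input.advance()
    length++
  }
  if (length) input.acceptToken(wordTerm)
})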

class Stack

A parse stack. These are used internally by the parser to track parsing progress. They also provide some properties and methods that external code such as a tokenizer can use to get information about the parse state.

pos: number

The input position up to which this stack has parsed.

context: any

The stack's current context value, if any. Its type will depend on the context tracker's type parameter, or it will be null if there is no context tracker.

canShift(term: number) → boolean

Check if the given term would be able to be shifted (optionally after some reductions) on this stack. This can be useful for external tokenizers that want to make sure they only provide a given token when it applies.

parser: LRParser

Get the parser used by this stack.

dialectEnabled(dialectID: number) → boolean

Test whether a given dialect (by numeric ID, as exported from the terms file) is enabled.

@lezer/highlight module

This package provides a vocabulary for syntax-highlighting code based on a Lezer syntax tree.

class Tag

Highlighting tags are markers that denote a highlighting category. They are associated with parts of a syntax tree by a language mode, and then mapped to an actual CSS style by a highlighter.

Because syntax tree node types and highlight styles have to be able to talk the same language, CodeMirror uses a mostly closed vocabulary of syntax tags (as opposed to traditional open string-based systems, which make it hard for highlighting themes to cover all the tokens produced by the various languages).

It is possible to define your own highlighting tags for system-internal use (where you control both the language package and the highlighter), but such tags will not be picked up by regular highlighters (though you can derive them from standard tags to allow highlighters to fall back to those).

set: Tag[]

The set of this tag and all its parent tags, starting with this one itself and sorted in order of decreasing specificity.

static define(parent⁠?: Tag) → Tag

Define a new tag. If parent is given, the tag is treated as a sub-tag of that parent, and highlighters that don't mention this tag will try to fall back to the parent tag (or grandparent tag, etc).

static defineModifier() → fn(tagTag) → Tag

Define a tag modifier, which is a function that, given a tag, will return a tag that is a subtag of the original. Applying the same modifier to a tag twice will return the same value (m1(t1) == m1(t1)) and applying multiple modifiers will, regardless of order, produce the same tag (m1(m2(t1)) == m2(m1(t1))).

When multiple modifiers are applied to a given base tag, each smaller set of modifiers is registered as a parent, so that for example m1(m2(m3(t1))) is a subtype of m1(m2(t1)), m1(m3(t1)), and so on.
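
A minimal sketch of defining a custom tag and a modifier, both derived from the standard vocabulary so existing highlighters can still fall back to it:

import {Tag, tags} from "@lezer/highlight"

// A custom tag that generic highlighters will style as a variable name.
export const shellVariable = Tag.define(tags.variableName)

// A modifier producing subtags of whatever tag it is applied to.
export const deprecated = Tag.defineModifier()
// e.g. deprecated(tags.variableName) is a subtag of tags.variableName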

tags: Object

The default set of highlighting tags.

This collection is heavily biased towards programming languages, and necessarily incomplete. A full ontology of syntactic constructs would fill a stack of books, and be impractical to write themes for. So try to make do with this set. If all else fails, open an issue to propose a new tag, or define a local custom tag for your use case.

Note that it is not obligatory to always attach the most specific tag possible to an element—if your grammar can't easily distinguish a certain type of element (such as a local variable), it is okay to style it as its more general variant (a variable).

For tags that extend some parent tag, the documentation links to the parent.

comment: Tag

A comment.

lineComment: Tag

A line comment.

blockComment: Tag

A block comment.

docComment: Tag

A documentation comment.

name: Tag

Any kind of identifier.

variableName: Tag

The name of a variable.

typeName: Tag

A type name.

tagName: Tag

A tag name (subtag of typeName).

propertyName: Tag

A property or field name.

attributeName: Tag

An attribute name (subtag of propertyName).

className: Tag

The name of a class.

labelName: Tag

A label name.

namespace: Tag

A namespace name.

macroName: Tag

The name of a macro.

literal: Tag

A literal value.

string: Tag

A string literal.

docString: Tag

A documentation string.

character: Tag

A character literal (subtag of string).

attributeValue: Tag

An attribute value (subtag of string).

number: Tag

A number literal.

integer: Tag

An integer number literal.

float: Tag

A floating-point number literal.

bool: Tag

A boolean literal.

regexp: Tag

Regular expression literal.

escape: Tag

An escape literal, for example a backslash escape in a string.

color: Tag

A color literal.

url: Tag

A URL literal.

keyword: Tag

A language keyword.

self: Tag

The keyword for the self or this object.

null: Tag

The keyword for null.

atom: Tag

A keyword denoting some atomic value.

unit: Tag

A keyword that represents a unit.

modifier: Tag

A modifier keyword.

operatorKeyword: Tag

A keyword that acts as an operator.

controlKeyword: Tag

A control-flow related keyword.

definitionKeyword: Tag

A keyword that defines something.

moduleKeyword: Tag

A keyword related to defining or interfacing with modules.

operator: Tag

An operator.

derefOperator: Tag

An operator that dereferences something.

arithmeticOperator: Tag

Arithmetic-related operator.

logicOperator: Tag

Logical operator.

bitwiseOperator: Tag

Bit operator.

compareOperator: Tag

Comparison operator.

updateOperator: Tag

Operator that updates its operand.

definitionOperator: Tag

Operator that defines something.

typeOperator: Tag

Type-related operator.

controlOperator: Tag

Control-flow operator.

punctuation: Tag

Program or markup punctuation.

separator: Tag

Punctuation that separates things.

bracket: Tag

Bracket-style punctuation.

angleBracket: Tag

Angle brackets (usually < and > tokens).

squareBracket: Tag

Square brackets (usually [ and ] tokens).

paren: Tag

Parentheses (usually ( and ) tokens). Subtag of bracket.

brace: Tag

Braces (usually { and } tokens). Subtag of bracket.

content: Tag

Content, for example plain text in XML or markup documents.

heading: Tag

Content that represents a heading.

heading1: Tag

A level 1 heading.

heading2: Tag

A level 2 heading.

heading3: Tag

A level 3 heading.

heading4: Tag

A level 4 heading.

heading5: Tag

A level 5 heading.

heading6: Tag

A level 6 heading.

contentSeparator: Tag

A prose separator (such as a horizontal rule).

list: Tag

Content that represents a list.

quote: Tag

Content that represents a quote.

emphasis: Tag

Content that is emphasized.

strong: Tag

Content that is styled strong.

link: Tag

Content that is part of a link.

monospace: Tag

Content that is styled as code or monospace.

strikethrough: Tag

Content that has a strike-through style.

inserted: Tag

Inserted text in a change-tracking format.

deleted: Tag

Deleted text.

changed: Tag

Changed text.

invalid: Tag

An invalid or unsyntactic element.

meta: Tag

Metadata or meta-instruction.

documentMeta: Tag

Metadata that applies to the entire document.

annotation: Tag

Metadata that annotates or adds attributes to a given syntactic element.

processingInstruction: Tag

Processing instruction or preprocessor directive. Subtag of meta.

definition(tag: Tag) → Tag

Modifier that indicates that a given element is being defined. Expected to be used with the various name tags.

constant(tag: Tag) → Tag

Modifier that indicates that something is constant. Mostly expected to be used with variable names.

function(tag: Tag) → Tag

Modifier used to indicate that a variable or property name is being called or defined as a function.

standard(tag: Tag) → Tag

Modifier that can be applied to names to indicate that they belong to the language's standard environment.

local(tag: Tag) → Tag

Modifier that indicates a given name is local to some scope.

special(tag: Tag) → Tag

A generic variant modifier that can be used to tag language-specific alternative variants of some common tag. It is recommended for themes to define special forms of at least the string and variable name tags, since those come up a lot.

styleTags(spec: Object<Tag | readonly Tag[]>) → NodePropSource

This function is used to add a set of tags to a language syntax via NodeSet.extend or LRParser.configure.

The argument object maps node selectors to highlighting tags or arrays of tags.

Node selectors may hold one or more (space-separated) node paths. Such a path can be a node name, or multiple node names (or * wildcards) separated by slash characters, as in "Block/Declaration/VariableName". Such a path matches the final node but only if its direct parent nodes are the other nodes mentioned. A * in such a path matches any parent, but only a single level—wildcards that match multiple parents aren't supported, both for efficiency reasons and because Lezer trees make it rather hard to reason about what they would match.

A path can be ended with /... to indicate that the tag assigned to the node should also apply to all child nodes, even if they match their own style (by default, only the innermost style is used).

When a path ends in !, as in Attribute!, no further matching happens for the node's child nodes, and the entire node gets the given style.

In this notation, node names that contain /, !, *, or ... must be quoted as JSON strings.

For example:

parser.configure({props: [
  styleTags({
    // Style Number and BigNumber nodes
    "Number BigNumber": tags.number,
    // Style Escape nodes whose parent is String
    "String/Escape": tags.escape,
    // Style anything inside Attributes nodes
    "Attributes!": tags.meta,
    // Add a style to all content inside Italic nodes
    "Italic/...": tags.emphasis,
    // Style InvalidString nodes as both `string` and `invalid`
    "InvalidString": [tags.string, tags.invalid],
    // Style the node named "/" as punctuation
    '"/"': tags.punctuation
  })
]})

getStyleTags(node: SyntaxNodeRef) → {tags: readonly Tag[], opaque: boolean, inherit: boolean} | null

Match a syntax node's highlight rules. If there's a match, return its set of tags, and whether it is opaque (uses a !) or applies to all child nodes (/...).

interface Highlighter

A highlighter defines a mapping from highlighting tags and language scopes to CSS class names. They are usually defined via tagHighlighter or some wrapper around that, but it is also possible to implement them from scratch.

style(tags: readonly Tag[]) → string | null

Get the set of classes that should be applied to the given set of highlighting tags, or null if this highlighter doesn't assign a style to the tags.

scope?: fn(node: NodeType) → boolean

When given, the highlighter will only be applied to trees on whose top node this predicate returns true.

tagHighlighter(
tags: readonly {tag: Tag | readonly Tag[], class: string}[],
options?: Object
) → Highlighter

Define a highlighter from an array of tag/class pairs. Classes associated with more specific tags will take precedence.

options
scope?: fn(node: NodeType) → boolean

By default, highlighters apply to the entire document. You can scope them to a single language by providing the tree's top node type here.

all⁠?: string

Add a style to all tokens. Probably only useful in combination with scope.
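
For example, a minimal sketch of a highlighter that maps a few tags to short class names:

import {tagHighlighter, tags} from "@lezer/highlight"

const myHighlighter = tagHighlighter([
  {tag: tags.keyword, class: "kw"},
  {tag: tags.comment, class: "cmt"},
  {tag: [tags.string, tags.number], class: "lit"}
])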

highlightCode(
code: string,
tree: Tree,
highlighter: Highlighter,
putText: fn(code: string, classes: string),
putBreak: fn(),
from?: number = 0,
to?: number = code.length
)

Highlight the given tree with the given highlighter, calling putText for every piece of text, either with a set of classes or with the empty string when unstyled, and putBreak for every line break.

highlightTree(
tree: Tree,
highlighter: Highlighter | readonly Highlighter[],
putStyle: fn(from: number, to: number, classes: string),
from?: number = 0,
to?: number = tree.length
)

Highlight the given tree with the given highlighter. Often, the higher-level highlightCode function is easier to use.

putStyle(from: number, to: number, classes: string)

Assign styling to a region of the text. Will be called, in order of position, for any ranges where more than zero classes apply. classes is a space separated string of CSS classes.

from

The start of the range to highlight.

to

The end of the range.
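
A minimal sketch of collecting styled ranges from a tree, here using the classHighlighter described below:

import {highlightTree, classHighlighter} from "@lezer/highlight"

let ranges = []
highlightTree(tree, classHighlighter, (from, to, classes) => {
  ranges.push({from, to, classes})
})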

classHighlighter: Highlighter

This is a highlighter that adds stable, predictable classes to tokens, for styling with external CSS.

The following tags are mapped to their name prefixed with "tok-" (for example "tok-comment"):

In addition, these mappings are provided:

@lezer/generator module

The parser generator is usually run through its command-line interface, but it can also be invoked as a JavaScript function.

type BuildOptions

fileName⁠?: string

The name of the grammar file

warn?: fn(message: string)

A function that should be called with warnings. The default is to call console.warn.

includeNames⁠?: boolean

Whether to include term names in the output file. Defaults to false.

moduleStyle⁠?: string

Determines the module system used by the output file. Can be either "cjs" (CommonJS) or "es" (ES2015 module), defaults to "es".

typeScript⁠?: boolean

Set this to true to output TypeScript code instead of plain JavaScript.

exportName⁠?: string

The name of the export that holds the parser in the output file. Defaults to "parser".

externalTokenizer?: fn(name: string, terms: Object<number>) → ExternalTokenizer

When calling buildParser, this can be used to provide placeholders for external tokenizers.

externalPropSource?: fn(name: string) → NodePropSource

Used by buildParser to resolve external prop sources.

externalSpecializer?: fn(name: string, terms: Object<number>) → fn(value: string, stack: Stack) → number

Provide placeholders for external specializers when using buildParser.

externalProp?: fn(name: string) → NodeProp<any>

If given, will be used to initialize external props in the parser returned by buildParser.

contextTracker?: ContextTracker<any>

If given, will be used as context tracker in a parser built with buildParser.

buildParserFile(text: string, options?: BuildOptions = {}) → {parser: string, terms: string}

Build the code that represents the parser tables for a given grammar description. The parser property in the return value holds the main file that exports the Parser instance. The terms property holds a declaration file that defines constants for all of the named terms in the grammar, holding their ids as value. This is useful when external code, such as a tokenizer, needs to be able to use these ids. It is recommended to run a tree-shaking bundler when importing this file, since you usually only need a handful of the many terms in your code.

buildParser(text: string, options?: BuildOptions = {}) → LRParser

Build an in-memory parser instance for a given grammar. This is mostly useful for testing. If your grammar uses external tokenizers, you'll have to provide the externalTokenizer option for the returned parser to be able to parse anything.
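
A minimal sketch of building and using an in-memory parser from a tiny grammar:

import {buildParser} from "@lezer/generator"

const parser = buildParser(`
  @top Program { (Number | Word)+ }
  @skip { space }
  @tokens {
    Number { @digit+ }
    Word { @asciiLetter+ }
    space { @whitespace+ }
  }
`)

console.log(parser.parse("12 hi 34").toString())
// e.g. Program(Number,Word,Number)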

class GenError extends Error

The type of error raised when the parser generator finds an issue.