Add span-aware AST and lossy parser for language server

Open • wiyota opened this issue 1 month ago • 2 comments

Overview

Introduce a span-aware AST that the parser now builds first, so that the experimental candid-language-server can attach precise locations to every declaration, field, and service item.

The new syntax::spanned module defines the source-of-truth structures and parsing helpers (rust/candid_parser/src/syntax/spanned.rs:13), grammar.lalrpop drives lalrpop to emit those structures (rust/candid_parser/src/grammar.lalrpop:4), and the existing public AST in syntax converts from the spanned version to keep the current API stable (rust/candid_parser/src/syntax/mod.rs:49).

Parser entry points and new regression tests (rust/candid_parser/tests/parse_prog.rs:16) exercise the updated pipeline so doc comments and services survive the round-trip.

A lossy parser variant was added so the language server can recover as much structure as possible while still collecting tokens.
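To make the idea of lossy parsing concrete, here is a minimal, self-contained sketch. It is not the code in candid_parser: `parse_lossy`, `parse_decl`, `Decl`, and `ParseError` are hypothetical names, and the toy grammar only accepts statements of the form `type <name> = <body>;`. The point is only that a malformed statement is reported with its span and skipped, while the statements around it are still returned.

```rust
use std::ops::Range;

/// Hypothetical spanned declaration: a name plus the byte range it occupies.
#[derive(Debug, PartialEq)]
struct Decl {
    name: String,
    span: Range<usize>,
}

/// Hypothetical diagnostic for a statement that could not be parsed.
#[derive(Debug, PartialEq)]
struct ParseError {
    span: Range<usize>,
    message: String,
}

/// Toy lossy parser: malformed statements are recorded and skipped
/// instead of aborting the whole parse.
fn parse_lossy(src: &str) -> (Vec<Decl>, Vec<ParseError>) {
    let mut decls = Vec::new();
    let mut errors = Vec::new();
    let mut start = 0; // byte offset of the current statement in `src`
    for stmt in src.split_inclusive(';') {
        let trimmed = stmt.trim();
        // Skip leading whitespace so the span covers only the statement text.
        let lead = stmt.len() - stmt.trim_start().len();
        let span = (start + lead)..(start + lead + trimmed.len());
        if !trimmed.is_empty() {
            match parse_decl(trimmed) {
                Some(name) => decls.push(Decl { name, span }),
                None => errors.push(ParseError {
                    span,
                    message: format!("expected `type <name> = <body>;`, found `{trimmed}`"),
                }),
            }
        }
        start += stmt.len();
    }
    (decls, errors)
}

/// Returns the declared name if `stmt` looks like `type <name> = <body>;`.
fn parse_decl(stmt: &str) -> Option<String> {
    let rest = stmt.strip_prefix("type ")?.strip_suffix(';')?;
    let (name, body) = rest.split_once('=')?;
    let (name, body) = (name.trim(), body.trim());
    let valid_name =
        !name.is_empty() && name.chars().all(|c| c.is_alphanumeric() || c == '_');
    (valid_name && !body.is_empty()).then(|| name.to_string())
}
```

Given `type A = nat; typ B = text; type C = bool;`, this sketch returns the declarations `A` and `C` and one error covering the misspelled middle statement, which is the kind of partial result an editor can still work with.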

Requirements

  • Provide source spans for every AST node so the language server can power go-to-definition, hover, and diagnostics without reparsing.
  • Preserve the existing span-less candid_parser::syntax types for downstream crates while offering a lossless path to spanned data.
  • Ensure token-recording and lossy parsing entry points keep working with the new AST (covered by the new parser tests in rust/candid_parser/tests/parse_prog.rs:16).
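As an illustration of why the first requirement matters, the sketch below shows how a language server could map a cursor offset to an AST node once every node carries a span. All types here (`Spanned`, `Field`, `Decl`, `Hit`) and the function `node_at` are hypothetical and much simpler than the real structures in syntax/spanned.rs; spans are assumed to be byte ranges into the source text.

```rust
use std::ops::Range;

/// Hypothetical wrapper pairing a value with its byte range in the source.
#[derive(Debug, Clone, PartialEq)]
struct Spanned<T> {
    value: T,
    span: Range<usize>,
}

/// Hypothetical record field; only the name is tracked in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: Spanned<String>,
}

/// Hypothetical type declaration with a span for the whole item.
#[derive(Debug, Clone, PartialEq)]
struct Decl {
    name: Spanned<String>,
    fields: Vec<Field>,
    span: Range<usize>,
}

/// What the cursor is pointing at.
#[derive(Debug, PartialEq)]
enum Hit<'a> {
    DeclName(&'a str),
    FieldName { decl: &'a str, field: &'a str },
    DeclBody(&'a str),
}

/// Finds the most specific node containing `offset`, without reparsing.
fn node_at(decls: &[Decl], offset: usize) -> Option<Hit<'_>> {
    let decl = decls.iter().find(|d| d.span.contains(&offset))?;
    if decl.name.span.contains(&offset) {
        return Some(Hit::DeclName(&decl.name.value));
    }
    let field = decl.fields.iter().find(|f| f.name.span.contains(&offset));
    Some(match field {
        Some(f) => Hit::FieldName {
            decl: &decl.name.value,
            field: &f.name.value,
        },
        // Inside the declaration, but not on any tracked name.
        None => Hit::DeclBody(&decl.name.value),
    })
}
```

For the source `type point = record { x : int; y : int };`, a cursor at offset 7 would resolve to the declaration name `point`, and a cursor at offset 31 to the field `y`, assuming the spans were recorded as byte offsets at parse time.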

Considered Solutions

  • Extending the existing AST structs with optional spans would have been a breaking change for every consumer.
  • Maintaining a separate parser just for the language server risked grammar drift and duplicated maintenance.
  • The spanned-AST-plus-conversion approach keeps a single grammar and minimizes API churn.

Recommended Solution

Adopt the spanned AST internally and convert to the existing structs for callers that rely on them. Bindings/generators were updated where they destructure AST nodes, the lossy parser entry point (parse_prog_lossy) now builds on the spanned types, and new tests cover both exact and lossy parsing scenarios.

This keeps the public surface stable while supplying the language server with the spans it needs.
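The conversion step can be sketched roughly as follows. This is an assumption about the general shape, not the code in syntax/mod.rs: the type names (`Spanned`, `SpannedField`, `SpannedDecl`, `Field`, `Decl`) are made up for illustration, and the actual implementation may not use `From` at all.

```rust
use std::ops::Range;

/// Hypothetical wrapper pairing a value with its byte range in the source.
#[derive(Debug, Clone, PartialEq)]
struct Spanned<T> {
    value: T,
    span: Range<usize>,
}

/// Hypothetical spanned AST, the form the parser would build first.
#[derive(Debug, Clone, PartialEq)]
struct SpannedField {
    name: Spanned<String>,
    ty: Spanned<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct SpannedDecl {
    name: Spanned<String>,
    fields: Vec<SpannedField>,
    span: Range<usize>,
}

/// Hypothetical span-less AST, standing in for the existing public types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    ty: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Decl {
    name: String,
    fields: Vec<Field>,
}

impl From<SpannedField> for Field {
    fn from(f: SpannedField) -> Self {
        // Keep the values and drop the location information.
        Field {
            name: f.name.value,
            ty: f.ty.value,
        }
    }
}

impl From<SpannedDecl> for Decl {
    fn from(d: SpannedDecl) -> Self {
        Decl {
            name: d.name.value,
            // Converts every field; this walk is the per-parse cost of the approach.
            fields: d.fields.into_iter().map(Field::from).collect(),
        }
    }
}
```

In this shape, existing callers would keep receiving `Decl` values, while the language server would stop before the conversion and work with `SpannedDecl` directly. Consuming the spanned tree by value moves the strings instead of cloning them, so the extra work is mostly the tree walk and the new `Vec` allocations.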

Considerations

  • This work primarily benefits the experimental candid-language-server; other consumers keep using the familiar API.
  • When consumers need the old AST, we convert out of the spanned version on every parse, so there is a slight allocation/copy cost that we’ll monitor.
  • We now have two AST definitions (syntax/mod.rs vs. syntax/spanned.rs), so future changes must modify both in lockstep to prevent drift.

wiyota, Nov 10 '25 03:11

Thank you for your review.

I plan to revert the refactoring-related changes and consolidate them into a separate PR.

Also, I will change the spanned module to replace the conventional syntax module.

wiyota, Nov 13 '25 13:11

I've made changes to keep it as close to the master branch as possible. How does it look?

wiyota, Nov 16 '25 00:11