
Produce UTS from `#lang resyntax/test`

Open jackfirth opened this issue 6 months ago • 0 comments

The language's read-syntax implementation should produce Universal Tagged Syntax (UTS), which is my term for the kind of syntax objects I'm using to make Resyntax work on non-s-expression #langs. This implies a few restrictions:

  • The result of syntax-e always produces either an atom (a value with no syntax objects inside it) or a compound, which is a proper list of proper syntax objects.
  • Every list syntax object's first child is a shape tag, which is a keyword used to label the type of the surface syntax node.
  • Every atom has a 'uts-content syntax property whose value is a string. This is used as the textual form of the atom when writing the syntax object as text.
  • Every shape tag has a 'uts-separators syntax property whose value is a list of strings. The length of the list should be equal to one plus the number of non-tag children in the tagged syntax object. The separators are combined with the textual forms of the child syntax objects when writing the tagged syntax object as text. The first separator is the prefix before the first child, the last separator is the suffix after the last child, and the middle separators are inserted between the children. The shape tags themselves are not written.

The one exception to the above is that the outermost syntax object returned by a #lang's reader is allowed to start with (module ...), with no shape tag. This is required for compatibility with Racket's #lang mechanism.
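As a rough illustration of how the four structural rules fit together, here is a minimal Racket sketch that builds a small UTS value by hand and writes it back out as text. The names `make-atom`, `make-compound`, and `uts->string`, the `#:call` shape tag, and the `f(x, y)` surface syntax are all hypothetical and chosen for this example; this is not Resyntax's actual implementation, and it ignores the `(module ...)` exception.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: an atom whose 'uts-content property holds its text.
(define (make-atom v text)
  (syntax-property (datum->syntax #f v) 'uts-content text))

;; Hypothetical helper: a compound whose first child is a keyword shape tag
;; carrying the 'uts-separators property.
(define (make-compound tag separators . children)
  (define tag-stx
    (syntax-property (datum->syntax #f tag) 'uts-separators separators))
  (datum->syntax #f (cons tag-stx children)))

;; Writes a UTS syntax object as text: atoms contribute their 'uts-content,
;; compounds interleave their separators with the text of their children.
;; The shape tag itself is never written.
(define (uts->string stx)
  (define e (syntax-e stx))
  (cond
    [(pair? e)
     (define separators (syntax-property (car e) 'uts-separators))
     (define children (cdr e))
     (unless (= (length separators) (add1 (length children)))
       (error 'uts->string "expected ~a separators, got ~a"
              (add1 (length children)) (length separators)))
     (apply string-append
            (car separators)
            (append* (for/list ([child (in-list children)]
                                [sep (in-list (cdr separators))])
                       (list (uts->string child) sep))))]
    [else (syntax-property stx 'uts-content)]))

;; Assumed example: the surface text f(x, y) as a #:call node with three
;; non-tag children, and therefore four separators.
(define call
  (make-compound '#:call '("" "(" ", " ")")
                 (make-atom 'f "f")
                 (make-atom 'x "x")
                 (make-atom 'y "y")))
```

With these definitions, `(uts->string call)` produces `"f(x, y)"`: the empty prefix, then `f`, `(`, `x`, `, `, `y`, and the `)` suffix.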

In addition to the above, a #lang's reader should produce original reader UTS, which is UTS where every syntax object is syntax-original? and where all syntax object source locations adhere to these rules:

  • Source locations never partially overlap.
  • One source location encloses another only when the first location's syntax object contains the second location's syntax object as a descendant.
  • One source location starting before another implies its syntax object comes before the other one's during a preorder traversal of the entire source file's syntax object.
  • Every syntax object's syntax-source value is the same and is in some way related to the original file or string containing the #lang directive.
  • Shape tags start at the same position as their parent syntax object and have a span of zero.
  • Atoms have a uts-content string equal to the text at their source location in the original source code. Similarly, compounds have a uts-separators list whose strings are the substrings between its children, between the compound's start position and its first child's start, and between the compound's end position and its last child's end.
  • The syntax-span of a compound is equal to the sum of the spans of its children plus the total number of characters across the strings in the compound's uts-separators list.
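A few of the source location rules can be checked mechanically. The following Racket sketch covers only the shape tag rule, the span rule, and the atom content rule, using a hand-built example for the text `f(x, y)`. The helper names (`located`, `located-atom`, `compound-locations-ok?`, `atom-content-ok?`), the `#:call` tag, and the `"example.txt"` source name are assumptions made for illustration. It does not check `syntax-original?`, overlap, or traversal order.

```racket
#lang racket/base

;; Hypothetical helper: wraps a datum with a source location in "example.txt".
;; Positions are 1-based, as in Racket's syntax-position.
(define (located v position span)
  (datum->syntax #f v (vector "example.txt" #f #f position span)))

;; Hypothetical helper: an atom whose span matches the length of its text.
(define (located-atom v text position)
  (syntax-property (located v position (string-length text))
                   'uts-content text))

;; Checks that the shape tag starts where the compound starts with span zero,
;; and that the compound's span equals the sum of its children's spans plus
;; the total number of characters in its separators.
(define (compound-locations-ok? stx)
  (define e (syntax-e stx))
  (define tag (car e))
  (define children (cdr e))
  (define separator-chars
    (for/sum ([sep (in-list (syntax-property tag 'uts-separators))])
      (string-length sep)))
  (define child-spans
    (for/sum ([child (in-list children)])
      (syntax-span child)))
  (and (= (syntax-position tag) (syntax-position stx))
       (zero? (syntax-span tag))
       (= (syntax-span stx) (+ child-spans separator-chars))))

;; Checks that an atom's 'uts-content equals the source text at its location.
(define (atom-content-ok? stx source-text)
  (define start (sub1 (syntax-position stx)))
  (equal? (syntax-property stx 'uts-content)
          (substring source-text start (+ start (syntax-span stx)))))

;; Assumed example source: f at position 1, x at 3, y at 6, total span 7.
(define source-text "f(x, y)")
(define tag
  (syntax-property (located '#:call 1 0) 'uts-separators '("" "(" ", " ")")))
(define f (located-atom 'f "f" 1))
(define x (located-atom 'x "x" 3))
(define y (located-atom 'y "y" 6))
(define call (located (list tag f x y) 1 7))
```

Here the children contribute 3 characters and the separators contribute 4, matching the compound's span of 7, so `(compound-locations-ok? call)` holds, as does `(atom-content-ok? y source-text)`.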

Open questions include specifying how the #lang text should be reflected in the UTS properties, how autoformatting should work, how to detect comments, how indentation works, and probably lots of other stuff.

jackfirth • Aug 21 '25 20:08