Produce UTS from `#lang resyntax/test`
The language's `read-syntax` implementation should produce Universal Tagged Syntax (UTS), which is my term for the kind of syntax objects I'm using to make Resyntax work on non-s-expression `#lang`s. This implies a few restrictions:
- The result of `syntax-e` always produces either an atom (a value with no syntax objects inside it) or a compound, which is a proper list of proper syntax objects.
- Every list syntax object's first child is a shape tag, which is a keyword used to label the type of the surface syntax node.
- Every atom has a `'uts-content` syntax property whose value is a string. This is used as the textual form of the atom when writing the syntax object as text.
- Every shape tag has a `'uts-separators` syntax property whose value is a list of strings. The length of this list should be equal to one plus the number of non-tag children in the tagged syntax object. The separators are combined with the textual forms of the child syntax objects when writing the tagged syntax object as text. The first separator is the prefix before the first child, the last separator is the suffix after the last child, and the middle separators are inserted between the children. The shape tags themselves are not written.
The one exception to the above is that the outermost syntax object returned by a `#lang`'s reader is allowed to start with `(module ...)`, with no shape tag. This is required for compatibility with Racket's `#lang` mechanism.
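The writing rule for atoms and tagged compounds can be sketched as follows. This is only an illustration under stated assumptions, not Resyntax's actual implementation: the function name `uts->string` and the `#:call` shape tag are made up for the example, and the outermost `(module ...)` exception is not handled.

```racket
#lang racket/base

(require racket/list
         racket/string)

;; uts->string : syntax? -> string?
;; Hypothetical helper: writes a UTS syntax object as text by interleaving
;; the shape tag's separators with the textual forms of the children.
(define (uts->string stx)
  (define parts (syntax->list stx))
  (cond
    [(and parts (pair? parts))
     ;; Compound: the first child is the shape tag, which is not written.
     (define separators (syntax-property (car parts) 'uts-separators))
     (define child-texts (map uts->string (cdr parts)))
     ;; prefix, child 1, separator, child 2, ..., last child, suffix
     (string-append*
      (car separators)
      (append-map list child-texts (cdr separators)))]
    ;; Atom: its textual form is stored directly in 'uts-content.
    [else (syntax-property stx 'uts-content)]))

;; Example: the surface syntax f(x), tagged with a made-up #:call shape.
(define f (syntax-property (datum->syntax #f 'f) 'uts-content "f"))
(define x (syntax-property (datum->syntax #f 'x) 'uts-content "x"))
(define call-tag
  (syntax-property (datum->syntax #f '#:call)
                   'uts-separators
                   (list "" "(" ")")))
(define call (datum->syntax #f (list call-tag f x)))

(uts->string call) ; "f(x)"
```

Here the two children need three separators: an empty prefix, `"("` between `f` and `x`, and `")"` as the suffix.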
In addition to the above, a `#lang`'s reader should produce original reader UTS, which is UTS where every syntax object is `syntax-original?` and where all syntax object source locations adhere to these rules:
- Source locations never partially overlap.
- One source location encloses another only when the first location's syntax object contains the second location's syntax object as a descendant.
- One source location starting before another implies its syntax object comes before the other one's during a preorder traversal of the entire source file's syntax object.
- Every syntax object's `syntax-source` value is the same and is in some way related to the original file or string containing the `#lang` directive.
- Shape tags start at the same position as their parent syntax object and have a span of zero.
- Atoms have a `uts-content` string equal to the text at their source location in the original source code. Similarly, compounds have a `uts-separators` list whose strings are the substrings between its children, between the compound's start position and its first child's start, and between the compound's end position and its last child's end.
- The `syntax-span` of a compound is equal to the sum of the spans of its children plus the total number of characters in the compound's `uts-separators` list.
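A few of these location rules can be checked mechanically. The sketch below is an assumption-laden illustration rather than part of Resyntax: the helpers `located`, `atom`, `compound`, and `uts-locations-ok?`, the source name `example-src`, and the `#:call` tag are all hypothetical. It checks only the shape tag position and span, the atom content rule, and the compound span rule; it does not check overlap, ordering, `syntax-source`, or `syntax-original?`.

```racket
#lang racket/base

;; Hypothetical constructors for building located UTS by hand.
;; Everything is placed on line 1, so the column is the position minus one.
(define (located datum pos span)
  (datum->syntax #f datum (list 'example-src 1 (sub1 pos) pos span)))

(define (atom text pos)
  (syntax-property (located (string->symbol text) pos (string-length text))
                   'uts-content
                   text))

(define (compound tag-keyword separators pos span . children)
  ;; The shape tag starts where its parent starts and has a span of zero.
  (define tag
    (syntax-property (located tag-keyword pos 0) 'uts-separators separators))
  (located (cons tag children) pos span))

;; Checks a subset of the original reader UTS rules against the source text.
(define (uts-locations-ok? stx source-text)
  (define parts (syntax->list stx))
  (cond
    [(and parts (pair? parts))
     (define tag (car parts))
     (define children (cdr parts))
     (define separators (syntax-property tag 'uts-separators))
     (and (= (syntax-position tag) (syntax-position stx))
          (= (syntax-span tag) 0)
          ;; compound span = child spans + total separator characters
          (= (syntax-span stx)
             (+ (apply + (map syntax-span children))
                (apply + (map string-length separators))))
          (andmap (lambda (child) (uts-locations-ok? child source-text))
                  children))]
    [else
     ;; Atom: 'uts-content must equal the text at its source location.
     ;; Positions are 1-based, string indices are 0-based.
     (define start (sub1 (syntax-position stx)))
     (equal? (syntax-property stx 'uts-content)
             (substring source-text start (+ start (syntax-span stx))))]))

;; Example: the source text "f(x)" read as a made-up #:call compound.
(define source-text "f(x)")
(define good
  (compound '#:call (list "" "(" ")") 1 4 (atom "f" 1) (atom "x" 3)))
(define bad
  (compound '#:call (list "" "(" ")") 1 5 (atom "f" 1) (atom "x" 3)))

(uts-locations-ok? good source-text) ; #t
(uts-locations-ok? bad source-text)  ; #f, since the span 5 is not 1 + 1 + 2
```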
Open questions include specifying how the `#lang` text should be reflected in the UTS properties, how autoformatting should work, how to detect comments, how indentation works, and probably lots of other stuff.