Ohm-S icon indicating copy to clipboard operation
Ohm-S copied to clipboard

Allow access to skipped whitespace

Open codeZeilen opened this issue 4 years ago • 13 comments

In order to support Pretty Printers, the CST should allow access to whitespace. The whitespace does not have to be explicitly materialized in the CST but could also be re-constructed after the fact.

Here are two suggestions:

  • CST nodes support the method skippedSpaces that returns the spaces skipped before the node was applied. There are two options how the spaces could be returned:
    • The spaces are returned as a simple string
    • The spaces are returned as a spaces node
  • CST nodes support the method childrenIncludingSpaces that returns a collection of all parsed nodes including the space nodes in between

Any thoughts from your side @Paula-Kli?

codeZeilen avatar Jul 06 '20 09:07 codeZeilen

From our perspective skippedSpaces returning a spaces node would be the better option. Since than you can apply this method in the value:-Method of OhmAttributes. In Addition you can then apply value: to the returned spaces node and using this method one can evaluate it.

What we don't really get is your second suggestion: what do you mean by "a collection of all parsed nodes"? It is not a new CST right?

Paula-Kli avatar Jul 07 '20 09:07 Paula-Kli

In addition to the children method wthere would be the childrenIncludingSpaces method that returns the children as if the spaces were parsed from the start on. For example

aNode children -> {leftHandNode . operatorNode . rightHandNode}

aNode childrenIncludingSpaces -> {leftHandNode . spacesNode . operatorNode . spacesNode . rightHandNode}
"I still have to figure out whether there should be a spaces node at the first position in that array

Instead of getting the skippedSpaces for a specific node, this interface allows for a more "natural" tree traversal. However, it might be very difficult to implement correctly... :)

codeZeilen avatar Jul 07 '20 11:07 codeZeilen

Ah alright I see! In addition to saying it might be more difficult to implement correctly we still think skippedSpaces would be a more useful method. Thanks a lot for asking!

Paula-Kli avatar Jul 07 '20 14:07 Paula-Kli

Please have a look / try out the new methods added in 31c8c08. They are a first sketch of how this could work and not all corner cases are covered yet. If this works for you, I'll see to getting it right :)

codeZeilen avatar Jul 10 '20 09:07 codeZeilen

Thanks a lot for prototyping this so quickly. In general it works as we have expected it.

The prototype is based on the assumption that nodes that are next to each other in the CST are also neighbours in the original string. However we see a problem with rules that contain a * or a + (and maybe more):

grammar := OhmGrammar new: 'OhmNodeTestGrammar {
	StartRule = (";" firstRule)+
	firstRule = "a"
	space += comment
	comment = "\"" (~"\"" any)* "\""
}'.

result := (grammar
	match: ';a ; "comment" a'
	startingFrom: #StartRule) cst.

Here are parts of the CST that is produced with the code above:

If we want to retrieve the comment before the last a in the input string with

result children last children last skippedSpacesNodes

your code takes the other a from the _list-node as preceding node so the space interval ranges from 3 to 15 and also includes a ;. When we had a look at how to get the comments back before we asked you, we stumbled upon this problem as well and didn't know how to get the right nodes.

We really appreciate your effort! Thanks a lot!

felixauringer avatar Jul 11 '20 12:07 felixauringer

Ah! Thanks a lot! I suspected something like this but didn't have the time to think this through. I'll look into this case.

codeZeilen avatar Jul 11 '20 17:07 codeZeilen

Please have a look at 9b85725 :) Again it is more of a sketch as I currently do not have time to think this trough. The current approach might still miss cases but is easier to reason about (but might take a little longer on the first call due to the construction of the source map).

codeZeilen avatar Jul 13 '20 09:07 codeZeilen

When I try to send the message skippedSpacesString, I get the error MessageNotUnderstood: Array>>findFirst:startingAt: which also occurs in your tests. The message can easily be implemented but I do not think that our package would be the place for it. After implementing the message everything seems to work as expected :+1: Furthermore in the following snippet occurs an error with a node that is outside of the interval (at size + 1):

result := (OhmExplicitSendsSmalltalk
	match: 'header <pragma>.'
	startingFrom: #MethodDeclaration) cst.
spaces := result children second skippedSpacesString

This produces an error when trying to access the source map (index out of bound). It seems to be related to #43 because there are nodes of optional rules that have an interval outside the interval of method declaration: image

felixauringer avatar Jul 14 '20 10:07 felixauringer

Thanks for the PR! I will look at it Monday. Also: Sorry for not advancing this further. The week was somewhat busy. I am working on #48 currently, which resulted in a major rework of the whitespace skipping and might also help with the Optional problem (all maybe instances of skipped whitespace although the rule actually failed, but the parse succeeded as the rule was optional/negated/lookahead).

codeZeilen avatar Jul 17 '20 16:07 codeZeilen

We looked further into the problem with wrong intervals. As you can see in the following screenshot, the interval of the last node also covers whitespace characters (including comments) at the end of the method. image

This causes problems when we try to extract comments after the last node with the source map.

felixauringer avatar Jul 24 '20 07:07 felixauringer

I actually have a solution to this in my local image. The space skipping logic was inconsistent throughout nodes, as I did not completely understand the rule when to do it when I first implemented it. I now adjusted it but I did not manage to adjust all tests accordingly before the exam preparations started. I hope I can attend to it beginning of next week. It should resolve a number of issues...

codeZeilen avatar Jul 24 '20 17:07 codeZeilen

That sounds really great! We really appreciate your work for this issue :)

felixauringer avatar Jul 25 '20 09:07 felixauringer

I uploaded the intermediate state of the work on a new branch called new-whitespace-handling. You can try that one. I would advise loading it into a fresh image though... It works okay in my image so far, but not all tests are green so something is still off. I think I can attend to it Tuesday afternoon.

codeZeilen avatar Jul 27 '20 13:07 codeZeilen