
syntax: leading and trailing periods / dots not tokenized as part of identifier

Open kevinushey opened this issue 2 years ago • 5 comments

Given an R document with the code:

.hello.world.

The leading + trailing .s are tokenized as separate entities.

Screen Shot 2022-08-25 at 4 37 25 PM

I believe the use of \b here forces VSCode to consider . as a word boundary, and so it fails to tokenize the whole identifier as a single token:

https://github.com/REditorSupport/vscode-R/blob/da579cc17a9485f7a11bf5e75bf6b9c347eca116/syntax/r.json#L165-L168

If so, I think those \bs can be safely removed?
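To illustrate the suspected mechanism, here is a minimal sketch in Python. The two patterns are hypothetical stand-ins, not the actual contents of syntax/r.json, and Python's re module is only an approximation of the Oniguruma engine that VS Code uses for TextMate grammars. For this case the two engines should agree, because . is not a word character, so \b cannot match between the start of the line and a leading ., or between a trailing . and the end of the line.

```python
import re

# Hypothetical identifier patterns for illustration only;
# the real pattern in syntax/r.json may differ.
WITH_BOUNDARIES = re.compile(r"\b[A-Za-z._][A-Za-z0-9._]*\b")
WITHOUT_BOUNDARIES = re.compile(r"[A-Za-z._][A-Za-z0-9._]*")

code = ".hello.world."

# With \b, the match has to start and end on a word character,
# so the outer dots are left out of the identifier.
with_b = WITH_BOUNDARIES.findall(code)
print(with_b)      # ['hello.world']

# Without \b, the whole identifier is matched as one token.
without_b = WITHOUT_BOUNDARIES.findall(code)
print(without_b)   # ['.hello.world.']
```

Dropping \b entirely may let the pattern match inside other tokens, so a real fix might need lookarounds instead; that is a design question for the grammar, not something this sketch settles.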

kevinushey, Aug 25 '22 23:08

Yes, I think you are right. We should remove those \b. Do you find unnecessary \b in other matches?

renkun-ken, Aug 26 '22 00:08

I guess another example is 42., which R parses as a single number but which appears to be tokenized as 42 and .:

Screen Shot 2022-08-25 at 5 33 20 PM

https://github.com/REditorSupport/vscode-R/blob/da579cc17a9485f7a11bf5e75bf6b9c347eca116/syntax/r.json#L123-L126
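The same effect can be reproduced for numbers. The sketch below uses hypothetical numeric patterns (not the ones in syntax/r.json) and Python's re module as a rough stand-in for Oniguruma. It compares a trailing \b with a negative lookahead, which is one possible replacement and not necessarily the one the grammar should adopt.

```python
import re

# Hypothetical numeric patterns, written for illustration only.
TRAILING_B = re.compile(r"\b[0-9]+\.?[0-9]*\b")
# Assumed alternative: reject a following identifier character
# instead of requiring a word boundary.
TRAILING_LOOKAHEAD = re.compile(r"\b[0-9]+\.?[0-9]*(?![A-Za-z0-9_])")

source = "42."

# The trailing \b cannot match after ".", so the regex backtracks to "42".
split_number = TRAILING_B.findall(source)
print(split_number)   # ['42']

# The lookahead is satisfied at end of input, so "42." stays whole.
whole_number = TRAILING_LOOKAHEAD.findall(source)
print(whole_number)   # ['42.']
```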

And also for references to packages containing a . in their name:

Screen Shot 2022-08-25 at 5 39 10 PM

I'll poke around a bit more for other edge cases...

kevinushey, Aug 26 '22 00:08

Could one potential edge case be S3 methods? i.e.

as.data.frame.list()

though tbh I'm not exactly sure what you would want to tokenise here :s
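For what it's worth, R's parser reads as.data.frame.list as a single symbol, and interior dots are not affected by \b, since the match still starts and ends on a word character. A small sketch, again with hypothetical patterns and Python's re module standing in for Oniguruma:

```python
import re

# Hypothetical identifier patterns for illustration only.
IDENT_WITH_B = re.compile(r"\b[A-Za-z._][A-Za-z0-9._]*\b")
IDENT_WITHOUT_B = re.compile(r"[A-Za-z._][A-Za-z0-9._]*")

call = "as.data.frame.list()"

# Both variants capture the full S3 method name as one token,
# because the name neither starts nor ends with a dot.
s3_with_b = IDENT_WITH_B.findall(call)
s3_without_b = IDENT_WITHOUT_B.findall(call)
print(s3_with_b)      # ['as.data.frame.list']
print(s3_without_b)   # ['as.data.frame.list']
```

So under these assumed patterns, removing \b would not change how a name like this is tokenized.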

gowerc, Sep 10 '22 18:09

This issue is stale because it has been open for 365 days with no activity.

github-actions[bot], Sep 11 '23 01:09

unstale

eitsupi, Sep 11 '23 03:09