language-ruby
language-ruby copied to clipboard
Heredoc strings no longer syntax highlighted
- [x] Put an X between the brackets on this line if you have done all of the following:
- Reproduced the problem in Safe Mode: http://flight-manual.atom.io/hacking-atom/sections/debugging/#using-safe-mode
- Followed all applicable steps in the debugging guide: http://flight-manual.atom.io/hacking-atom/sections/debugging/
- Checked the FAQs on the message board for common solutions: https://discuss.atom.io/c/faq
- Checked that your issue isn't already filed: https://github.com/issues?utf8=✓&q=is%3Aissue+user%3Aatom
- Checked that there is not already an Atom package that provides the described functionality: https://atom.io/packages
Description
Heredocs used to use the syntax highlighting for that language, but it appears that isn't currently working with Tree Sitter Parsers enabled.
Before:
Current:
Steps to Reproduce
- Have Tree Sitter Parsers enabled.
- Use heredoc syntax in Ruby grammar.
Expected behavior:
It highlights the content of the heredoc in that language's syntax.
Actual behavior:
No syntax highlighting.
Reproduces how often:
Consistently.
Additional Information:
Running on macOS Mojave.
Versions
1.32.1 1.33.0-beta1
Thanks @andremleblanc! Reproduced on macOS 10.12.6 with 1.34.0-nightly10.
The bug report contains a clause, differing from the title, that I believe makes it more specific than the problem is. Apologies if this has been tested and found to be specific in your case.
I'm using HEREDOC (tilde, though dash is the same) with Ruby syntax, as reported, however I don't have tree-sitter.
Also noticed that if the ending HEREDOC is typed, at the correct indentation level, after the message (as opposed to the likely more standard practice of typing opening/closing HEREDOC and populating) the syntax parser fails to match the closing HEREDOC, leaving it colored grey.
I'm hitting both issues:
- Syntax highlighting doesn't happen in a heredoc anymore
- I can reproduce @Michael-Ludgate's issue, too (it's like it doesn't realize it's in the string), but only when I use certain delimiters. Eg it doesn't happen if I use a delimiter of
HERE
, orRUBY
, but it does happen if I use a delimiter ofHEREDOC
orABCD
.
Anyway, the syntax highlighting is a huge feature for me, depending on what I'm doing, I might go to Atom just b/c of this feature (eg when I'm playing with SQL).
Oh, that second one is interesting, it seems to be related to the letter C
.
Put this into your editor, select the ABCD
and do Cmd+d
to highlight the second one. Then slowly type ABCD
, it's highlighted as a string for AB
, but then loses the highlighting on the C
. If you skip C
and go to D
, it remains highlighted correctly. We can see the same thing in @Michael-Ludgate's example, the word HEREDOC
has a C
in it, and it correctly highlights as a string up until that letter is typed. If you change it to HEREDOK
, it doesn't get messed up.
def m
<<~ABCD
msg
ABCD
end
Oh, okay, so it's going wonky on the highlighting because it applies "C" language syntax highlighting to it. So syntax highlighting does still get applied, but only sometimes. Maybe there's an issue with how it looks up the language? Eg here it sees HEREDOC
, notices the C
in there, and applies C syntax highlighting, do SQL
, and it doesn't recognize it as SQL. Inspecting the DOM shows it's been given a class of syntax--string
.
I think this commit is part of the problem. I don't think that change does what it is presumably intended to do. Eg it looks like the author thought it was shorthand for mapping multiple keys to the same value, but instead it created a single key whose name was a conjunction of all the keys. I don't know how tree-sitter works, maybe it treats keys like this as the original, but WRT cson, the new syntax is not translated to the old syntax. Here, I convert it to JSON and we see that the key is not broken out into multiple key/value pairs:
Followed the instructions to make changes locally (here to put atom into dev mode, and here to link local changes into the dev-mode). Ran a git-bisect
on it.
This is the commit where it breaks.
Here's the output of the git bisect
(in another window I was restarting atom each time I did this and looking to see if it would highlight SQL correctly):
🐠 git bisect start
🐠 git bisect bad
🐠 git bisect good 0909a11557725f117dea6f39aa2b08241a887ff2
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[726281fc9d9fe4e9e3e20a61f5912b062625edcb] Merge pull request #236 from atom/mb-compatible-scope-descriptors
🐠 git bisect bad
Bisecting: 14 revisions left to test after this (roughly 4 steps)
[e5a8d8e66728c013e1a009bcdb2facfd044270f9] Merge pull request #225 from atom/mb-tree-sitter-parser
🐠 git bisect bad
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[701ad5d744e23090655f1c2e725f18cfe6f4ae7c] Add tree-sitter Ruby grammar
🐠 git bisect bad
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[73ec57162fb7eb0035dedc0026911e3d2a8aee4b] :racehorse: Improve the regexp pattern
🐠 git bisect good
Bisecting: 2 revisions left to test after this (roughly 1 step)
[ef4eb5691842a369b83f6465defaad8b82ad191a] Merge pull request #215 from stevenpetryk/variables-ending-in-do-are-not-blocks
🐠 git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[d88a4cfb32876295ec5b090a2994957e62130f4a] Prepare 0.71.4 release
🐠 git bisect good
701ad5d744e23090655f1c2e725f18cfe6f4ae7c is the first bad commit
commit 701ad5d744e23090655f1c2e725f18cfe6f4ae7c
Author: Max Brunsfeld <[email protected]>
Date: Fri Dec 8 09:28:08 2017 -0800
Add tree-sitter Ruby grammar
:040000 040000 32ce74581b8f1777099d63e11f36f2625814dd82 adcea68aa1e140b88926e33c0f3849582360dce9 M grammars
:100644 100644 0db247528eaa512b24d4a033c36835c164453f75 c88b06b4fee6a4c54319b0c8b1ae1107037a249a M package.json
Additionally, I put print statements into the injection point language registration (line 6):
https://github.com/atom/language-ruby/blob/ea3824d2539fe982e69d0599c5fadd082879f1c6/lib/main.js#L4-L8
For <<~HEREDOC
, it was returning the language of "HEREDOC"
, but for "SQL"
, it was returning an empty string (or possibly something that prints like an empty string).
There are apparently multiple grammars which are loaded, and the one that is used is determined by a configuration value. This package uses this feature to test against the old parser (textmate's?), which means that the tree-sitter stuff doesn't get tested by this lib.
On the bright side, it reveals an easy solution:
atom.config.set('core.useTreeSitterParsers', false)
If you enter it into the debug console (cmd-opt-i
) it will immediately update to be correct, and will persist this change in Atom's config.cson
.
I can reproduce this issue with Atom 1.35.1 and Ubuntu 18.04; however, https://github.com/atom/language-ruby/issues/249#issuecomment-450528828 is useful.
Thanks for reporting this issue. I think I see what's causing the behavior that you're observing. I'll try to explain what I've found so far and a path toward resolving this issue.
tl;dr: When using Tree-sitter, the Ruby heredoc's inner text will be syntax highlighted if (1) there is Tree-sitter grammar for the language of the inner text and (2) that grammar provides an injection regex. Currently, JavaScript, Bash, C, and a handful of other languages meet these requirements, but SQL does not yet meet these requirements.
Tree-sitter support for syntax highlighting within Ruby heredocs
Tree-sitter refers to this topic as "language injection" and describes it as follows:
Sometimes, a source file can contain code written in several different languages. Tree-sitter grammars support this situation using a two-part process called language injection. First, an 'outer' language must define an injection point - a set of syntax nodes whose text can be parsed using a different language, along with some logic for guessing the name of the other language that should be used. Second, an 'inner' language must define an
injectionRegex
- a regex pattern that will be tested against the language name provided by the injection point.
The Tree-sitter Ruby grammar satisfies the first requirement described above: the grammar defines an injection point for heredocs. 😅
Some languages already support syntax highlighting within Ruby heredocs
Several languages satisfy the second requirement described above: they define an injectionRegex
. For example, language-javascript defines one here, and language-shellscript defines one here. We can see that Atom provides syntax highlighting for these languages within a Ruby heredoc:
![Screen Shot 2019-04-24 at 4 23 38 PM](https://user-images.githubusercontent.com/2988/56692595-8b5e4880-66b0-11e9-9838-7eb2ae020029.png)
![Screen Shot 2019-04-24 at 4 25 26 PM](https://user-images.githubusercontent.com/2988/56692596-8bf6df00-66b0-11e9-9d4c-7f733201b0d5.png)
SQL doesn't yet support syntax highlighting within Ruby heredocs
At the moment, language-sql does not yet provide a Tree-sitter grammar, so Atom cannot provide SQL syntax highlighting inside a Ruby heredoc:
![Screen Shot 2019-04-24 at 4 30 30 PM](https://user-images.githubusercontent.com/2988/56692598-8bf6df00-66b0-11e9-9e35-1f3020a78e7e.png)
Fixing this issue
To fix this issue, we'll need to update Atom's other language-*
packages to (1) provide a Tree-sitter grammar and (2) provide an injectionRegex
.
A workaround in the meantime
The fix above will be a rather time consuming endeavor, so folks may want a workaround in the meantime. As @JoshCheek notes above, you can disable Tree-sitter for now. To do so, open Atom's settings UI, and uncheck the "Use Tree Sitter Parsers" checkbox.
![Screen Shot 2019-04-24 at 4 39 54 PM](https://user-images.githubusercontent.com/2988/56692599-8bf6df00-66b0-11e9-8d3d-3166ac98f0cd.png)
This will cause Atom to fall back to using the TextMate grammars, which currently provide broader support for syntax highlighting within Ruby heredocs.
TL;DR there is nothing wrong with language-ruby
, and this is because of language-sql
. Am I right :laughing:
Put this into your editor, select the
ABCD
and doCmd+d
to highlight the second one. Then slowly typeABCD
, it's highlighted as a string forAB
, but then loses the highlighting on theC
. If you skipC
and go toD
, it remains highlighted correctly. We can see the same thing in @Michael-Ludgate's example, the wordHEREDOC
has aC
in it, and it correctly highlights as a string up until that letter is typed. If you change it toHEREDOK
, it doesn't get messed up.
Isn't that already a bug on its own or should really any HEREDOC that contains a capital C cause the content to be interpreted as C code? Sounds a bit like a =~ /C/
is done when a == "C"
would have been correct.
Another possibly resolution would be to support injecting legacy first-mate grammars into regions of files that are otherwise parsed via TreeSitter. This would need to occur in Atom rather than TreeSitter, and I'm not sure how complicated it would be.
Maybe worth noting that "Github Markdown" highlights it correctly (LHS), but Ruby doesn't.
The scopes seem really different, almost seems like they don't use the same Ruby parser.
Is this a dead issue? I remember this working at some point...