language-ruby icon indicating copy to clipboard operation
language-ruby copied to clipboard

Heredoc strings no longer syntax highlighted

Open andremleblanc opened this issue 5 years ago • 15 comments

  • [x] Put an X between the brackets on this line if you have done all of the following:
    • Reproduced the problem in Safe Mode: http://flight-manual.atom.io/hacking-atom/sections/debugging/#using-safe-mode
    • Followed all applicable steps in the debugging guide: http://flight-manual.atom.io/hacking-atom/sections/debugging/
    • Checked the FAQs on the message board for common solutions: https://discuss.atom.io/c/faq
    • Checked that your issue isn't already filed: https://github.com/issues?utf8=✓&q=is%3Aissue+user%3Aatom
    • Checked that there is not already an Atom package that provides the described functionality: https://atom.io/packages

Description

Heredocs used to use the syntax highlighting for that language, but it appears that isn't currently working with Tree Sitter Parsers enabled.

Before: screen shot 2018-11-02 at 10 25 07 pm

Current: screen shot 2018-11-02 at 10 24 53 pm

Steps to Reproduce

  1. Have Tree Sitter Parsers enabled.
  2. Use heredoc syntax in Ruby grammar.

Expected behavior:

It highlights the content of the heredoc in that language's syntax.

Actual behavior:

No syntax highlighting.

Reproduces how often:

Consistently.

Additional Information:

Running on macOS Mojave.

Versions

1.32.1 1.33.0-beta1

andremleblanc avatar Nov 03 '18 02:11 andremleblanc

Thanks @andremleblanc! Reproduced on macOS 10.12.6 with 1.34.0-nightly10.

rsese avatar Nov 08 '18 22:11 rsese

The bug report contains a clause, differing from the title, that I believe makes it more specific than the problem is. Apologies if this has been tested and found to be specific in your case.

I'm using HEREDOC (tilde, though dash is the same) with Ruby syntax, as reported, however I don't have tree-sitter.

image

Also noticed that if the ending HEREDOC is typed, at the correct indentation level, after the message (as opposed to the likely more standard practice of typing opening/closing HEREDOC and populating) the syntax parser fails to match the closing HEREDOC, leaving it colored grey.

Michael-Ludgate avatar Nov 13 '18 02:11 Michael-Ludgate

I'm hitting both issues:

  • Syntax highlighting doesn't happen in a heredoc anymore
  • I can reproduce @Michael-Ludgate's issue, too (it's like it doesn't realize it's in the string), but only when I use certain delimiters. Eg it doesn't happen if I use a delimiter of HERE, or RUBY, but it does happen if I use a delimiter of HEREDOC or ABCD.

Anyway, the syntax highlighting is a huge feature for me, depending on what I'm doing, I might go to Atom just b/c of this feature (eg when I'm playing with SQL).

JoshCheek avatar Dec 28 '18 18:12 JoshCheek

Oh, that second one is interesting, it seems to be related to the letter C.

Put this into your editor, select the ABCD and do Cmd+d to highlight the second one. Then slowly type ABCD, it's highlighted as a string for AB, but then loses the highlighting on the C. If you skip C and go to D, it remains highlighted correctly. We can see the same thing in @Michael-Ludgate's example, the word HEREDOC has a C in it, and it correctly highlights as a string up until that letter is typed. If you change it to HEREDOK, it doesn't get messed up.

def m
  <<~ABCD
  msg
  ABCD
end

JoshCheek avatar Dec 28 '18 18:12 JoshCheek

Oh, okay, so it's going wonky on the highlighting because it applies "C" language syntax highlighting to it. So syntax highlighting does still get applied, but only sometimes. Maybe there's an issue with how it looks up the language? Eg here it sees HEREDOC, notices the C in there, and applies C syntax highlighting, do SQL, and it doesn't recognize it as SQL. Inspecting the DOM shows it's been given a class of syntax--string.

image

JoshCheek avatar Dec 28 '18 19:12 JoshCheek

I think this commit is part of the problem. I don't think that change does what it is presumably intended to do. Eg it looks like the author thought it was shorthand for mapping multiple keys to the same value, but instead it created a single key whose name was a conjunction of all the keys. I don't know how tree-sitter works, maybe it treats keys like this as the original, but WRT cson, the new syntax is not translated to the old syntax. Here, I convert it to JSON and we see that the key is not broken out into multiple key/value pairs:

image

JoshCheek avatar Dec 29 '18 22:12 JoshCheek

Followed the instructions to make changes locally (here to put atom into dev mode, and here to link local changes into the dev-mode). Ran a git-bisect on it.

This is the commit where it breaks.

Here's the output of the git bisect (in another window I was restarting atom each time I did this and looking to see if it would highlight SQL correctly):

🐠  git bisect start

🐠  git bisect bad

🐠  git bisect good 0909a11557725f117dea6f39aa2b08241a887ff2
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[726281fc9d9fe4e9e3e20a61f5912b062625edcb] Merge pull request #236 from atom/mb-compatible-scope-descriptors

🐠  git bisect bad
Bisecting: 14 revisions left to test after this (roughly 4 steps)
[e5a8d8e66728c013e1a009bcdb2facfd044270f9] Merge pull request #225 from atom/mb-tree-sitter-parser

🐠  git bisect bad
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[701ad5d744e23090655f1c2e725f18cfe6f4ae7c] Add tree-sitter Ruby grammar

🐠  git bisect bad
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[73ec57162fb7eb0035dedc0026911e3d2a8aee4b] :racehorse: Improve the regexp pattern

🐠  git bisect good
Bisecting: 2 revisions left to test after this (roughly 1 step)
[ef4eb5691842a369b83f6465defaad8b82ad191a] Merge pull request #215 from stevenpetryk/variables-ending-in-do-are-not-blocks

🐠  git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[d88a4cfb32876295ec5b090a2994957e62130f4a] Prepare 0.71.4 release

🐠  git bisect good
701ad5d744e23090655f1c2e725f18cfe6f4ae7c is the first bad commit
commit 701ad5d744e23090655f1c2e725f18cfe6f4ae7c
Author: Max Brunsfeld <[email protected]>
Date:   Fri Dec 8 09:28:08 2017 -0800

    Add tree-sitter Ruby grammar

:040000 040000 32ce74581b8f1777099d63e11f36f2625814dd82 adcea68aa1e140b88926e33c0f3849582360dce9 M	grammars
:100644 100644 0db247528eaa512b24d4a033c36835c164453f75 c88b06b4fee6a4c54319b0c8b1ae1107037a249a M	package.json

Additionally, I put print statements into the injection point language registration (line 6):

https://github.com/atom/language-ruby/blob/ea3824d2539fe982e69d0599c5fadd082879f1c6/lib/main.js#L4-L8

For <<~HEREDOC, it was returning the language of "HEREDOC", but for "SQL", it was returning an empty string (or possibly something that prints like an empty string).

JoshCheek avatar Dec 29 '18 23:12 JoshCheek

There are apparently multiple grammars which are loaded, and the one that is used is determined by a configuration value. This package uses this feature to test against the old parser (textmate's?), which means that the tree-sitter stuff doesn't get tested by this lib.

On the bright side, it reveals an easy solution:

atom.config.set('core.useTreeSitterParsers', false)

If you enter it into the debug console (cmd-opt-i) it will immediately update to be correct, and will persist this change in Atom's config.cson.

image

JoshCheek avatar Dec 29 '18 23:12 JoshCheek

I can reproduce this issue with Atom 1.35.1 and Ubuntu 18.04; however, https://github.com/atom/language-ruby/issues/249#issuecomment-450528828 is useful.

BerkhanBerkdemir avatar Apr 11 '19 03:04 BerkhanBerkdemir

Thanks for reporting this issue. I think I see what's causing the behavior that you're observing. I'll try to explain what I've found so far and a path toward resolving this issue.

tl;dr: When using Tree-sitter, the Ruby heredoc's inner text will be syntax highlighted if (1) there is Tree-sitter grammar for the language of the inner text and (2) that grammar provides an injection regex. Currently, JavaScript, Bash, C, and a handful of other languages meet these requirements, but SQL does not yet meet these requirements.


Tree-sitter support for syntax highlighting within Ruby heredocs

Tree-sitter refers to this topic as "language injection" and describes it as follows:

Sometimes, a source file can contain code written in several different languages. Tree-sitter grammars support this situation using a two-part process called language injection. First, an 'outer' language must define an injection point - a set of syntax nodes whose text can be parsed using a different language, along with some logic for guessing the name of the other language that should be used. Second, an 'inner' language must define an injectionRegex - a regex pattern that will be tested against the language name provided by the injection point.

The Tree-sitter Ruby grammar satisfies the first requirement described above: the grammar defines an injection point for heredocs. 😅

Some languages already support syntax highlighting within Ruby heredocs

Several languages satisfy the second requirement described above: they define an injectionRegex. For example, language-javascript defines one here, and language-shellscript defines one here. We can see that Atom provides syntax highlighting for these languages within a Ruby heredoc:

Screen Shot 2019-04-24 at 4 23 38 PM Screen Shot 2019-04-24 at 4 25 26 PM

SQL doesn't yet support syntax highlighting within Ruby heredocs

At the moment, language-sql does not yet provide a Tree-sitter grammar, so Atom cannot provide SQL syntax highlighting inside a Ruby heredoc:

Screen Shot 2019-04-24 at 4 30 30 PM

Fixing this issue

To fix this issue, we'll need to update Atom's other language-* packages to (1) provide a Tree-sitter grammar and (2) provide an injectionRegex.

A workaround in the meantime

The fix above will be a rather time consuming endeavor, so folks may want a workaround in the meantime. As @JoshCheek notes above, you can disable Tree-sitter for now. To do so, open Atom's settings UI, and uncheck the "Use Tree Sitter Parsers" checkbox.

Screen Shot 2019-04-24 at 4 39 54 PM

This will cause Atom to fall back to using the TextMate grammars, which currently provide broader support for syntax highlighting within Ruby heredocs.

jasonrudolph avatar Apr 24 '19 20:04 jasonrudolph

TL;DR there is nothing wrong with language-ruby, and this is because of language-sql. Am I right :laughing:

BerkhanBerkdemir avatar Apr 24 '19 21:04 BerkhanBerkdemir

Put this into your editor, select the ABCD and do Cmd+d to highlight the second one. Then slowly type ABCD, it's highlighted as a string for AB, but then loses the highlighting on the C. If you skip C and go to D, it remains highlighted correctly. We can see the same thing in @Michael-Ludgate's example, the word HEREDOC has a C in it, and it correctly highlights as a string up until that letter is typed. If you change it to HEREDOK, it doesn't get messed up.

Isn't that already a bug on its own or should really any HEREDOC that contains a capital C cause the content to be interpreted as C code? Sounds a bit like a =~ /C/ is done when a == "C" would have been correct.

CodingMarkus avatar Apr 25 '19 14:04 CodingMarkus

Another possibly resolution would be to support injecting legacy first-mate grammars into regions of files that are otherwise parsed via TreeSitter. This would need to occur in Atom rather than TreeSitter, and I'm not sure how complicated it would be.

nathansobo avatar Apr 25 '19 15:04 nathansobo

Maybe worth noting that "Github Markdown" highlights it correctly (LHS), but Ruby doesn't.

image

The scopes seem really different, almost seems like they don't use the same Ruby parser.

image

JoshCheek avatar Feb 16 '21 14:02 JoshCheek

Is this a dead issue? I remember this working at some point...

lacostenycoder avatar Oct 20 '21 13:10 lacostenycoder