pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

MEGA-ISSUE: Syntax highlighting in `language-c` (C/C++)

Open savetheclocktower opened this issue 1 year ago • 48 comments

IMPORTANT: Some issues have already been fixed!

If you’re still on the regular 1.113 release, you might be suffering from a problem that has already been fixed. Many fixes landed on master in #859. You are encouraged to download the latest rolling release — both (a) to see whether what you’re reporting has been fixed, and (b) so that you can enjoy the fixes that have been reported since the 1.113 release.


This will serve as a catch-all issue for any problems you may encounter with syntax highlighting in C or C++ files (language-c). If you have problems with the new syntax highlighting (or folds, or indents) that are specific to c, cpp, or related files (h, cc, etc.), keep reading.

Something isn't highlighting correctly!

If you’ve found any highlighting regressions since 1.113:

First, please scroll down and see if someone else has reported the issue. If so, then you need only sit back and wait for a fix — most issues will be fixed in version 1.114!

If not, please comment with

  • A small amount of sample code to reproduce the issue
  • Screenshots (before and after would be ideal, but just the “after” is OK for obvious things

I want to go back to the old highlighting!

You can easily opt out of the new Tree-sitter highlighting for any language, but first please:

  • subscribe to this issue so you'll know when the problem is fixed
  • remove the config change when the next release comes out so that you can enjoy the new grammar once again

To revert to the old Tree-sitter grammar for this language only, add the following to your config.cson:

".c.source, .cpp.source":
  core:
    useLegacyTreeSitter: true

To revert to the TextMate-style grammar for this language only, add the following to your config.cson:

".c.source, .cpp.source":
  core:
    useTreeSitterParsers: false

savetheclocktower avatar Jan 19 '24 00:01 savetheclocktower

~~@Tamaranch has identified some changes in syntax highlighting colors in the new grammar in #872. I'll be assessing these to see which ones were inadvertent and will report back to this thread.~~ Edit: See below.

In general, if you're using a built-in syntax theme, we intend for the new grammars to make your code look similar to how it looked before.

If you're using a community syntax theme, we can't make any such guarantees, but we'll tell you exactly how to apply the syntax highlighting you expect via overrides in your user stylesheet.

savetheclocktower avatar Jan 19 '24 00:01 savetheclocktower

Using this file as a reference and testing with the built-in One Dark theme, I can attest that most of the syntax highlighting differences I saw were not intentional choices, and the ones that were intentional were not strongly held.

  • I had scoped struct members as entity.other.attribute-name.c on declaration to try to harmonize them with JavaScript object keys (which bear a similar scope when they're declared) but variable.other.member.c is just as arguable, and involves no disruption.
  • I noticed that preprocessor keywords were missing a punctuation declaration for their leading #, so that got added. In many themes that won't have an effect, but in One Dark it'll make them look less like the old Tree-sitter grammar and more like the TextMate grammar.
  • I noticed the presence of meta.block.c scopes in the TextMate grammar. Most meta scopes from TextMate grammars are not very useful, but I've found meta.block scopes useful in the JavaScript grammar, so I've added some to the C/C++ grammars as well.

The changes have been made to both C and C++ and are described in this commit that is now part of #859.

savetheclocktower avatar Jan 19 '24 02:01 savetheclocktower

FWIW things appear more broken with the built-in Base16 Tomorrow Dark theme (for example user-defined types are no longer highlighted).

Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)? EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries.

Tamaranch avatar Jan 19 '24 14:01 Tamaranch

Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)?

You can set ATOM_DEV_RESOURCE_PATH as described here (just use pulsar instead of atom in your command) and launch in dev mode. But I don't expect everyone to do that, so I hope we'll be able to land #859 soon just to get a rolling release out the door.

EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries.

Aha! I always forget about those. Thank god some people understand CI better than I do.

savetheclocktower avatar Jan 19 '24 17:01 savetheclocktower

Is #859 supposed to fix the other themes too, or are you only taking care of One Dark at first?

Also, with One Dark, is highlighting variables when declaring or assigning them expected?

Tamaranch avatar Jan 19 '24 18:01 Tamaranch

It's supposed to fix the other built-in themes. I admit I only fixed One Dark and One Light earlier, but I'll make a note to revisit the others.

Also, with One Dark, is highlighting variables when declaring or assigning them expected?

Yes. Let me know if that isn't the case.

Long version: In C-style languages without a sigil (like $ or @) to indicate variables, there is no obvious standard for what a variable is. If you go maximal and say that any arbitrary identifier is a variable, then everything looks the same color. If it's all going to be the same color, then syntax highlighting doesn't add any value over not adding scopes to identifiers at all.

My way of resolving this was to say that variables are scoped as variable when they're declared, assigned, or reassigned. That is how it should be with all C-style languages. There might be other contexts where I allowed a pre-existing variable scope in for the sake of continuity, but I'm trying my best to avoid these.

So — not just for @Tamaranch, but also for other readers of this issue:

  • If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.
  • If you want something to look like a variable when it isn't being declared or assigned, I'm happy to hear you out — especially if you can define an enforceable rule for what gets to be a variable that doesn't result in all identifiers being the same color.

savetheclocktower avatar Jan 19 '24 18:01 savetheclocktower

If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.

Testing the latest #859 for the various built-in themes, I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?

Tamaranch avatar Jan 20 '24 15:01 Tamaranch

I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?

I'm not sure why they made that decision. variable is colored red; variable.parameter is explicitly colored like ordinary text.

All the syntax themes are due for revisions, but that's a large project. I'm not sure that One Dark should be the default, but if I were to change it now, experience tells me I'd need to offer a concession for folks who liked the way it already was. My instinct is that we'd evolve those themes but offer the originals as optional packages.

savetheclocktower avatar Jan 20 '24 18:01 savetheclocktower

This could be a bug: structure members are not highlighted in this definition (whatever the theme):

static const struct wl_registry_listener registry_listener =
{
  .global = registry_global,
  .global_remove = registry_global_remove,
};

Tamaranch avatar Jan 20 '24 19:01 Tamaranch

This could be a bug: structure members are not highlighted in this definition (whatever the theme):

Fixed in this commit. Thanks again for such great feedback!

savetheclocktower avatar Jan 20 '24 20:01 savetheclocktower

After some agonizing and vacillating, I just made a change that affects C/C++: rather than draw a confusing distinction between built-in value types like int and bool (which TextMate would have us scope as storage.type) and library-provided types (which TextMate wants to classify as support.type), I've decided to adopt the compromise that legacy Tree-sitter seemed to make: all value types get scoped as support.storage.type instead. This allows them to live in the support namespace while also triggering storage styles in most syntax themes. User-defined types will also be scoped under support, albeit as support.other.storage.type.

storage.type is now reserved for language constructs — function, class, struct, plus variable declaration tokens like let and const in JavaScript. The idea that storage.type originally was used to mark both (e.g.) typedef and int makes it obvious that a C developer came up with the system.

Since this new convention is exactly how legacy Tree-sitter did things in C/C++, this should only reduce the highlighting differences between legacy modern Tree-sitter grammars. But let me know if you notice any unusual side-effects.

savetheclocktower avatar Jan 21 '24 21:01 savetheclocktower

Not sure if it's only since the latest changes, but pointer of pointer declarations (such as char **strv; and beyond) are not highlighted (both variables and structure members, and it seems theme-independent).

Tamaranch avatar Jan 22 '24 20:01 Tamaranch

And you're on a recent build? That's odd — I added support for that a few days ago.

For reference, if you're unsure if something is theme-dependent or not, run the Editor: Log Cursor Scope command with your cursor inside the token in question. In this example…

int main (char **strv;) {
  
}

strv should be variable.parameter.c. If source.js is the only scope listed, that's a strong sign that something got missed.

savetheclocktower avatar Jan 22 '24 20:01 savetheclocktower

Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).

Tamaranch avatar Jan 22 '24 21:01 Tamaranch

If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.

Not sure it should count, but I note in passing that array item assignments are not highlighted:

strv[n] = "str";

Tamaranch avatar Jan 22 '24 21:01 Tamaranch

Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).

OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?

savetheclocktower avatar Jan 22 '24 21:01 savetheclocktower

If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.

Not sure it should count, but I note in passing that array item assignments are not highlighted:

strv[n] = "str";

Yeah, this one isn't a regression, but you can certainly make the argument. Not a major priority right now, but it's worth remembering.

savetheclocktower avatar Jan 22 '24 21:01 savetheclocktower

OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?

struct {
  char **strv;
};

void f (void) {
  char **strv = NULL;
  char **strv1;
}

Tamaranch avatar Jan 22 '24 21:01 Tamaranch

Pointer-of-pointer variable declarations are addressed in this commit.

savetheclocktower avatar Jan 23 '24 01:01 savetheclocktower

It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros. This includes functions and variables as well as actual macros. It doesn't seem to affect the case when a function is being called, only when it is being declared or defined. See the screenshot below with a minimal example of the behavior for different combinations of types.

Previously, I believe it only ever highlighted items as macros if the entire word was upper case. Interestingly, both systems don't identify an all-caps macro definition as a macro if it doesn't have a specific value after the name. Also, ideally it would be nice if macros were determined by checking if there was a #define before the name, which would clearly distinguish them from variable definitions and function definitions, and then if it used that to determine which items are macros further down in the file too (not just checking for all caps), but I'm sure that's not in the scope of this issue, since even the old system didn't do that.

New behavior (tested on 1.113.0 and rolling release 1.113.2024020202) : image

Legacy tree-sitter: image

SirTomofAto avatar Feb 03 '24 18:02 SirTomofAto

@SirTomofAto Yep, that is some weird behavior there. I'll look into it. Thanks for the report!

savetheclocktower avatar Feb 04 '24 06:02 savetheclocktower

@SirTomofAto Addressed in this commit in #906. The ALL_CAPS heuristic is here to stay when used outside of a preprocessor block, but we can certainly scope the foo in #define foo as a constant. Read the commit message for more details.

Screenshot 2024-02-03 at 11 25 51 PM

Thanks again for reporting this!

savetheclocktower avatar Feb 04 '24 07:02 savetheclocktower

Great! Thanks for the quick response. And yeah, I completely get the need for ALL_CAPS to be the heuristic here.

SirTomofAto avatar Feb 04 '24 14:02 SirTomofAto

Addressed in this commit in https://github.com/pulsar-edit/pulsar/pull/906.

It seems that this commit also fixes a bug I was about to report, concerning the highlighting of function parameters whose name ends in _[0-9]*, e.g.

void f (int p_, int p_1, int p_12);

It's reproducible with rolling release Pulsar-1.113.2024020202.AppImage, but not with Pulsar-1.113.2024020407.AppImage from #906. As the link isn't obvious to me from the commit message, I'd prefer to mention it, in case it's an unwanted side-effect (even though it fixes this particular bug).

Tamaranch avatar Feb 04 '24 17:02 Tamaranch

Ha! It is an unwanted side effect. My pattern for detecting ALL_CAPS_CONSTANTS was missing a ^ anchor, so anything that ended with an underscore was being incorrectly flag as a potential constant.

This stopped happening for parameters because the recent changes stopped applying the constant scope to anything that had already been assigned at least one other scope. But the faulty pattern would've manifested in other scenarios, like marking foo_ here as a constant:

int table[foo_];

I'll push a fix for that, too. So glad you mentioned it!

EDIT: Here's the commit.

savetheclocktower avatar Feb 04 '24 21:02 savetheclocktower

It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros.

Looks like it's still an issue in C++ (pulsar 1.114.0), see e.g. https://gitlab.xfce.org/panel-plugins/xfce4-docklike-plugin/-/blob/fcbe8325871adea32cede57212ff50977a3be156/src/Wnck.cpp#L98

Tamaranch avatar Feb 19 '24 16:02 Tamaranch

Oof! I try pretty hard to keep things in sync between the two grammars. I'll take a look.

savetheclocktower avatar Feb 19 '24 17:02 savetheclocktower

Goto labels are no longer highlighted, either when "calling" them via goto or where they are located:

void f (void) {
  /* some code */
  goto cleanup;
  /* some code */
cleanup:
  /* cleanup */
}

Tamaranch avatar Feb 26 '24 17:02 Tamaranch

I've encountered a strange crash that only happens when using the new tree-sitter implementation, and only occurs for me when on Linux, not Windows (I typically use Pulsar within an Ubuntu VM development environment).

int main(void) {

    // Ether of these immediately crash Pulsar 1.114 in Ubuntu, but not in Windows
    function(2 * (var) * (var2));
    function(2 * (MACRO) * (MACRO));

    // None of these crash
    function(2 * (3) * (4));
    function(2 * var * var2);
    function((2) * (var) * (var2));
    function(2 * var * (var2));
    function(2 * (var) * var2);
    function(MACRO * (var) * (var2));
    function(var * (var2) * (var3));
}

// Also crashes immediately in Ubuntu, but not in Windows
function(2 * (var) * (var2)) {}

The crash happens consistently when there is a number and then two non-numbers, with the non-numbers encapsulated in parenthesis.

SirTomofAto avatar Mar 06 '24 19:03 SirTomofAto

I can reproduce this. The symptoms are identical to those I got when I tried to build a .wasm file from the latest tree-sitter-bash recently. I'll have to dig into this.

Thanks for the report!

savetheclocktower avatar Mar 06 '24 20:03 savetheclocktower