pulsar
pulsar copied to clipboard
MEGA-ISSUE: Syntax highlighting in `language-c` (C/C++)
IMPORTANT: Some issues have already been fixed!
If you’re still on the regular 1.113 release, you might be suffering from a problem that has already been fixed. Many fixes landed on master in #859. You are encouraged to download the latest rolling release — both (a) to see whether what you’re reporting has been fixed, and (b) so that you can enjoy the fixes that have been reported since the 1.113 release.
This will serve as a catch-all issue for any problems you may encounter with syntax highlighting in C or C++ files (language-c). If you have problems with the new syntax highlighting (or folds, or indents) that are specific to c, cpp, or related files (h, cc, etc.), keep reading.
Something isn't highlighting correctly!
If you’ve found any highlighting regressions since 1.113:
First, please scroll down and see if someone else has reported the issue. If so, then you need only sit back and wait for a fix — most issues will be fixed in version 1.114!
If not, please comment with
- A small amount of sample code to reproduce the issue
- Screenshots (before and after would be ideal, but just the “after” is OK for obvious things
I want to go back to the old highlighting!
You can easily opt out of the new Tree-sitter highlighting for any language, but first please:
- subscribe to this issue so you'll know when the problem is fixed
- remove the config change when the next release comes out so that you can enjoy the new grammar once again
To revert to the old Tree-sitter grammar for this language only, add the following to your config.cson:
".c.source, .cpp.source":
core:
useLegacyTreeSitter: true
To revert to the TextMate-style grammar for this language only, add the following to your config.cson:
".c.source, .cpp.source":
core:
useTreeSitterParsers: false
~~@Tamaranch has identified some changes in syntax highlighting colors in the new grammar in #872. I'll be assessing these to see which ones were inadvertent and will report back to this thread.~~ Edit: See below.
In general, if you're using a built-in syntax theme, we intend for the new grammars to make your code look similar to how it looked before.
If you're using a community syntax theme, we can't make any such guarantees, but we'll tell you exactly how to apply the syntax highlighting you expect via overrides in your user stylesheet.
Using this file as a reference and testing with the built-in One Dark theme, I can attest that most of the syntax highlighting differences I saw were not intentional choices, and the ones that were intentional were not strongly held.
- I had scoped struct members as
entity.other.attribute-name.con declaration to try to harmonize them with JavaScript object keys (which bear a similar scope when they're declared) butvariable.other.member.cis just as arguable, and involves no disruption. - I noticed that preprocessor keywords were missing a
punctuationdeclaration for their leading#, so that got added. In many themes that won't have an effect, but in One Dark it'll make them look less like the old Tree-sitter grammar and more like the TextMate grammar. - I noticed the presence of
meta.block.cscopes in the TextMate grammar. Mostmetascopes from TextMate grammars are not very useful, but I've foundmeta.blockscopes useful in the JavaScript grammar, so I've added some to the C/C++ grammars as well.
The changes have been made to both C and C++ and are described in this commit that is now part of #859.
FWIW things appear more broken with the built-in Base16 Tomorrow Dark theme (for example user-defined types are no longer highlighted).
Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)?
EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries.
Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)?
You can set ATOM_DEV_RESOURCE_PATH as described here (just use pulsar instead of atom in your command) and launch in dev mode. But I don't expect everyone to do that, so I hope we'll be able to land #859 soon just to get a rolling release out the door.
EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries.
Aha! I always forget about those. Thank god some people understand CI better than I do.
Is #859 supposed to fix the other themes too, or are you only taking care of One Dark at first?
Also, with One Dark, is highlighting variables when declaring or assigning them expected?
It's supposed to fix the other built-in themes. I admit I only fixed One Dark and One Light earlier, but I'll make a note to revisit the others.
Also, with One Dark, is highlighting variables when declaring or assigning them expected?
Yes. Let me know if that isn't the case.
Long version: In C-style languages without a sigil (like $ or @) to indicate variables, there is no obvious standard for what a variable is. If you go maximal and say that any arbitrary identifier is a variable, then everything looks the same color. If it's all going to be the same color, then syntax highlighting doesn't add any value over not adding scopes to identifiers at all.
My way of resolving this was to say that variables are scoped as variable when they're declared, assigned, or reassigned. That is how it should be with all C-style languages. There might be other contexts where I allowed a pre-existing variable scope in for the sake of continuity, but I'm trying my best to avoid these.
So — not just for @Tamaranch, but also for other readers of this issue:
- If you find a place where a variable is not scoped as
variablewhen being declared or assigned, then that is a bug. - If you want something to look like a variable when it isn't being declared or assigned, I'm happy to hear you out — especially if you can define an enforceable rule for what gets to be a variable that doesn't result in all identifiers being the same color.
If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.
Testing the latest #859 for the various built-in themes, I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?
I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?
I'm not sure why they made that decision. variable is colored red; variable.parameter is explicitly colored like ordinary text.
All the syntax themes are due for revisions, but that's a large project. I'm not sure that One Dark should be the default, but if I were to change it now, experience tells me I'd need to offer a concession for folks who liked the way it already was. My instinct is that we'd evolve those themes but offer the originals as optional packages.
This could be a bug: structure members are not highlighted in this definition (whatever the theme):
static const struct wl_registry_listener registry_listener =
{
.global = registry_global,
.global_remove = registry_global_remove,
};
This could be a bug: structure members are not highlighted in this definition (whatever the theme):
Fixed in this commit. Thanks again for such great feedback!
After some agonizing and vacillating, I just made a change that affects C/C++: rather than draw a confusing distinction between built-in value types like int and bool (which TextMate would have us scope as storage.type) and library-provided types (which TextMate wants to classify as support.type), I've decided to adopt the compromise that legacy Tree-sitter seemed to make: all value types get scoped as support.storage.type instead. This allows them to live in the support namespace while also triggering storage styles in most syntax themes. User-defined types will also be scoped under support, albeit as support.other.storage.type.
storage.type is now reserved for language constructs — function, class, struct, plus variable declaration tokens like let and const in JavaScript. The idea that storage.type originally was used to mark both (e.g.) typedef and int makes it obvious that a C developer came up with the system.
Since this new convention is exactly how legacy Tree-sitter did things in C/C++, this should only reduce the highlighting differences between legacy modern Tree-sitter grammars. But let me know if you notice any unusual side-effects.
Not sure if it's only since the latest changes, but pointer of pointer declarations (such as char **strv; and beyond) are not highlighted (both variables and structure members, and it seems theme-independent).
And you're on a recent build? That's odd — I added support for that a few days ago.
For reference, if you're unsure if something is theme-dependent or not, run the Editor: Log Cursor Scope command with your cursor inside the token in question. In this example…
int main (char **strv;) {
}
…strv should be variable.parameter.c. If source.js is the only scope listed, that's a strong sign that something got missed.
Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).
If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.
Not sure it should count, but I note in passing that array item assignments are not highlighted:
strv[n] = "str";
Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).
OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?
If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.
Not sure it should count, but I note in passing that array item assignments are not highlighted:
strv[n] = "str";
Yeah, this one isn't a regression, but you can certainly make the argument. Not a major priority right now, but it's worth remembering.
OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?
struct {
char **strv;
};
void f (void) {
char **strv = NULL;
char **strv1;
}
Pointer-of-pointer variable declarations are addressed in this commit.
It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros. This includes functions and variables as well as actual macros. It doesn't seem to affect the case when a function is being called, only when it is being declared or defined. See the screenshot below with a minimal example of the behavior for different combinations of types.
Previously, I believe it only ever highlighted items as macros if the entire word was upper case. Interestingly, both systems don't identify an all-caps macro definition as a macro if it doesn't have a specific value after the name. Also, ideally it would be nice if macros were determined by checking if there was a #define before the name, which would clearly distinguish them from variable definitions and function definitions, and then if it used that to determine which items are macros further down in the file too (not just checking for all caps), but I'm sure that's not in the scope of this issue, since even the old system didn't do that.
New behavior (tested on 1.113.0 and rolling release 1.113.2024020202) :
Legacy tree-sitter:
@SirTomofAto Yep, that is some weird behavior there. I'll look into it. Thanks for the report!
@SirTomofAto Addressed in this commit in #906. The ALL_CAPS heuristic is here to stay when used outside of a preprocessor block, but we can certainly scope the foo in #define foo as a constant. Read the commit message for more details.
Thanks again for reporting this!
Great! Thanks for the quick response. And yeah, I completely get the need for ALL_CAPS to be the heuristic here.
Addressed in this commit in https://github.com/pulsar-edit/pulsar/pull/906.
It seems that this commit also fixes a bug I was about to report, concerning the highlighting of function parameters whose name ends in _[0-9]*, e.g.
void f (int p_, int p_1, int p_12);
It's reproducible with rolling release Pulsar-1.113.2024020202.AppImage, but not with Pulsar-1.113.2024020407.AppImage from #906. As the link isn't obvious to me from the commit message, I'd prefer to mention it, in case it's an unwanted side-effect (even though it fixes this particular bug).
Ha! It is an unwanted side effect. My pattern for detecting ALL_CAPS_CONSTANTS was missing a ^ anchor, so anything that ended with an underscore was being incorrectly flag as a potential constant.
This stopped happening for parameters because the recent changes stopped applying the constant scope to anything that had already been assigned at least one other scope. But the faulty pattern would've manifested in other scenarios, like marking foo_ here as a constant:
int table[foo_];
I'll push a fix for that, too. So glad you mentioned it!
EDIT: Here's the commit.
It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros.
Looks like it's still an issue in C++ (pulsar 1.114.0), see e.g. https://gitlab.xfce.org/panel-plugins/xfce4-docklike-plugin/-/blob/fcbe8325871adea32cede57212ff50977a3be156/src/Wnck.cpp#L98
Oof! I try pretty hard to keep things in sync between the two grammars. I'll take a look.
Goto labels are no longer highlighted, either when "calling" them via goto or where they are located:
void f (void) {
/* some code */
goto cleanup;
/* some code */
cleanup:
/* cleanup */
}
I've encountered a strange crash that only happens when using the new tree-sitter implementation, and only occurs for me when on Linux, not Windows (I typically use Pulsar within an Ubuntu VM development environment).
int main(void) {
// Ether of these immediately crash Pulsar 1.114 in Ubuntu, but not in Windows
function(2 * (var) * (var2));
function(2 * (MACRO) * (MACRO));
// None of these crash
function(2 * (3) * (4));
function(2 * var * var2);
function((2) * (var) * (var2));
function(2 * var * (var2));
function(2 * (var) * var2);
function(MACRO * (var) * (var2));
function(var * (var2) * (var3));
}
// Also crashes immediately in Ubuntu, but not in Windows
function(2 * (var) * (var2)) {}
The crash happens consistently when there is a number and then two non-numbers, with the non-numbers encapsulated in parenthesis.
I can reproduce this. The symptoms are identical to those I got when I tried to build a .wasm file from the latest tree-sitter-bash recently. I'll have to dig into this.
Thanks for the report!