highlight.js
highlight.js copied to clipboard
Make handling of valid JS/TS ids/Type aliases consistent with the rest of rules
Resolves #4072
Changes
- added a simple expansion to a regex to check for valid ids containing _ and $. This makes it consistent with the rest of rules where it previously correctly highlighted const functions and class declaration, while splitting ids and TS type aliasses into highlighted and non-highlighted parts
- added markups that check already working cases and the cases that were fixed
Checklist
- [x] Added markup tests, or they don't apply here because...
- [ ] Updated the changelog at
CHANGES.md
This makes it consistent with the rest of rules where it previously correctly highlighted const functions and class declaration
Could you provide a little more color here? You're saying this worked in the recent past...?
Hi, sorry for unclear description, I meant by that "previously" that those const functions and class declarations are already highlighted correctly, and with my patch the rest of the cases are highlighted correctly as well.
I find this very non-idiomatic... I googled and looked around and the popular guides I see refer to UPPER case for constants, camelCase for other things, etc - which I'm quite familiar with... but nothing like this.
I'm not certain we want to support niche cases - so let's start there - whose idea of proper naming conventions is this? Can you show me any popular style guides or large tech companies suggesting this type of naming convention?
As I mentioned in original issue, I am working on a language that is using snake case and should cooperate with TS, and when I release the first version I would like IDs that are valid JS ids to be displayed correctly in tools that use Highlight.js as their highligher - so for instance (SO/ChatGPT) fail to highlight properly whereas Claude AI that uses different highlighter (prisma I guess) has none of these issues
From a few languages I tried this is the only language that has this problem - For instance Kotlin/Java/C++ none of these exhibit same problem and all valid IDs are treated as such and all the names are treated consistently
There is already partial support for this as mentioned in case of const functions and class declarations, my fix makes this consistent with the rest of the output
On top of that I did nothing crazy here, it is just a fix to regex that changes from regex that doesn't match valid ID names into regex that matches, there was no change of logic or anything that would break previous code, there were no test cases broken, I added all the test cases to verify the behavior and consistency of the result
At this point this is pretty much only tool that fails to match valid JavaScript ID, all IDEs, prism.js which is biggest alternative to highlight.js do not break on valid JS names
As I mentioned in original issue, I am working on a language
You may want to write a custom grammar then rather than trying to shoehorn your language into another language's grammar.
For instance Kotlin/Java/C++ none of these exhibit same problem and all valid IDs are treated as such and all the names are treated consistently
Untrue. I took a look just now and all these grammars all have very different handling of identifiers.
There are a few different contexts here we should examine, syntactical and idiomatic. When we can easily understand the syntax, we aim to highlight all valid syntax. For example this might be valid in some languages, but NOT idiomatic.
# made up example code
constant ADS&----!@---#&ADH = 32;
That's pretty awful and hopefully not idiomatic - but if we can match the entire const [ident] = such that it's valid syntactically then we'll do so - because we can be certain ADS&----!@---#&ADH is a constant.
Meanwhile if we just have a random construct like:
[1, 2, 3, ADS&----!@---#&ADH]
That MAY be a valid identifier, but if it's not idiomatic we're not going to highlight it as a constant. Generally our rule is to not highlight identifiers/variable names unless the language has a strong syntax for such things, or strong idioms. So something like $x would count as a variable in PHP (strong syntax), and something like POWER_LEVEL counts as a constant in many different languages - not by syntax, but by idiom.
This is an opinion of the library, not everyone likes it. (that we care strongly about idioms and factor them into our rules)
There is already partial support for this as mentioned in case of const functions and class declarations
Yes, see above this is fine - because it's 100% clear syntactically in those cases.
prism.js which is biggest alternative to highlight.js do not break on valid JS names
Sure it does (using your own examples). A quick check of both JS and TS reveals it's far from consistent.
only tool that fails to match valid JavaScript ID, all IDEs, prism.js
This is not a stated goal of the project - to match anyone else example by example. If you prefer Prism better, use that.
If you're interested in improving obviously syntactical context cases (in any of our grammars) that's a conversation worth having (though I'd file it as a dup of #2500)... but we'd be open to improvements - cases like const or class etc - if we don't have those correct in various languages, happy to fix them. Though even there idioms matter, ie:
const x = 3; // x is `variable.constant`
const FastCar = "Ferrari"; // FastCar is `title.class` (by idiom)
But we're not interested in highlighting random identifiers that are not following the common/popular idiomatic conventions for a given language - that would include the very oddly named JS/TS variables you're wanting us to support.
Closing this as a #wontfix. We'll support only the idiomatic CamelCase naming (for auto-highlighting).