highlight.js icon indicating copy to clipboard operation
highlight.js copied to clipboard

Improve highlighting consistency by special-casing common types

Open krisvanrens opened this issue 4 years ago • 9 comments

@klmr @joshgoebel

Follows up https://github.com/highlightjs/highlight.js/pull/3178

Description

In the current version of the C++ highlighting set is a regular expression rule called CPP_PRIMITIVE_TYPES that highlights POSIX-defined special types. However, this does not hold for C++ and leads to false positives in highlighting. Especially the _t convention for alias template helpers for type traits come to mind (e.g. std::decay_t<...> et al).

There is, however, a set of common language-defined types that for a large part have the _t suffix. These types should be highlighted and could be considered primitive types.

In this PR I propose to change the regular expression to match suffix _t for any type to be changed to a list of selected types for highlighting. It is the intent for this list to remain as minimal as possible, perhaps this should be clarified in a comment.

Changes

  • Change CPP_PRIMITIVE_TYPES to a list of selected types for highlighting.

Checklist

  • [x] Checked build and test results + updated test results
  • [ ] Updated the changelog at CHANGES.md

krisvanrens avatar Jun 08 '21 21:06 krisvanrens

What does the _t supposed to symbolize anyways? I always thought it was "type", hence this made a lot of sense...

joshgoebel avatar Jun 08 '21 21:06 joshgoebel

What does the _t supposed to symbolize anyways? I always thought it was "type", hence this made a lot of sense...

Yeah it does represent 'type', but it is a POSIX convention that does not hold for C++. It does sort of hold for C (as per cppreference. Also C is often used in conjunction with POSIX system code where this is an explicit rule (see link in PR description).

I guess it's a form of Hungarian notation that does make sense, yes, but also leads to many false positives for C++.

Technically speaking the _t false positives like std::decay_t are types too (they're type aliases), but highlighting them as primitive types is not correct IMHO. There's quite an extensive set of trait helpers (i.e. many potential false positives), and accidentally making a special case out of every alias for these by using CPP_PRIMITIVE_TYPES is inconsistent with the wealth of other types in the language/library that are not treated as special.

krisvanrens avatar Jun 08 '21 22:06 krisvanrens

Yeah, this looks generally good to me.

For consistency I would consider adding clock_t, time_t, sig_atomic_t, and, potentially, jmp_buf, fpos_t, va_list, FILE, streamoff and streamsize. They’re all in the same category: typedefs to primitive types (or, in the case of va_list, fpos_t, and FILE, to an opaque struct and in the case of jmpbuf to an array).

Also, what about the atomic typedefs from C?

klmr avatar Jun 10 '21 10:06 klmr

For consistency I would consider adding clock_t, time_t, sig_atomic_t, and, potentially, jmp_buf, fpos_t, va_list, FILE, streamoff and streamsize. They’re all in the same category: typedefs to primitive types (or, in the case of va_list, fpos_t, and FILE, to an opaque struct and in the case of jmpbuf to an array).

Yes I guess that makes sense. Then in the meantime the C++ implementation has better C-highlighting than the C implementation 😏

Also, what about the atomic typedefs from C?

Yeah this makes sense as well. I will change the implementation to a regular expression like you suggested here before.

krisvanrens avatar Jun 10 '21 15:06 krisvanrens

Hey!

I'm sorry for leaving this PR open for so long. I haven't had much time to work on it.

Now I created an inventory of all the additional types that are considered special. I also added some extra test content to cover them. In the current implementation the additional types are in an array ADDITIONAL_TYPES that is merged with array RESERVED_TYPES to fill the type field of CPP_KEYWORDS. However, I'd now like to create a regular expression to replace ADDITIONAL_TYPES.

Can anyone help me to have this regular expression ADDITIONAL_TYPES_RE working alongside the array RESERVED_TYPES? I tried a couple of things but my knowledge of HL.js data structures is lacking..

Any help is appreciated! Thanks in advance 😄

krisvanrens avatar Jul 09 '21 12:07 krisvanrens

However, I'd now like to create a regular expression to replace ADDITIONAL_TYPES.

This isn't necessary, as compression will clean up much of the "wasted" space... and the readability aids future maintenance.

Technically speaking the _t false positives like std::decay_t are types too (they're type aliases), but highlighting them as primitive types is not correct IMHO.

I'm still hoping to get another pair of eyeballs on this as I still worry about killing the generic _t rules, even if it's not 100% "correct". The question is whether the highlighting overall does more harm than good (for the complexity), not whether it's 100% correct. And we've never defined harmful as "anything less than 100% correct". :-)

Then in the meantime the C++ implementation has better C-highlighting than the C implementation

When we're thru here if you're willing to circle back and fix C that's be great... but now that they have diverged such is the possibility, that one will become better than the other...

joshgoebel avatar Jul 09 '21 15:07 joshgoebel

However, I'd now like to create a regular expression to replace ADDITIONAL_TYPES.

This isn't necessary, as compression will clean up much of the "wasted" space... and the readability aids future maintenance.

True, true, but I'm not sure it has any impact on the speed for instance. With the types added that Konrad suggested, it's quite a list now (92 items with a fair bit of partly overlapped words).

Technically speaking the _t false positives like std::decay_t are types too (they're type aliases), but highlighting them as primitive types is not correct IMHO.

I'm still hoping to get another pair of eyeballs on this as I still worry about killing the generic _t rules, even if it's not 100% "correct". The question is whether the highlighting overall does more harm than good (for the complexity), not whether it's 100% correct. And we've never defined harmful as "anything less than 100% correct". :-)

Well yeah another vote on how to handle this is a good idea. I feel, as a drive-by contributor, I should only provide advice about this matter. Personally I lean towards the treating-additional-types-special approach, but it's a trade-off between simplicity + some false positives vs. complexity + (IMHO) more correct (not 100%) highlighting.

Then in the meantime the C++ implementation has better C-highlighting than the C implementation

When we're thru here if you're willing to circle back and fix C that's be great... but now that they have diverged such is the possibility, that one will become better than the other...

Yes, I would like to do this. But due to summer holidays etc. etc. I won't be able to pick this up until late August probably.

krisvanrens avatar Jul 09 '21 19:07 krisvanrens

True, true, but I'm not sure it has any impact on the speed for instance. With the types added that Konrad suggested, it's quite a list now (92 items with a fair bit of partly overlapped words).

92 is not a large list - we have languages with thousands of keywords. There is no run-time performance impact when using the keywords system - the lookups are done via pre-compiled object literal. There might be some impact if a MODE was used (with an either regex), but even then 92 is not a huge list and we typically still prefer readability over maximum theoretical performance.

IF speed became a higher priority I would prefer we somehow do this type of conversion at build or compile time - rather than hard code it into the source, making the source much harder to work with.

joshgoebel avatar Jul 09 '21 19:07 joshgoebel

OK, I pushed the update of the additional type list for now and the extension of the tests to verify all of these.

I hope someone else with a C++ background can chime in to discuss this PR!

krisvanrens avatar Jul 21 '21 07:07 krisvanrens