helix icon indicating copy to clipboard operation
helix copied to clipboard

fix: bitwise representation for RGB highlight

Open nik-rev opened this issue 8 months ago • 6 comments

Instead of using to_ne_bytes, we use to_be_bytes. ne means Native Endian, so it may different on different platforms -- causing the test to fail and the highlight color to be incorrect on some platforms.

We want to use Big Endian representation because we store the r, g, b in the last 3 bytes. this nicely maps onto big endian representation

thanks @Rudxain https://github.com/helix-editor/helix/pull/12308/files#r2010969852 for finding this. @RoloEdits https://github.com/helix-editor/helix/pull/12308/files#r2010998709 has a nice image to showcase this "endianness"

nik-rev avatar Mar 24 '25 22:03 nik-rev

\cc @pascalkuthe In response to https://github.com/helix-editor/helix/pull/12308#issuecomment-2719374178

Personally, I'd rather see Highlight defined as

enum Highlight {
  Index(u32),
  Color(u8, u8, u8),
}

That way it'll still fit into 8 bytes. It's a big regression on 32 bit architectures where this type will now be 8 bytes rather than 4, but packing 3 bytes of color into a 4 byte integer (which is what the code is doing right now) is insufficient since it's possible to have more than 255 items in the highlights array.

We'd need to double check but I think compiled code doing the enum matching should be about as equally efficient as using a u64 then manually doing the bit shifts?

This was actually tried before (turning Highlight into an enum) but we ended up not wanting to modify the type

See this code review comment: https://github.com/helix-editor/helix/pull/12308#discussion_r1979861585

\cc @the-mikedavis

nik-rev avatar Mar 25 '25 10:03 nik-rev

\cc @pascalkuthe In response to https://github.com/helix-editor/helix/pull/12308#issuecomment-2719374178

Personally, I'd rather see Highlight defined as

enum Highlight {
  Index(u32),
  Color(u8, u8, u8),
}

That way it'll still fit into 8 bytes. It's a big regression on 32 bit architectures where this type will now be 8 bytes rather than 4, but packing 3 bytes of color into a 4 byte integer (which is what the code is doing right now) is insufficient since it's possible to have more than 255 items in the highlights array.

We'd need to double check but I think compiled code doing the enum matching should be about as equally efficient as using a u64 then manually doing the bit shifts?

Also, on 32 bit architectures which have 4 bytes for a number: FF FF FF FF

We store the RGB color in the last 3 bytes (and only when the highlight is more than FF 00 00 00). like this: FF RR GG BB

Meaning the user can have up to FF 00 00 00 highlights in their theme. (4,278,190,080)

nik-rev avatar Mar 25 '25 10:03 nik-rev

I think tree-house has just a single u32 for Highlight (and would be replacing this Highlight) that can store as an index or as a color? I poked around a bit but only saw one way to get a value out, so would assume that while idx() exits, it still needs a rgb()? The fact that its a non-zero, non-max u32 makes things interesting, as I think its for reserving 2 bits for checking state? If so, then that would cover both cases (idx and color) with a single "u32"?

RoloEdits avatar Mar 25 '25 11:03 RoloEdits

I think tree-house has just a single u32 for Highlight (and would be replacing this Highlight) that can store as an index or as a color? I poked around a bit but only saw one way to get a value out, so would assume that while idx() exits, it still needs a rgb()? The fact that its a non-zero, non-max u32 makes things interesting, as I think its for reserving 2 bits for checking state? If so, then that would cover both cases (idx and color) with a single "u32"?

If tree-house uses Highlight(u30) (with 2 bits for internal state) then that does make things considerably simpler since we don't have to worry about usize being different in different architectures

And u30 can store 1 billion colors before we get to a point where theme scopes incorrectly get interpreted as RGB colors

nik-rev avatar Mar 25 '25 13:03 nik-rev

See https://github.com/helix-editor/tree-house/blob/1fa65eca36fdbb2837e0655bfda53ed627fc25c0/highlighter/src/highlighter.rs#L117-L140 - it's a non-max internally represented as a non-zero, so only the zero pattern is reserved. In order to use the trick of packing an RGB color into the highlight we'd need to subtract and add 1 before storing/accessing the underlying u32 to avoid u32::MAX, see https://github.com/the-mikedavis/helix/blob/7bc65338db11d768ac1b2fe65eef60ae02b707ad/helix-view/src/theme.rs#L389-L413.

An enum like that would take 8 bytes rather than 4 but we'd still have the null-pointer optimization: size_of::<Highlight>() == size_of::<Option<Highlight>>(). Since the color variant is only used for text annotations we could define this enum there and use the regular Highlight (4 bytes) for all other highlighting.

the-mikedavis avatar Mar 25 '25 13:03 the-mikedavis

This PR can be merged now, I updated it with the latest master. All it does now is switch from requiring the platform to be little endian (otherwise failing) to always using big endian, regardless of what the target platform supports

nik-rev avatar Jun 07 '25 10:06 nik-rev