overreads
It's documented behavior, so not strictly a bug. However, you could avoid the overreads by computing the offsets from a table; there are only a few.
Ah, interesting idea. So if I understand correctly, there will still be four reads, but the overreads could be directed somewhere safe, making the interface a bit nicer.
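A minimal sketch of the table idea as restated above. It assumes the caller already knows `len`, the number of readable bytes (1 to 4). The names `load_offsets`, `load_masks` and `load4_tabled` are hypothetical, and packing the bytes big-endian into a `uint32_t` is an illustrative choice, not taken from the original code. Every read past the last valid byte is redirected back to that byte, and the duplicates are then masked off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offset table: row `len` (1..4) redirects every read past
 * the last valid byte back to that last valid byte, so all four loads
 * stay inside the buffer. Row 0 is unused. */
static const unsigned char load_offsets[5][4] = {
    {0, 0, 0, 0},
    {0, 0, 0, 0},
    {0, 1, 1, 1},
    {0, 1, 2, 2},
    {0, 1, 2, 3},
};

/* Keeps only the first `len` bytes of the big-endian combined value. */
static const uint32_t load_masks[5] = {
    0x00000000u, 0xFF000000u, 0xFFFF0000u, 0xFFFFFF00u, 0xFFFFFFFFu,
};

/* Loads up to four bytes from s, where only s[0..len-1] may be read.
 * Still four loads, but each load address now depends on len. */
static uint32_t load4_tabled(const unsigned char *s, int len)
{
    assert(len >= 1 && len <= 4);
    const unsigned char *off = load_offsets[len];
    uint32_t v = (uint32_t)s[off[0]] << 24
               | (uint32_t)s[off[1]] << 16
               | (uint32_t)s[off[2]] << 8
               | (uint32_t)s[off[3]];
    /* Duplicated bytes produced by redirected reads are masked off. */
    return v & load_masks[len];
}
```

For example, `load4_tabled` on the two-byte buffer `{0xC3, 0xA9}` with `len == 2` reads `s[0]`, `s[1]`, `s[1]`, `s[1]` and returns `0xC3A90000`, without touching memory past the buffer.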
I see two downsides:
- Those reads are now each dependent on an offset table lookup, which depends on the length. Currently the reads can be performed even before the length is known.
- In theory, on alignment-relaxed architectures like x86, the compiler could currently coalesce those four reads into a single 4-byte read. Using dynamic offsets would prevent this. However, neither GCC nor Clang currently seems to take this approach anyway.
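For comparison, a sketch of the fixed-offset pattern the two points above describe. It is an illustration under assumptions, not the project's actual code: `load4_fixed` is a hypothetical name and the big-endian packing is illustrative. The load addresses never depend on `len`, and `len` is only used afterwards to mask; whether a given compiler merges the four byte loads into one wider load depends on the compiler and the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-offset shape (sketch): four loads at offsets 0..3. The caller must
 * guarantee four readable bytes, which is where the overread comes from.
 * No load address depends on len, so the loads can issue before len is
 * known, and adjacent fixed offsets are what a compiler could in principle
 * merge into a single 4-byte load. */
static uint32_t load4_fixed(const unsigned char *s, int len)
{
    assert(len >= 1 && len <= 4);
    uint32_t v = (uint32_t)s[0] << 24
               | (uint32_t)s[1] << 16
               | (uint32_t)s[2] << 8
               | (uint32_t)s[3];
    /* len is needed only here, after the loads, to discard extra bytes.
     * Shift count is 24, 16, 8 or 0 for len 1..4. */
    uint32_t mask = 0xFFFFFFFFu << (8 * (4 - len));
    return v & mask;
}
```

Unlike a table-driven version, this one requires the buffer to be padded: calling it on a three-byte sequence still reads a fourth byte, even though the mask discards it.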
There's no such thing as a memory read smaller than a cache line anyway, so consuming the first byte to calculate the length and then using a table (or some masking technique) would be fine, though it's hard to say how it would perform in practice.
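A rough sketch of that suggestion, assuming the input is UTF-8 (the thread doesn't say so explicitly). The length comes from a 32-entry table indexed by the lead byte's top five bits. The remaining offsets are clamped by a comparison instead of a second table lookup, and the keep-mask is computed by a shift. `utf8_len`, `clamp_off` and `read_seq` are hypothetical names. Continuation bytes are not validated, so this covers only the load step of a decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequence length indexed by the lead byte's top five bits (b0 >> 3).
 * 0 marks bytes that cannot start a sequence. */
static const unsigned char utf8_len[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x00-0x7F */
    0, 0, 0, 0, 0, 0, 0, 0,                         /* 0x80-0xBF */
    2, 2, 2, 2,                                     /* 0xC0-0xDF */
    3, 3,                                           /* 0xE0-0xEF */
    4,                                              /* 0xF0-0xF7 */
    0,                                              /* 0xF8-0xFF */
};

/* Clamps a read offset to the last valid byte (hypothetical helper). */
static size_t clamp_off(size_t i, size_t last)
{
    return i < last ? i : last;
}

/* Reads the first byte, derives the sequence length n from it, then loads
 * only within s[0..n-1]. Returns n, or 0 for an invalid lead byte or a
 * sequence that would run past `avail`. */
static int read_seq(const unsigned char *s, size_t avail, uint32_t *out)
{
    assert(avail >= 1);
    size_t n = utf8_len[s[0] >> 3];
    if (n == 0 || n > avail)
        return 0;
    size_t last = n - 1;
    uint32_t v = (uint32_t)s[0] << 24
               | (uint32_t)s[clamp_off(1, last)] << 16
               | (uint32_t)s[clamp_off(2, last)] << 8
               | (uint32_t)s[clamp_off(3, last)];
    /* Mask by shifting instead of a table; n is 1..4 here, so the shift
     * count is 24, 16, 8 or 0. */
    *out = v & (0xFFFFFFFFu << (8 * (4 - n)));
    return (int)n;
}
```

With the three-byte sequence `{0xE2, 0x82, 0xAC}` (U+20AC), `read_seq` returns 3 and stores `0xE282AC00`. A truncated `{0xE2, 0x82}` returns 0 instead of reading past the end. Whether the clamped offsets compile to conditional moves or branches is up to the compiler.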