binary
perf: implement fast Get for integral types
This patch implements fast Get logic for integral types based on:
- Use a single load operation when the requested endianness matches the host's; otherwise, do a host-order load followed by a byteSwap. This avoids the overhead of the multiple single-byte loads used by the previous implementation.
- Use the unaligned Addr# load/store primops added in GHC 9.10 when available; otherwise, fall back to a plain peek. This ensures the GHC backends see the correct AlignmentSpec at the Cmm level and can emit unaligned load instructions.
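A minimal sketch of the endianness logic, for readers of this commit. It is not the patch's code: it uses only the plain-peek fallback path (no Addr# primops), the helper names `hostLoadWord32`, `getWord32With` and `getWord32beSlow` are hypothetical, and it assumes the input holds at least 4 bytes (the real Get checks remaining input first).

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32, byteSwap32)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical helper: one host-order 32-bit load from the first 4 bytes.
-- Uses a plain peek (the fallback described above); assumes length >= 4.
hostLoadWord32 :: B.ByteString -> Word32
hostLoadWord32 bs = unsafeDupablePerformIO $
  BU.unsafeUseAsCString bs $ \p -> peek (castPtr p)

-- Hypothetical helper: decode with the requested byte order.
-- Same order as the host: the single load is the answer.
-- Opposite order: single load, then one byteSwap32.
getWord32With :: ByteOrder -> B.ByteString -> Word32
getWord32With wanted bs
  | wanted == targetByteOrder = w
  | otherwise                 = byteSwap32 w
  where
    w = hostLoadWord32 bs

-- Hypothetical byte-at-a-time big-endian decoder, i.e. the kind of
-- multiple single-byte loads the patch replaces; kept for comparison.
getWord32beSlow :: B.ByteString -> Word32
getWord32beSlow bs =
  foldl (\acc i -> acc `shiftL` 8 .|. fromIntegral (B.index bs i)) 0 [0 .. 3]
```

On the bytes `01 02 03 04`, `getWord32With BigEndian` yields `0x01020304` on either host byte order: a little-endian host pays one load plus one swap instead of four loads, shifts and ORs.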
There's no need to change the Put logic: it is backed by the
FixedPrim logic in Data.ByteString.Builder.Prim.Binary, which
already does a similar optimization.
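For reference, a small sketch of the FixedPrim path on the Put side, written directly against bytestring's public `Data.ByteString.Builder.Prim` API (`primFixed`, `word32BE`) rather than through binary's Put; `encodeWord32BE` is a hypothetical name used only for illustration.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Hypothetical helper: encode a Word32 big-endian via a FixedPrim,
-- which writes a fixed 4-byte value straight into the output buffer.
encodeWord32BE :: Word32 -> BL.ByteString
encodeWord32BE = BB.toLazyByteString . P.primFixed P.word32BE
```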
Closes #215.