streamvbyte
Port differential coded version to ARM NEON
The generic codec supports both x64 and ARM NEON; the differential-coded version, however, is x64-only.
It seems like it would be easy to port the differential functions over. The Delta function on ARM is almost identical to the x64 one:
uint32x4_t Delta(uint32x4_t curr, uint32x4_t prev) {
    // vextq_u32(prev, curr, 3) = {prev[3], curr[0], curr[1], curr[2]}
    return vsubq_u32(curr, vextq_u32(prev, curr, 3));
}
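To make the lane arithmetic explicit, here is a minimal scalar sketch of what the Delta above should compute for one block of four values. The names `vec4`, `delta4_scalar` and `delta4_example` are hypothetical and not part of streamvbyte; this is only a reference model that a NEON port could be checked against, not the library's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane stand-in for uint32x4_t (not a streamvbyte type). */
typedef struct { uint32_t v[4]; } vec4;

/* Scalar model of Delta(curr, prev):
   vextq_u32(prev, curr, 3) yields {prev[3], curr[0], curr[1], curr[2]},
   which is then subtracted lane by lane from curr. */
vec4 delta4_scalar(vec4 curr, vec4 prev) {
    vec4 shifted = {{ prev.v[3], curr.v[0], curr.v[1], curr.v[2] }};
    vec4 out;
    for (int i = 0; i < 4; i++) {
        /* Unsigned subtraction wraps mod 2^32, like vsubq_u32. */
        out.v[i] = curr.v[i] - shifted.v[i];
    }
    return out;
}

/* Small self-check: returns 1 when the deltas match the expected values. */
int delta4_example(void) {
    vec4 prev = {{ 1, 2, 3, 10 }};
    vec4 curr = {{ 12, 15, 15, 20 }};
    vec4 d = delta4_scalar(curr, prev);
    assert(d.v[0] == 2 && d.v[1] == 3 && d.v[2] == 0 && d.v[3] == 5);
    return 1;
}
```

Only the last lane of `prev` matters, which is why the NEON version can get away with a single `vextq_u32`.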
And so is the prefix sum, which is currently mixed with the store in _write_avx_d1 (for historical reasons, I suppose):
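uint32x4_t PrefixSum(uint32x4_t curr, uint32x4_t prev) {
    const uint32x4_t zero = {0, 0, 0, 0};
    // add = {0, c0, c1, c2}
    uint32x4_t add = vextq_u32(zero, curr, 3);
    // Byte indices that copy the last 32-bit lane of prev into every lane.
    // Note: vqtbl1q_u8 is AArch64-only; vdupq_n_u32(vgetq_lane_u32(prev, 3))
    // should be an ARMv7-compatible alternative.
    const uint8x16_t BroadcastLast = {12,13,14,15,12,13,14,15,12,13,14,15,12,13,14,15};
    prev = vreinterpretq_u32_u8(vqtbl1q_u8(vreinterpretq_u8_u32(prev), BroadcastLast));
    // curr = {c0, c0+c1, c1+c2, c2+c3}
    curr = vaddq_u32(curr, add);
    // add = {0, 0, c0, c0+c1}
    add = vextq_u32(zero, curr, 2);
    curr = vaddq_u32(curr, prev);
    // curr = prev[3] + {c0, c0+c1, c0+c1+c2, c0+c1+c2+c3}
    curr = vaddq_u32(curr, add);
    return curr;
}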
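The same steps can be traced in plain C. The sketch below is a scalar model of the PrefixSum above, mirroring each NEON operation on a hypothetical `vec4` struct; `prefix_sum4_scalar` and `prefix_sum4_example` are made-up names for illustration and do not exist in streamvbyte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane stand-in for uint32x4_t (not a streamvbyte type). */
typedef struct { uint32_t v[4]; } vec4;

/* Scalar model of PrefixSum(curr, prev), one NEON step at a time. */
vec4 prefix_sum4_scalar(vec4 curr, vec4 prev) {
    /* Value that the table lookup broadcasts to every lane. */
    uint32_t last = prev.v[3];

    /* add = vextq_u32(zero, curr, 3) -> {0, c0, c1, c2} */
    vec4 add = {{ 0, curr.v[0], curr.v[1], curr.v[2] }};
    for (int i = 0; i < 4; i++) {
        curr.v[i] += add.v[i];   /* curr = {c0, c0+c1, c1+c2, c2+c3} */
    }

    /* add = vextq_u32(zero, curr, 2) -> {0, 0, c0, c0+c1} */
    vec4 add2 = {{ 0, 0, curr.v[0], curr.v[1] }};
    for (int i = 0; i < 4; i++) {
        curr.v[i] += last + add2.v[i];
    }
    return curr;
}

/* Small self-check: returns 1 when the reconstructed values match. */
int prefix_sum4_example(void) {
    vec4 prev   = {{ 1, 2, 3, 10 }};
    vec4 deltas = {{ 2, 3, 0, 5 }};
    vec4 r = prefix_sum4_scalar(deltas, prev);
    assert(r.v[0] == 12 && r.v[1] == 15 && r.v[2] == 15 && r.v[3] == 20);
    return 1;
}
```

Two shift-and-add steps are enough for four lanes because the second step adds pairs that were already combined by the first.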
It could be that my implementations are suboptimal, but I think they are correct, and given these functions it should be easy to create a differentially coded codec.
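As a rough illustration of how the two pieces would fit together, here is a scalar sketch of the block loop such a codec might use: each block of four is differenced against the last value of the previous block, and decoding reverses that with a running sum. The function names, the `initial` parameter and the assumption that `n` is a multiple of 4 are all hypothetical and not taken from streamvbyte; the byte-level compression step is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: in-place delta of n values (n a multiple of 4),
   processed in blocks of four the way a SIMD loop would be.
   `initial` plays the role of the value preceding the first element. */
void delta_encode_blocks(uint32_t *data, size_t n, uint32_t initial) {
    uint32_t prev_last = initial;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        uint32_t block_last = data[i + 3];  /* save before overwriting */
        data[i + 3] -= data[i + 2];
        data[i + 2] -= data[i + 1];
        data[i + 1] -= data[i];
        data[i]     -= prev_last;
        prev_last = block_last;
    }
}

/* Hypothetical inverse: per-block prefix sum seeded with the previous
   block's last decoded value. */
void delta_decode_blocks(uint32_t *data, size_t n, uint32_t initial) {
    uint32_t prev_last = initial;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        data[i]     += prev_last;
        data[i + 1] += data[i];
        data[i + 2] += data[i + 1];
        data[i + 3] += data[i + 2];
        prev_last = data[i + 3];
    }
}

/* Round-trip self-check: returns 1 when decode(encode(x)) == x. */
int delta_roundtrip_example(void) {
    const uint32_t values[8] = { 3, 7, 7, 20, 21, 30, 100, 101 };
    uint32_t work[8];
    for (size_t i = 0; i < 8; i++) work[i] = values[i];

    delta_encode_blocks(work, 8, 0);
    /* The first lane of the second block is differenced against 20. */
    assert(work[3] == 13 && work[4] == 1);

    delta_decode_blocks(work, 8, 0);
    for (size_t i = 0; i < 8; i++) assert(work[i] == values[i]);
    return 1;
}
```

In a NEON version, the bodies of the two loops would be replaced by Delta and PrefixSum, with the previous block's vector carried across iterations.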