Port differential coded version to ARM NEON

Open lemire opened this issue 8 years ago • 0 comments

The generic codec supports both x64 and ARM NEON; however, the differentially coded version is x64 only.

It seems like it would be easy to port it over. The Delta function on ARM NEON is almost identical to the x64 one:

#include <arm_neon.h>

// Lane-wise curr[i] minus the value preceding it; lane 0 uses prev[3].
uint32x4_t Delta(uint32x4_t curr, uint32x4_t prev) {
   return vsubq_u32(curr, vextq_u32(prev, curr, 3));
}
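For testing a port, a scalar reference of the same per-block operation may be handy. The following is a minimal sketch, not code from the library; the name delta4_scalar is hypothetical, and it assumes blocks of four 32-bit lanes as in Delta above.

```c
#include <stdint.h>

/* Hypothetical scalar reference for one block of four lanes.
 * out[i] = curr[i] - (value that precedes curr[i] in the stream);
 * lane 0 uses the last lane of the previous block. */
static void delta4_scalar(const uint32_t curr[4], const uint32_t prev[4],
                          uint32_t out[4]) {
    /* Same lane order as vextq_u32(prev, curr, 3). */
    const uint32_t shifted[4] = {prev[3], curr[0], curr[1], curr[2]};
    for (int i = 0; i < 4; i++) {
        out[i] = curr[i] - shifted[i]; /* wraps modulo 2^32, like vsubq_u32 */
    }
}
```

Comparing the lanes of the NEON result (stored with vst1q_u32) against this output on random inputs would be one way to check the port.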

So is the prefix sum, which is currently mixed with the store in _write_avx_d1 (for historical reasons, I suppose):

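// Inclusive prefix sum of curr, offset by the last lane of prev.
uint32x4_t PrefixSum(uint32x4_t curr, uint32x4_t prev) {
   uint32x4_t zero = {0, 0, 0, 0};
   // add = {0, c0, c1, c2}: curr shifted up by one lane.
   uint32x4_t add = vextq_u32(zero, curr, 3);
   // Byte-shuffle indices that copy lane 3 into every lane.
   // Note: vqtbl1q_u8 is available on AArch64 only.
   uint8x16_t BroadcastLast = {12,13,14,15,12,13,14,15,12,13,14,15,12,13,14,15};
   prev = vreinterpretq_u32_u8(vqtbl1q_u8(vreinterpretq_u8_u32(prev), BroadcastLast));
   // curr = {c0, c0+c1, c1+c2, c2+c3}
   curr = vaddq_u32(curr, add);
   // add = {0, 0, c0, c0+c1}: partial sums shifted up by two lanes.
   add = vextq_u32(zero, curr, 2);
   curr = vaddq_u32(curr, prev);
   curr = vaddq_u32(curr, add);
   return curr;
}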
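To make the two shift-and-add steps of PrefixSum easier to follow, here is a scalar sketch that mirrors them lane by lane. It is an illustration only; the name prefix_sum4_scalar is hypothetical and the arrays stand in for the NEON registers.

```c
#include <stdint.h>

/* Hypothetical scalar mirror of PrefixSum for one block of four lanes. */
static void prefix_sum4_scalar(const uint32_t curr[4], const uint32_t prev[4],
                               uint32_t out[4]) {
    uint32_t s[4], t[4];

    /* Step 1: add curr shifted up by one lane (zero fill). */
    s[0] = curr[0];
    s[1] = curr[1] + curr[0];
    s[2] = curr[2] + curr[1];
    s[3] = curr[3] + curr[2];

    /* Step 2: add the step-1 result shifted up by two lanes. */
    t[0] = s[0];
    t[1] = s[1];
    t[2] = s[2] + s[0];
    t[3] = s[3] + s[1];

    /* Finally add the last lane of the previous block to every lane. */
    for (int i = 0; i < 4; i++) {
        out[i] = t[i] + prev[3];
    }
}
```

After step 2, lane i holds curr[0] + ... + curr[i], so two steps are enough for four lanes.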

It could be that my implementations are suboptimal, but I think they are correct, and given these functions it should be easy to create a differentially coded codec.
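As a rough sketch of how the two block functions would chain over a whole array, the scalar code below carries the previous block from one iteration to the next. Everything here is an assumption for illustration: the names delta_block, prefix_block, delta_encode and delta_decode are hypothetical, the initial previous block is taken to be all zeros, the length is assumed to be a multiple of four (any tail is ignored), and the input and output buffers must not overlap. The actual codec also interleaves this with byte packing, which is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for Delta on one block of four lanes. */
static void delta_block(const uint32_t *curr, const uint32_t *prev,
                        uint32_t *out) {
    out[0] = curr[0] - prev[3];
    out[1] = curr[1] - curr[0];
    out[2] = curr[2] - curr[1];
    out[3] = curr[3] - curr[2];
}

/* Scalar stand-in for PrefixSum on one block of four lanes. */
static void prefix_block(const uint32_t *curr, const uint32_t *prev,
                         uint32_t *out) {
    out[0] = curr[0] + prev[3];
    out[1] = out[0] + curr[1];
    out[2] = out[1] + curr[2];
    out[3] = out[2] + curr[3];
}

/* Replace each value by its difference from the preceding value. */
static void delta_encode(const uint32_t *in, uint32_t *out, size_t n) {
    uint32_t prev[4] = {0, 0, 0, 0};
    for (size_t i = 0; i + 4 <= n; i += 4) {
        delta_block(in + i, prev, out + i);
        memcpy(prev, in + i, sizeof prev); /* carry the original block */
    }
}

/* Inverse of delta_encode: rebuild values by running sums. */
static void delta_decode(const uint32_t *in, uint32_t *out, size_t n) {
    uint32_t prev[4] = {0, 0, 0, 0};
    for (size_t i = 0; i + 4 <= n; i += 4) {
        prefix_block(in + i, prev, out + i);
        memcpy(prev, out + i, sizeof prev); /* carry the decoded block */
    }
}
```

In a NEON port, the two block helpers would be replaced by Delta and PrefixSum, with prev kept in a register instead of a small array.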

Dec 16 '17 04:12 lemire