volk
Add 4ic deinterleave to 8i x2
I think this can be done pretty efficiently if byte-wise arithmetic shift operators exist. Otherwise it's probably just loop unrolling? The volk_8ic_deinterleave_16i_x2 kernels are much more complicated than I expected, though, so I'm probably not aware of a lot of the nuances of the available SIMD operations.
uint8_t input[size];
uint8_t out_1[size];
uint8_t out_2[size];
for (int i = 0; i < size; i++) {
    /* low nibble: shift up to drop the high nibble, then shift back down */
    out_1[i] = input[i] << 4;
    out_1[i] = out_1[i] >> 4;
    /* high nibble */
    out_2[i] = input[i] >> 4;
}
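Because the loop above works on uint8_t, its shifts are logical and the nibbles come out zero-extended. The data is described further down the thread as signed 4-bit IQ, so a sign-extending scalar variant might look roughly like the sketch below. The function names are hypothetical, and the nibble order (low nibble to the first output, high nibble to the second) is an assumption that the thread does not confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 4 bits of v to an int8_t in the range -8..7.
 * The XOR/subtract form avoids right-shifting a negative value, whose
 * result is implementation-defined in C. */
static inline int8_t sign_extend_nibble(uint8_t v)
{
    v &= 0x0F;
    return (int8_t)((v ^ 0x08) - 0x08);
}

/* Hypothetical generic kernel: splits each packed byte into two signed
 * 8-bit samples. ASSUMPTION: low nibble -> out_lo, high nibble -> out_hi. */
static void deinterleave_4ic_to_8i_x2(int8_t* out_lo,
                                      int8_t* out_hi,
                                      const uint8_t* in,
                                      size_t num_bytes)
{
    for (size_t i = 0; i < num_bytes; i++) {
        out_lo[i] = sign_extend_nibble(in[i]);
        out_hi[i] = sign_extend_nibble((uint8_t)(in[i] >> 4));
    }
}
```

With this sketch, an input byte of 0x7F would give -1 for the low nibble and 7 for the high nibble. If the board packs I and Q the other way round, swapping the two output pointers at the call site is enough.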
So let's see, you propose a new kernel, volk_4ic_deinterleave_8i_x2?
Do you have a use case? I have an idea how to use such low resolution values. But I'd suggest a LUT instead of shifts.
Are you willing to implement a first kernel?
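To make the LUT suggestion concrete, here is a minimal sketch of what a table-based generic version could look like. It assumes signed 4-bit samples with the low nibble going to the first output, and every name in it is hypothetical rather than part of VOLK. Whether a table actually beats shifts would need benchmarking on the target machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Two 256-entry tables indexed by the packed input byte. Each entry holds
 * the sign-extended value (-8..7) of one nibble of that byte. */
static int8_t nibble_lut_lo[256];
static int8_t nibble_lut_hi[256];

/* Fill the tables once before the first call to the kernel below. */
static void init_nibble_luts(void)
{
    for (int b = 0; b < 256; b++) {
        nibble_lut_lo[b] = (int8_t)(((b & 0x0F) ^ 0x08) - 0x08);
        nibble_lut_hi[b] = (int8_t)((((b >> 4) & 0x0F) ^ 0x08) - 0x08);
    }
}

/* Hypothetical LUT-based kernel: one table lookup per output sample.
 * ASSUMPTION: low nibble -> out_lo, high nibble -> out_hi. */
static void deinterleave_4ic_to_8i_x2_lut(int8_t* out_lo,
                                          int8_t* out_hi,
                                          const uint8_t* in,
                                          size_t num_bytes)
{
    for (size_t i = 0; i < num_bytes; i++) {
        out_lo[i] = nibble_lut_lo[in[i]];
        out_hi[i] = nibble_lut_hi[in[i]];
    }
}
```

The two tables take 512 bytes in total, so they should stay resident in L1 cache. The tables could also be made `const` and precomputed at compile time to drop the init step.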
@jdemel Yes, I'm writing blocks for a radio astronomy acquisition board that streams packed signed 4-bit IQ data. And yes, I'll put up a PR shortly with nearly complete generic and SSE2 kernels, though I have some uncertainty about dispatchers and input data types, as there isn't a native 4-bit type in C++.