FFmpeg
experiment: separate out chroma deblock and start unrolling
Extremely rough draft for implementing AVX2 support.
Separated out the chroma and luma loop just for experimentation purposes.
Unroll the loop for (int x = x0 ? x0 : grid; x < x_end; x += 2 * grid) so that each iteration computes two blocks.
I think I'll need to change the function signature to take a parameter specifying the number of blocks to process per call.
Alternatively, if the number of blocks to process is always even, the function can simply assume that two blocks are always available.
void (*filter_chroma[2 /* h, v */])(uint8_t *pix, ptrdiff_t stride, const int32_t *beta, const int32_t *tc,
    const uint8_t *no_p, const uint8_t *no_q, const uint8_t *max_len_p, const uint8_t *max_len_q, int shift);
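
A minimal sketch of the first option (passing a block count), assuming the original loop advanced by one grid step per iteration. EdgeLog, log_filter and walk_edges_unrolled are hypothetical names used only for illustration and are not part of FFmpeg; log_filter stands in for the real filter call and only records what it was asked to do. The main loop handles two edges per call, and a tail call handles a leftover single edge when the count is odd.

```c
#include <stddef.h>

#define MAX_CALLS 16

/* Hypothetical record of filter calls; stands in for the real filter. */
typedef struct EdgeLog {
    int x[MAX_CALLS];         /* x position of the first edge in each call */
    int nb_blocks[MAX_CALLS]; /* number of edges handled by each call      */
    int nb_calls;
} EdgeLog;

/* Hypothetical filter stub: records the request instead of filtering. */
static void log_filter(EdgeLog *log, int x, int nb_blocks)
{
    if (log->nb_calls < MAX_CALLS) {
        log->x[log->nb_calls]         = x;
        log->nb_blocks[log->nb_calls] = nb_blocks;
        log->nb_calls++;
    }
}

/* Walk the edges of one row, two per iteration, and return the number
 * of edges visited. Assumes the non-unrolled loop stepped by grid. */
static int walk_edges_unrolled(EdgeLog *log, int x0, int x_end, int grid)
{
    int x     = x0 ? x0 : grid;
    int edges = 0;

    /* Main loop: only entered while both x and x + grid are in range. */
    for (; x + grid < x_end; x += 2 * grid) {
        log_filter(log, x, 2);
        edges += 2;
    }

    /* Tail: one edge is left over when the edge count is odd. */
    if (x < x_end) {
        log_filter(log, x, 1);
        edges += 1;
    }

    return edges;
}
```

If the number of edges could be guaranteed to be even, the tail branch and the block-count argument would both be unnecessary, which is the second option described above.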