rav1e
Use generics for bit depth throughout the encoder
This is a follow-up to #3116 that extends this optimization to as many places in the encoder as we can reasonably use it. With generics, the compiler can simplify some math operations at compile time and remove branches, so that we branch on bit depth only at the highest level of the code (and therefore as few times as possible).
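As a rough illustration of the idea (not code from this PR), here is a minimal sketch that assumes the bit depth is passed as a const generic parameter. The names `clamp_pixel`, `process_row`, and `process_row_dispatch` are hypothetical and are not rav1e APIs; the mechanism actually used in the encoder may differ.

```rust
/// Hypothetical helper: clamp a sample to the valid range for bit depth `BD`.
/// Because `BD` is a compile-time constant, `(1 << BD) - 1` can be folded
/// into a literal in each instantiation instead of being computed at runtime.
fn clamp_pixel<const BD: usize>(v: i32) -> i32 {
    let max = (1i32 << BD) - 1;
    v.clamp(0, max)
}

/// Hypothetical inner loop: no bit-depth branch happens per sample,
/// since the depth is fixed for the whole instantiation.
fn process_row<const BD: usize>(row: &mut [i32]) {
    for px in row.iter_mut() {
        *px = clamp_pixel::<BD>(*px);
    }
}

/// The only runtime branch on bit depth, taken once at the top level.
/// The supported depths (8, 10, 12) are an assumption for this sketch.
fn process_row_dispatch(row: &mut [i32], bit_depth: usize) {
    match bit_depth {
        8 => process_row::<8>(row),
        10 => process_row::<10>(row),
        12 => process_row::<12>(row),
        _ => panic!("unsupported bit depth: {}", bit_depth),
    }
}
```

Calling `process_row_dispatch(&mut row, 10)` selects the 10-bit instantiation once and then runs the loop without further checks. Each supported depth gets its own monomorphized copy of the generic functions, which is the usual trade-off of this approach: less branching in exchange for more generated code.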
Based on hyperfine benchmarking, this results in a 1-2% speedup across the encoding process, although it does increase the final binary size.
Looks like ARM needs some additional changes.
Could you mention the amount of the size increase?
On a fully stripped binary on x64 Linux, it was 3.8 MB before and 5.1 MB after.
Let's try to do that in steps and measure the increase.
I have considered creating an internal enum for bit-depth, with access to the related values when needed. This would allow us to at least restrict some of the branching and hint about the value range,