Richard Diamond
I've pushed an update. It now includes build script code to generate kernels for `mc` functions: `put_8tap`, `prep_8tap`, and `mc_avg`. The `put_8tap` and `prep_8tap` kernels have different versions for each...
Partitioning the `mc` kernels resulted in a massive speedup when compiling `librav1e`. For a whitespace change it takes `~165s`, of which `~120s` is spent item checking (parallel compiler will help with...
Ah, I forgot I changed the import for the nasm stuff in the build script too.
@lu-zero I have a slightly old comparison, which doesn't include the recent SAD kernel rewrite: https://docs.google.com/spreadsheets/d/1oduiszODkHUk2FQG7-Hflb9KWiTw3CQIVnhto2d2Ldo/edit?usp=sharing It averages ~203% of base speed. Interestingly, `-s 9` sees ~266% across all tested `-q` levels....
New bench results: it now averages ~2.28x faster; `-s 9 -q *` is 2.84x faster. https://docs.google.com/spreadsheets/d/1oduiszODkHUk2FQG7-Hflb9KWiTw3CQIVnhto2d2Ldo/edit?usp=sharing I think `-s 9` is broken though... VLC can play it, but the output is pretty garbled. Base...
Also, I've murdered compile times again...
Fixed compile times for dev builds for the most part; release builds are still a PITA.
@EwoutH Rebased, but asm support is broken ATM.
@vibhoothiiaanand Hi. I actually started a rebase a while ago now, but I kinda lost interest midway through. Supporting yet another set of kernels, without unified function table statics/dispatch for...