lewton
Match speed of libvorbis
atm we are two times slower than libvorbis. We need to be at least as fast as them.
Maybe there is some improvement doable in Huffman tree decoding? No idea.
Note about current speed: it ranges between 1.6x and 1.8x for floor1 files, and it's faster for floor0 files, but those don't really matter as there are almost no files with floor0.
And part of the speed improvement was thanks to changes between the Rust 1.11 and 1.12 compilers.
Wow, it seems recent changes in rustc have led to some serious speed improvement. As of the Rust nightly compiler 2016-10-18, lewton is only 1.18 to 1.25 times as slow as libvorbis.
(note: I'm always comparing the "Overall ratio of difference" output of cargo run --release bench of the cmp tool).
Hmm, it seems it has the same performance on Rust 1.12.1, so it's caused by something else? No idea. Either way, it's really good.
As of rustc 1.19.0-nightly (f4209651e 2017-05-05), the factor is around 1.09 to 1.12.
With rustc 1.21.0-nightly (2aeb5930f 2017-08-25), the factor is between 1.05 and 1.06.
Have there been some recent regressions?
I was curious, so I ran the comparison with rustc 1.30.0 (da5f414c2 2018-10-24) and the latest master (0.9.3):
$ cargo run --release bench
Finished release [optimized] target(s) in 0.58s
Running `target/release/cmp bench`
Comparing speed for bwv_1043_vivace.ogg : libvorbis=0.6495s we=0.8464s difference=1.30x
Comparing speed for bwv_543_fuge.ogg : libvorbis=0.9369s we=1.3493s difference=1.44x
Comparing speed for maple_leaf_rag.ogg : libvorbis=0.2593s we=0.3801s difference=1.47x
Comparing speed for hoelle_rache.ogg : libvorbis=0.4680s we=0.6724s difference=1.44x
Comparing speed for thingy-floor0.ogg : libvorbis=0.2157s we=0.2524s difference=1.17x
Overall time spent for decoding by libvorbis: 2.5293s
Overall time spent for decoding by us: 3.5007s
Overall ratio of difference: 1.38x
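For reference, the overall figure can be reproduced from the per-file timings. This is only a sketch: it assumes the overall ratio is our total decode time divided by libvorbis's total decode time, which is consistent with the numbers printed above, but it is not taken from the cmp tool's source. The function name is made up for illustration.

```rust
/// Hypothetical helper, not part of the cmp tool.
/// Each entry is (libvorbis seconds, our seconds) for one file.
/// Returns total "our" time divided by total libvorbis time,
/// so 1.0 means parity and 2.0 means twice as slow.
fn overall_ratio(timings: &[(f64, f64)]) -> f64 {
    let (lib_total, our_total) = timings
        .iter()
        .fold((0.0, 0.0), |(lib, us), &(l, u)| (lib + l, us + u));
    our_total / lib_total
}
```

Feeding it the five (libvorbis, we) pairs from the log above gives a value close to the reported 1.38x. Note that this weights long files more heavily than averaging the per-file ratios would.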
@ashthespy I'm not sure where this comes from. This slow behaviour happens on rustc 1.20 stable taken from rustup as well, so it isn't a regression of rustc itself or of llvm. It might be some improvement in how gcc optimizes: libvorbis is usually taken from the OS so it's compiled via your OS compiler, which is usually gcc, while lewton is compiled using rustc + llvm. To get a fair comparison, one would have to compare to clang of the same version that the rustc is coming from.
Most of the performance delta is due to the transient Vec and SmallVec allocations, realloc, and drops.
Here is a trace you can open with Instruments on macOS.
alloc trace.trace.zip
Please see my comments on the allocation issue regarding the need for an API and design change to solve this issue efficiently.
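To illustrate the general idea (not lewton's actual API), here is a minimal sketch of the difference between returning a fresh Vec per packet and decoding into a caller-owned buffer. Both function names and the toy "decoding" (shifting unsigned bytes to signed samples) are assumptions made up for this example; the point is only that the second form lets the caller reuse one allocation across packets.

```rust
/// Hypothetical allocating style: every call creates and later drops a Vec.
fn decode_alloc(packet: &[u8]) -> Vec<i16> {
    // Toy stand-in for real decoding: map unsigned bytes to signed samples.
    packet.iter().map(|&b| i16::from(b) - 128).collect()
}

/// Hypothetical reusing style: the caller owns the output buffer.
/// `clear` drops the old samples but keeps the capacity, so once the
/// buffer has grown to the largest packet size, no further allocation
/// happens in steady state.
fn decode_into(packet: &[u8], out: &mut Vec<i16>) {
    out.clear();
    out.extend(packet.iter().map(|&b| i16::from(b) - 128));
}
```

With the second form, a decode loop would create one `Vec<i16>` up front and pass it to every `decode_into` call. The trade-off is a less convenient signature, which is why this kind of change needs an API and design decision rather than a local fix.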