lewton Match speed of libvorbis

At the moment we are two times slower than libvorbis. We need to be at least as fast as libvorbis.

Sep 01 '16 09:09 est31

Maybe there is some improvement doable in huffman tree decoding? No idea.

Sep 01 '16 15:09 est31

Note about current speed: it ranges between 1.6x and 1.8x for floor1 files, and it's faster for floor0 files, but those don't really matter as there are almost no files with floor0.

Oct 03 '16 00:10 est31

And part of the speed improvement was thanks to changes between the Rust 1.11 and 1.12 compilers.

Oct 03 '16 00:10 est31

Wow, it seems recent changes in rustc have led to some serious speed improvement. As of the Rust nightly compiler 2016-10-18, lewton is only 1.18 to 1.25 times as slow as libvorbis.

Oct 21 '16 15:10 est31

(note: I'm always comparing the "Overall ratio of difference" output of cargo run --release bench of the cmp tool).

Oct 21 '16 15:10 est31

Hmm, it seems it has the same performance on Rust 1.12.1, so it's caused by something else? No idea. Either way, it's really good.

Oct 21 '16 20:10 est31

As of rustc 1.19.0-nightly (f4209651e 2017-05-05), the factor is around 1.09 to 1.12.

May 07 '17 06:05 est31

With rustc 1.21.0-nightly (2aeb5930f 2017-08-25), the factor is between 1.05 and 1.06.

Aug 26 '17 14:08 est31

Have there been some recent regressions? I was curious, so I ran the comparison with rustc 1.30.0 (da5f414c2 2018-10-24) and the latest master (0.9.3):

$ cargo run --release bench
    Finished release [optimized] target(s) in 0.58s
     Running `target/release/cmp bench`

Comparing speed for bwv_1043_vivace.ogg : libvorbis=0.6495s we=0.8464s difference=1.30x
Comparing speed for bwv_543_fuge.ogg    : libvorbis=0.9369s we=1.3493s difference=1.44x
Comparing speed for maple_leaf_rag.ogg  : libvorbis=0.2593s we=0.3801s difference=1.47x
Comparing speed for hoelle_rache.ogg    : libvorbis=0.4680s we=0.6724s difference=1.44x
Comparing speed for thingy-floor0.ogg   : libvorbis=0.2157s we=0.2524s difference=1.17x

Overall time spent for decoding by libvorbis: 2.5293s
Overall time spent for decoding by us: 3.5007s
Overall ratio of difference: 1.38x

Nov 01 '18 14:11 ashthespy
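
For readers unfamiliar with the bench output, here is a minimal sketch of how the per-file and overall figures relate. It assumes the "Overall ratio of difference" is the ratio of the summed decode times, which is consistent with the numbers above, but it is not taken from the cmp tool's source. The `Timing` struct and the `sample_timings` and `overall_ratio` functions are hypothetical names used only for illustration.

```rust
/// Decode time of one file for both decoders, in seconds.
/// Hypothetical helper type, not part of lewton or the cmp tool.
struct Timing {
    file: &'static str,
    libvorbis: f64,
    lewton: f64,
}

/// The per-file timings quoted in the bench run above.
fn sample_timings() -> Vec<Timing> {
    vec![
        Timing { file: "bwv_1043_vivace.ogg", libvorbis: 0.6495, lewton: 0.8464 },
        Timing { file: "bwv_543_fuge.ogg", libvorbis: 0.9369, lewton: 1.3493 },
        Timing { file: "maple_leaf_rag.ogg", libvorbis: 0.2593, lewton: 0.3801 },
        Timing { file: "hoelle_rache.ogg", libvorbis: 0.4680, lewton: 0.6724 },
        Timing { file: "thingy-floor0.ogg", libvorbis: 0.2157, lewton: 0.2524 },
    ]
}

/// Ratio of total lewton time to total libvorbis time.
/// Assumption: this is how the overall figure is derived.
fn overall_ratio(timings: &[Timing]) -> f64 {
    let libvorbis: f64 = timings.iter().map(|t| t.libvorbis).sum();
    let lewton: f64 = timings.iter().map(|t| t.lewton).sum();
    lewton / libvorbis
}

fn main() {
    let timings = sample_timings();
    for t in &timings {
        // Per-file slowdown factor of lewton relative to libvorbis.
        println!("{:<20}: difference={:.2}x", t.file, t.lewton / t.libvorbis);
    }
    println!("Overall ratio of difference: {:.2}x", overall_ratio(&timings));
}
```

Because the overall figure is a ratio of totals under this assumption, long files weigh more than short ones, so the small floor0 file barely moves the result.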

@ashthespy I'm not sure where this comes from. This slow behaviour happens on rustc 1.20 stable taken from rustup as well, so it isn't a regression of rustc itself or of llvm. It might be some improvement in how gcc optimizes: libvorbis is usually taken from the OS so it's compiled via your OS compiler, which is usually gcc, while lewton is compiled using rustc + llvm. To get a fair comparison, one would have to compare to clang of the same version that the rustc is coming from.

Nov 01 '18 17:11 est31

Most of the performance delta is due to the transient Vec and SmallVec allocations, reallocations, and drops. Here is a trace you can open with Instruments on macOS. alloc trace.trace.zip

Please see my comments on the allocation issue regarding the need for an API and design change to solve this issue efficiently.

Jan 24 '19 12:01 fdoyon
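
To illustrate the kind of API change being discussed, here is a minimal sketch contrasting a decode call that allocates fresh buffers per packet with one that refills caller-owned buffers. `ToyDecoder`, `decode_alloc`, `decode_into`, and `fake_sample` are hypothetical and are not lewton's API; the "decoding" is a placeholder that only exists to fill the buffers.

```rust
/// Hypothetical stand-in for a decoder; NOT lewton's actual API.
struct ToyDecoder {
    channels: usize,
    block_size: usize,
}

/// Placeholder for real decoding work: derives a sample from the packet bytes.
fn fake_sample(packet: &[u8], ch: usize, i: usize) -> i16 {
    if packet.is_empty() {
        return 0;
    }
    i16::from(packet[(i + ch) % packet.len()])
}

impl ToyDecoder {
    /// Allocating style: one new Vec per channel for every packet,
    /// all of which are dropped again once the caller is done.
    fn decode_alloc(&self, packet: &[u8]) -> Vec<Vec<i16>> {
        (0..self.channels)
            .map(|ch| {
                (0..self.block_size)
                    .map(|i| fake_sample(packet, ch, i))
                    .collect()
            })
            .collect()
    }

    /// Reusing style: the caller owns the buffers across packets.
    /// `clear` keeps the capacity, so refilling does not reallocate
    /// as long as the block size does not grow.
    fn decode_into(&self, packet: &[u8], out: &mut Vec<Vec<i16>>) {
        out.resize_with(self.channels, Vec::new);
        for (ch, buf) in out.iter_mut().enumerate() {
            buf.clear();
            buf.extend((0..self.block_size).map(|i| fake_sample(packet, ch, i)));
        }
    }
}

fn main() {
    let dec = ToyDecoder { channels: 2, block_size: 1024 };
    let packets: Vec<Vec<u8>> = (0..4u8).map(|n| vec![n; 16]).collect();
    let mut out = Vec::new();
    for p in &packets {
        dec.decode_into(p, &mut out);
        // Both styles produce the same samples; only the allocation pattern differs.
        assert_eq!(out, dec.decode_alloc(p));
    }
}
```

The trade-off in this sketch is that the reusing variant pushes buffer ownership onto the caller, which is why it would be an API and design change rather than an internal fix.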
