inflate
Big slow-down in WebAssembly (wasm)
Hi, I'm coming from a discussion in https://github.com/image-rs/image-png/issues/114. My issue concerns the very slow reading of PNG images in wasm. @HeroicKatora suspected that the slowdown comes from inflate calls, so I set up a very simple example to measure the performance drop.
For this I'm using the file at https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz. I decoded it and then re-encoded it with libflate, because inflate could not decode the original encoding. The file is 296 MB decoded and 88 MB encoded. In the example code, main.rs roughly measures native decoding performance, and lib.rs exposes functions to wasm, so I can copy-paste different inflate variants into it and try them in the browser.
I tried every decoding API available in inflate natively, and three variants in wasm. Here are the results:
| version | native time | wasm time |
|---|---|---|
| inflate_bytes | 2.8 s | 20.6 s |
| InflateStream | 2.8 s | not tested |
| InflateWriter | 2.9 s | not tested |
| DeflateDecoder | 0.64 s | 4.6 s |
| DeflateDecoderBuf | 3.0 s | not tested |
| libflate (for comparison) | 2.2 s | 4.8 s |
As we can see, inflate is roughly an order of magnitude slower in wasm than natively, while libflate is "only" about 2x slower in a wasm context. We can also see that `DeflateDecoder` is a lot faster than `InflateStream`, which is the one the png crate uses (there are probably reasons for that choice that I'm not aware of).
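A quick way to check the "order of magnitude" versus "2x" reading is to divide each wasm time by the matching native time. The sketch below just does that arithmetic on the numbers from the table; `slowdown` is a hypothetical helper, not part of inflate or libflate.

```rust
/// Hypothetical helper: how many times longer the wasm run took than the
/// native run (values above 1.0 mean wasm is slower).
fn slowdown(native_s: f64, wasm_s: f64) -> f64 {
    wasm_s / native_s
}

fn main() {
    // (variant, native seconds, wasm seconds), copied from the table above.
    let rows = [
        ("inflate_bytes", 2.8, 20.6),
        ("DeflateDecoder", 0.64, 4.6),
        ("libflate", 2.2, 4.8),
    ];
    for (name, native_s, wasm_s) in rows {
        println!("{:<15} {:.1}x slower in wasm", name, slowdown(native_s, wasm_s));
    }
}
```

This gives about 7.4x for inflate_bytes, 7.2x for DeflateDecoder and 2.2x for libflate, so both inflate variants lose a similar factor in wasm, a bit under a full 10x.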
I'm not familiar enough with DEFLATE to work out what in the code might cause this slowdown, but I wanted to report the issue. I hope you have an idea of what is wrong, and maybe a fix so we can enjoy inflate in wasm ^^.
I believe the main reason is that DeflateDecoder was added to the API much more recently than InflateStream.
InflateStream (and DeflateDecoderBuf, which uses it) has a 32 KiB buffer; DeflateDecoder doesn't, so maybe the difference comes from there. I'm not sure what libflate uses by default.
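One way to probe whether buffering matters is to drain the same decoder in two ways and time both: through a fixed 32 KiB buffer, or by letting `read_to_end` fill a growing `Vec`. The std-only sketch below shows the two draining strategies. `read_chunked`, `read_direct` and `CHUNK` are hypothetical names, and a `Cursor` over synthetic bytes stands in for the decoder; swap in any decoder that implements `std::io::Read` to get meaningful numbers. Note that this only varies the caller-side buffer and does not touch any buffer internal to the decoder.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical caller-side buffer size, chosen to match the 32 KiB
/// mentioned for InflateStream (an assumption about what is comparable).
const CHUNK: usize = 32 * 1024;

/// Drain `reader` through a fixed-size buffer, chunk by chunk.
fn read_chunked<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = vec![0u8; CHUNK];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => out.extend_from_slice(&buf[..n]),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(out)
}

/// Let the reader fill a growing Vec directly, with no fixed intermediate buffer.
fn read_direct<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    // Synthetic stand-in for decompressed output.
    let data: Vec<u8> = (0..100_000u32).map(|i| (i % 251) as u8).collect();
    let a = read_chunked(Cursor::new(data.as_slice()))?;
    let b = read_direct(Cursor::new(data.as_slice()))?;
    assert_eq!(a, b);
    println!("{} bytes either way", a.len());
    Ok(())
}
```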
If you're curious about speed, you may also want to compare against flate2 with the Rust back-end enabled.
I've opened a pull request (#56) for the benchmark harness for inflate that I wrote ages ago, because right now inflate has no benchmarking facilities at all. According to these benchmarks, inflate performance has regressed by 33% compared to a year or so ago.
According to the investigation in https://github.com/image-rs/image-png/issues/114, 85% of the time is spent in `inflate::InflateStream::next_state`, so that's where you should look if you want to fix this.
@mpizenberg for this to actually be tackled, I suggest providing a step-by-step guide to reproducing the setup that exhibits the slowdown. If I were a library maintainer who had never dealt with wasm before, I wouldn't bother unless it was very clear what to do.
@Shnatsel ok, I'll update the example repo with exact instructions to reproduce the behavior in the coming days.
I have updated the associated repository (https://github.com/mpizenberg/wasm-inflate) with instructions in the README to reproduce this benchmark. Versions have changed since last April. On my computer, with rustc 1.36.0, inflate 0.4.5, libflate 0.1.25 and wasm-bindgen 0.2.47, I get the following results with a native build:
Elapsed (inflate_bytes): Ok(2.85300905s)
Elapsed (DeflateDecoder): Ok(3.021417847s)
Elapsed (DeflateDecoderBuf): Ok(2.992962828s)
Elapsed (InflateStream): Ok(2.565384845s)
Elapsed (InflateWriter): Ok(2.905209633s)
Elapsed (libflate): Ok(2.312502443s)
And in wasm with Firefox 67.0.4 I got:
inflate_bytes: 9991 milliseconds.
deflate_decoder: 9434 milliseconds.
deflate_decoder_buf: 9258 milliseconds.
inflate_stream: 7964 milliseconds.
inflate_writer: 8848 milliseconds.
libflate: 4902 milliseconds.
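The `Ok(...)` around each native timing is what `Debug` prints for a `Result<Duration, _>`, which suggests (main.rs isn't shown here, so this is an inference) that the native numbers came from `SystemTime::elapsed`. For benchmarking, `std::time::Instant` is the usual choice: it is monotonic and its `elapsed()` returns a plain `Duration`. A minimal sketch, with `time_it` as a hypothetical helper and a stand-in workload instead of a real decode call. Also note that std's clock is not available on `wasm32-unknown-unknown` (`Instant::now()` panics there), so wasm timings have to come from the JavaScript side.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` once and return its result with the time it took.
/// `Instant` is monotonic, so `elapsed()` returns a plain `Duration`
/// instead of the `Result` that `SystemTime::elapsed()` returns.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    // black_box is a best-effort hint that keeps the optimizer from
    // discarding work whose result would otherwise be unused.
    let out = black_box(f());
    (out, start.elapsed())
}

fn main() {
    // Stand-in workload; replace the closure body with a real decode call.
    let input = vec![1u8; 10_000_000];
    let (sum, elapsed) = time_it(|| input.iter().map(|&b| u64::from(b)).sum::<u64>());
    println!("Elapsed (stand-in): {:?}, sum = {}", elapsed, sum);
}
```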
Two things are already noticeably different from last time in April.
- inflate_bytes in wasm is twice as fast as before.
- DeflateDecoder is much slower now, at roughly the same speed as the other methods.
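To put numbers on these two observations, the sketch below expresses each July time as a multiple of the corresponding April time. `relative_time` is a hypothetical helper; the inputs are copied from the April table and the July logs. Crate versions changed between the two runs and the April browser version isn't stated, so each ratio bundles several changes together.

```rust
/// Hypothetical helper: July time as a multiple of the April time
/// (below 1.0 means faster in July, above 1.0 means slower).
fn relative_time(april_s: f64, july_s: f64) -> f64 {
    july_s / april_s
}

fn main() {
    // (measurement, April seconds, July seconds), copied from both runs.
    let rows = [
        ("inflate_bytes (wasm)", 20.6, 9.991),
        ("DeflateDecoder (native)", 0.64, 3.021417847),
        ("DeflateDecoder (wasm)", 4.6, 9.434),
    ];
    for (name, april_s, july_s) in rows {
        println!("{:<24} {:.2}x the April time", name, relative_time(april_s, july_s));
    }
}
```

inflate_bytes in wasm takes roughly half its April time, while DeflateDecoder takes about 4.7x longer natively and about 2x longer in wasm.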
Unfortunately I don't think I'll have time to investigate this further for quite some time, but at least it's better documented now.