
Very slow performances in WebAssembly (wasm)

Open mpizenberg opened this issue 5 years ago • 10 comments

Hi, I'm trying to port an image processing algorithm to the web with wasm-bindgen. The first step is to read a PNG image. I used this crate for that and it worked, so thanks for that! I've set up a minimal working example that uses png in wasm. The issue is that decoding a 640x480 PNG takes roughly 0.1 s on my machine in wasm, while native code is orders of magnitude faster (see the perf screenshot below).

[Screenshot: wasm-png]

Any idea of what might be causing this issue?

mpizenberg avatar Apr 23 '19 14:04 mpizenberg

No, but it looks like the leaf functions are predominantly in the inflate crate, not in this one. Instead of a screenshot, can you attach an archive of the actual perf data?

HeroicKatora avatar Apr 23 '19 14:04 HeroicKatora

Yes, here is the corresponding profile.

Profile-20190423T170034.zip

mpizenberg avatar Apr 23 '19 15:04 mpizenberg

I fear I have no idea how to interpret this. What do I need to be able to explore it just like in that image? Which tools did you use to profile? Etc.

HeroicKatora avatar Apr 23 '19 15:04 HeroicKatora

I used the Chromium dev tools. If you unzip the file, you can load it in the Chromium/Chrome dev tools: open the Performance tab and use the "Load profile" button.

mpizenberg avatar Apr 23 '19 15:04 mpizenberg

Thanks, didn't know you can simply load one that way :)

So, the hard numbers important here are (everything in percent of the total time):

  • 85.9% spent in inflate::InflateStream::update
    • 85.4% of which are inflate::InflateStream::next_state
      • only 14.7% spent reading
  • 7.5% spent in memmove (this might be another point of optimization afterwards: even at a rather mediocre bandwidth of >2GB/s, that time corresponds to far more data being shuffled around than the ~1MB of actual image data).
  • 0.9% in png::filter::unfilter

So the runtime is indeed dominated by inflate, to which I would defer this issue (still within the image-rs org).
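The memmove estimate above can be checked with a little arithmetic. The sketch below is only a back-of-envelope calculation: it assumes the 0.1 s total decode time reported at the top of the thread, an RGBA8 output buffer for the 640x480 image, and exactly 2 GB/s of copy bandwidth. None of these were measured here, and the helper names are made up for illustration.

```rust
/// Converts a share of the total runtime (in percent) to milliseconds.
fn share_to_ms(total_ms: f64, percent: f64) -> f64 {
    total_ms * percent / 100.0
}

/// Bytes that can be moved in `ms` milliseconds at `gb_per_s` GB/s (1 GB = 1e9 bytes).
fn bytes_moved(ms: f64, gb_per_s: f64) -> f64 {
    ms / 1000.0 * gb_per_s * 1e9
}

fn main() {
    // Assumption: ~0.1 s total, as reported in the first post.
    let total_ms = 100.0;
    let inflate_ms = share_to_ms(total_ms, 85.9);
    let memmove_ms = share_to_ms(total_ms, 7.5);
    let unfilter_ms = share_to_ms(total_ms, 0.9);

    // Assumption: 4 bytes per pixel (RGBA8) for the decoded image.
    let image_bytes = 640.0 * 480.0 * 4.0;
    // Assumption: exactly 2 GB/s, the lower bound quoted in the profile summary.
    let moved = bytes_moved(memmove_ms, 2.0);

    println!("inflate:  {:.1} ms", inflate_ms);
    println!("memmove:  {:.1} ms", memmove_ms);
    println!("unfilter: {:.1} ms", unfilter_ms);
    println!("decoded image size: {:.2} MB", image_bytes / 1e6);
    println!("data movable in memmove time: {:.1} MB", moved / 1e6);
    println!("ratio: {:.1}x the image size", moved / image_bytes);
}
```

Under these assumptions the time spent in memmove is enough to copy the decoded image many times over, which is consistent with the suspicion of redundant buffering.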

HeroicKatora avatar Apr 23 '19 15:04 HeroicKatora

Ok I'll try to see if I can do some similar minimalist wasm example for the inflate crate to figure out what is going on there. Thanks for your time!

mpizenberg avatar Apr 23 '19 15:04 mpizenberg

@mpizenberg Even after an optimization in inflate, this could still seem slow. The volume of shuffled memory (through memmove and memcpy) is rather high in any case, but that may be a symptom of the wasm sandbox or of unnecessary intermediate buffering. Hopefully this gets faster!

HeroicKatora avatar Apr 23 '19 15:04 HeroicKatora

I've been rewriting a PNG decoder because diving into the png crate was not the easiest. It doesn't cover the whole spec but works for images without a palette or interlaced data. It relies on miniz_oxide for inflating. The decoding performance is great, especially for images where most scanlines use the Sub filter (like the "depth", "eye", "rgb" and "texture_alpha" images in the table below). I've written up a comparison of the approach with the png crate on the Rust discourse forum, in case you're interested. Below is a table summarizing decoding timings for the images I used while writing the code.

| Image | this | bis | png crate | OpenCV | this (wasm) | png (wasm) |
|---|---|---|---|---|---|---|
| depth.png | 4.0 ms | 3.6 ms | 9.1 ms | 4.0 ms | 8.5 ms | 30.7 ms |
| eye.png | 0.48 ms | 0.49 ms | 0.96 ms | 0.72 ms | 1.5 ms | 5.9 ms |
| inkscape.png | 7.1 ms | 7.4 ms | 9.6 ms | 6.6 ms | 13.4 ms | 30.2 ms |
| rgb.png | 6.6 ms | 6.6 ms | 16.0 ms | 6.5 ms | 13.7 ms | 52.1 ms |
| screen.png | 6.5 ms | 6.6 ms | 10.2 ms | 6.6 ms | 11.6 ms | 29.8 ms |
| texture_alpha.png | 0.68 ms | 0.68 ms | 1.94 ms | 0.99 ms | 1.8 ms | 8.0 ms |
| transparent.png | 15.2 ms | 15.3 ms | 17.4 ms | 13.2 ms | 26.1 ms | 55.8 ms |
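To make the table easier to compare, the ratios between columns can be computed directly. This is just a small sketch over the numbers copied from the table above (the "bis" and OpenCV columns are left out); the `Row` struct and `ratio` helper are names made up for this illustration.

```rust
/// One row of the timing table; all timings are in milliseconds.
struct Row {
    image: &'static str,
    this_native: f64,
    png_native: f64,
    this_wasm: f64,
    png_wasm: f64,
}

/// Values copied from the table ("bis" and OpenCV columns omitted).
const ROWS: [Row; 7] = [
    Row { image: "depth.png", this_native: 4.0, png_native: 9.1, this_wasm: 8.5, png_wasm: 30.7 },
    Row { image: "eye.png", this_native: 0.48, png_native: 0.96, this_wasm: 1.5, png_wasm: 5.9 },
    Row { image: "inkscape.png", this_native: 7.1, png_native: 9.6, this_wasm: 13.4, png_wasm: 30.2 },
    Row { image: "rgb.png", this_native: 6.6, png_native: 16.0, this_wasm: 13.7, png_wasm: 52.1 },
    Row { image: "screen.png", this_native: 6.5, png_native: 10.2, this_wasm: 11.6, png_wasm: 29.8 },
    Row { image: "texture_alpha.png", this_native: 0.68, png_native: 1.94, this_wasm: 1.8, png_wasm: 8.0 },
    Row { image: "transparent.png", this_native: 15.2, png_native: 17.4, this_wasm: 26.1, png_wasm: 55.8 },
];

/// How many times longer `b_ms` is than `a_ms`.
fn ratio(a_ms: f64, b_ms: f64) -> f64 {
    b_ms / a_ms
}

fn main() {
    println!(
        "{:<18} {:>14} {:>14} {:>16}",
        "image", "this wasm/nat", "png wasm/nat", "png/this (wasm)"
    );
    for r in ROWS.iter() {
        println!(
            "{:<18} {:>14.2} {:>14.2} {:>16.2}",
            r.image,
            ratio(r.this_native, r.this_wasm), // wasm slowdown of this decoder
            ratio(r.png_native, r.png_wasm),   // wasm slowdown of the png crate
            ratio(r.this_wasm, r.png_wasm),    // png crate vs this decoder, both in wasm
        );
    }
}
```

On these numbers the png crate in wasm takes roughly 2x to 4.5x as long as this decoder in wasm, depending on the image.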

I hope this can also help improve performance in the png crate. The code base of this alternative decoder is very small for now (and not ready to be a crate yet), so don't hesitate to have a look if you're familiar with PNG decoding (the code is not well documented yet).
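For readers less familiar with the Sub scanline filter mentioned above: in the PNG specification, a Sub-filtered byte stores the difference (modulo 256) between the raw byte and the byte one pixel to its left, so reversing it is a single left-to-right pass. The sketch below shows only that idea. It is not the code of the alternative decoder or of the png crate, and the function names are made up for illustration.

```rust
/// Reverses the PNG "Sub" filter in place on one scanline.
/// `bpp` is the number of bytes per complete pixel (at least 1).
/// The first `bpp` bytes have no left neighbour and stay unchanged.
fn unfilter_sub(line: &mut [u8], bpp: usize) {
    for i in bpp..line.len() {
        // Recon(x) = Filt(x) + Recon(x - bpp), modulo 256.
        line[i] = line[i].wrapping_add(line[i - bpp]);
    }
}

/// Applies the "Sub" filter to a raw scanline (used here only to build test data).
fn filter_sub(raw: &[u8], bpp: usize) -> Vec<u8> {
    raw.iter()
        .enumerate()
        .map(|(i, &b)| if i >= bpp { b.wrapping_sub(raw[i - bpp]) } else { b })
        .collect()
}

fn main() {
    // Three RGB pixels, so bpp = 3.
    let raw: [u8; 9] = [10, 20, 30, 15, 25, 35, 250, 5, 40];
    let mut line = filter_sub(&raw, 3);
    println!("filtered:   {:?}", line);
    unfilter_sub(&mut line, 3);
    println!("unfiltered: {:?}", line);
    assert_eq!(line, raw);
}
```

Each output byte depends only on the byte `bpp` positions earlier in the same scanline, so no data from the previous scanline is needed for this filter type.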

mpizenberg avatar Aug 12 '19 13:08 mpizenberg

@mpizenberg This is awesome, I appreciate your hard work! Reopening this as a tracking issue for performance improvements, since it both demonstrates possible improvements and contains a link to reference code. Switching the decoder (to miniz_oxide) may be part of those improvements (see also #151).

HeroicKatora avatar Aug 12 '19 16:08 HeroicKatora

It would be nice if someone could re-test the performance now that the png crate has switched from inflate to miniz_oxide in v0.16.5.

Speaking of Rust PNG implementations, there are also https://crates.io/crates/png_pong and https://crates.io/crates/imagine

Shnatsel avatar Jun 14 '20 18:06 Shnatsel