image-png
Very slow performance in WebAssembly (wasm)
Hi, I'm trying to port an image processing algorithm to the web with wasm-bindgen. The first step is to read a PNG image. I used this crate for that and it worked, so thanks for that! I've set up a minimal working example to use png in wasm. The issue I have is that decoding takes roughly 0.1s on my machine in wasm for a 640x480 PNG, while being orders of magnitude faster in native code (cf. perf screenshot below).
Any idea of what might be causing this issue?
No, but it looks like the leaf functions are predominantly in the `inflate` crate, not within this one. Instead of a screenshot, can you attach an archive of the actual perf data?
I fear I have no idea how to interpret this. What do I need to be able to explore it just like in that image? Which tools did you use to profile? Etc.
I used the Chromium dev tools. If you unzip the file, you can load it in the Chromium/Chrome dev tools: open the Performance tab and use the load profile button.
Thanks, didn't know you can simply load one that way :)
So, the hard numbers important here are (everything in percent of the total time):
- 85.9% spent in `inflate::InflateStream::update`
  - 85.4% of which are `inflate::InflateStream::next_state`
    - only 14.7% spent reading
- 7.5% spent in `memmove` (this might be another possible point of optimization afterwards; even with a rather mediocre >2GB/s bandwidth, this would still be much more than the ~1MB of actual image data shuffled around).
- 0.9% in `png::filter::unfilter`
So the runtime is indeed dominated by the time of `inflate`, to which I would defer this issue (still within the image-rs org).
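The `memmove` remark above can be checked with some back-of-the-envelope arithmetic. The sketch below is only an illustration: it assumes the roughly 0.1s total decode time from the original report, the 7.5% `memmove` share, a 2GB/s copy bandwidth, and an 8-bit RGBA 640x480 image (the pixel format is an assumption, it is not stated in the thread). The helper name `bytes_moved` is hypothetical.

```rust
/// Estimate how many bytes could be copied in the share of the run
/// attributed to memmove, given an assumed copy bandwidth.
fn bytes_moved(total_secs: f64, fraction: f64, bandwidth_bytes_per_sec: f64) -> f64 {
    total_secs * fraction * bandwidth_bytes_per_sec
}

fn main() {
    // Figures taken from the thread: ~0.1 s total, 7.5 % in memmove, 2 GB/s.
    let moved = bytes_moved(0.1, 0.075, 2.0e9);

    // Assumption: 4 bytes per pixel (8-bit RGBA) for the 640x480 image.
    let image_bytes = 640.0 * 480.0 * 4.0;

    println!("estimated bytes moved: {:.1} MB", moved / 1.0e6);
    println!("decoded image size:    {:.1} MB", image_bytes / 1.0e6);
    println!("ratio:                 {:.1}x", moved / image_bytes);
}
```

Under these assumptions the time spent in `memmove` corresponds to about an order of magnitude more bytes than the decoded image itself contains, which is why it looks like a second optimization target.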
Ok I'll try to see if I can do some similar minimalist wasm example for the inflate crate to figure out what is going on there. Thanks for your time!
@mpizenberg Maybe even after an optimization in `inflate`, this could still seem slow. The volume of shuffled memory (through `memmove` and `memcpy`) is rather high in any case, but that may be a symptom of the wasm sandbox or of unnecessary intermediate buffering. Hopefully this gets faster!
I've been rewriting a PNG decoder because diving into the png crate was not the easiest. It doesn't cover the whole spec but works for images without palette or interlaced data. It relies on miniz_oxide for the inflating code. The performance of the decoding code is great, especially for images with a majority of Sub scanline filters (like the "depth", "eye", "rgb" and "texture_alpha" images in the table below). I've written up a comparison of the approach with the png crate on the Rust Discourse forum, in case you're interested. Below is a table summarizing decoding timings for the images I used while writing the code.
Image | this | bis | png crate | OpenCV | this (wasm) | png (wasm) |
---|---|---|---|---|---|---|
depth.png | 4.0 ms | 3.6 ms | 9.1 ms | 4.0 ms | 8.5 ms | 30.7 ms |
eye.png | 0.48 ms | 0.49 ms | 0.96 ms | 0.72 ms | 1.5 ms | 5.9 ms |
inkscape.png | 7.1 ms | 7.4 ms | 9.6 ms | 6.6 ms | 13.4 ms | 30.2 ms |
rgb.png | 6.6 ms | 6.6 ms | 16.0 ms | 6.5 ms | 13.7 ms | 52.1 ms |
screen.png | 6.5 ms | 6.6 ms | 10.2 ms | 6.6 ms | 11.6 ms | 29.8 ms |
texture_alpha.png | 0.68 ms | 0.68 ms | 1.94 ms | 0.99 ms | 1.8 ms | 8.0 ms |
transparent.png | 15.2 ms | 15.3 ms | 17.4 ms | 13.2 ms | 26.1 ms | 55.8 ms |
I hope this can also help improve performance in the png crate. The code base of this alternative decoder is very small for now (and not ready to be a crate yet), so don't hesitate to have a look if you're familiar with PNG decoding (the code is not very well documented yet).
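For readers unfamiliar with the Sub scanline filter mentioned above: in the PNG specification, a Sub-filtered byte stores the difference to the corresponding byte of the previous pixel, so decoding adds that earlier (already reconstructed) byte back, modulo 256. The following is a minimal sketch of that step, not the code from the decoder discussed here; the function names `unfilter_sub` and `filter_sub` are hypothetical.

```rust
/// Reverse the PNG "Sub" filter in place for one scanline.
/// `bpp` is the number of bytes per complete pixel (at least 1).
fn unfilter_sub(scanline: &mut [u8], bpp: usize) {
    // The first `bpp` bytes have no left neighbour and stay unchanged.
    for i in bpp..scanline.len() {
        // Add the already reconstructed byte one pixel to the left.
        scanline[i] = scanline[i].wrapping_add(scanline[i - bpp]);
    }
}

/// Apply the "Sub" filter (encoder side); only used here for the round trip.
fn filter_sub(raw: &[u8], bpp: usize) -> Vec<u8> {
    raw.iter()
        .enumerate()
        .map(|(i, &b)| if i >= bpp { b.wrapping_sub(raw[i - bpp]) } else { b })
        .collect()
}

fn main() {
    // Three RGB pixels, 3 bytes per pixel.
    let raw = [10u8, 20, 30, 12, 22, 32, 250, 5, 40];
    let mut line = filter_sub(&raw, 3);
    unfilter_sub(&mut line, 3);
    // Filtering followed by unfiltering restores the original bytes.
    assert_eq!(line, raw.to_vec());
}
```

Sub only looks at bytes within the current scanline, unlike the Up, Average and Paeth filters, which also read the previous scanline.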
@mpizenberg This is awesome, appreciate your hard work! Reopening this to track performance improvements, since it both demonstrates possible improvements and contains a link to reference code. Switching the decoder (to `miniz_oxide`) may be part of those improvements (see also #151).
It would be nice if someone could re-test the performance now that the `png` crate has switched from `inflate` to `miniz_oxide` in v0.16.5.
Speaking of Rust PNG implementations, there are also https://crates.io/crates/png_pong and https://crates.io/crates/imagine