Memory leak when using libzim wasm

Open mossroy opened this issue 2 years ago • 1 comments

If you watch the browser's memory consumption (tested with Chromium) on branch https://github.com/kiwix/kiwix-js/tree/libzim-experiments, you can see that it gradually rises while browsing.

It's more obvious when reading large content, like ZIM files that contain videos (I tested with the dirtybiology ones).

After opening a few videos, it crashes with the following error message in the log:

Cannot enlarge memory, asked to go up to 2213478400 bytes, but the limit is 2147483648 bytes!

So I suspect that the ZIM contents are not deleted from memory after they are used, probably in the C bindings

mossroy avatar Jun 06 '22 15:06 mossroy

I haven't looked closely at the glue you implemented, but when I did the ZSTD decoding glue, I ran into a similar issue because I thought I had to malloc and de-allocate memory between each call to the Emscripten VM. It turned out that de-allocation never entirely got rid of the memory structures, and so each new allocation made the memory grow until it crashed.

The solution I arrived at is summarized in the comment here -- essentially I assigned the control structures and memory space once, and then just re-used them for every operation.

Of course this was for a single decoder, not for a whole application like libzim. So the experience here may not be relevant.
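
A minimal sketch of the allocate-once pattern described above, assuming an Emscripten-style module that exposes `_malloc`, `_free` and a `HEAPU8` view. `WasmHeap`, `ReusableBuffer` and `MockHeap` are hypothetical names; the mock only stands in for a real module so the example runs on its own, and it is not the actual ZSTD glue:

```typescript
// Hypothetical minimal view of an Emscripten-style module (assumption).
interface WasmHeap {
  _malloc(size: number): number;
  _free(ptr: number): void;
  HEAPU8: Uint8Array;
}

// Keeps a single buffer inside the wasm heap and reuses it for every call.
class ReusableBuffer {
  private heap: WasmHeap;
  private ptr = 0;
  private capacity = 0;

  constructor(heap: WasmHeap) {
    this.heap = heap;
  }

  // Copies `data` into the shared buffer, reallocating only when it is too small.
  write(data: Uint8Array): number {
    if (data.length > this.capacity) {
      if (this.ptr !== 0) this.heap._free(this.ptr);
      this.ptr = this.heap._malloc(data.length);
      this.capacity = data.length;
    }
    this.heap.HEAPU8.set(data, this.ptr);
    return this.ptr;
  }
}

// Stand-in heap for the demo: a bump allocator that counts allocations
// and never reclaims memory (an assumption made to keep the mock simple).
class MockHeap implements WasmHeap {
  HEAPU8: Uint8Array;
  mallocCalls = 0;
  private next = 8; // keep 0 free so it can act as a "null" pointer

  constructor(size: number) {
    this.HEAPU8 = new Uint8Array(size);
  }

  _malloc(size: number): number {
    this.mallocCalls++;
    const ptr = this.next;
    this.next += size;
    return ptr;
  }

  _free(_ptr: number): void {
    // no-op in this mock
  }
}

const mockHeap = new MockHeap(1024);
const sharedBuffer = new ReusableBuffer(mockHeap);
const firstPtr = sharedBuffer.write(new Uint8Array([1, 2, 3, 4]));
const secondPtr = sharedBuffer.write(new Uint8Array([5, 6]));
```

Both writes land at the same address and only one allocation is made. The sketch reads `HEAPU8` from the module on every call rather than caching it, because a cached view can become stale if the wasm memory grows.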

Jaifroid avatar Jun 06 '22 15:06 Jaifroid

Here are a few comments about how other Kiwix readers deal with videos:

  • Videos are not compressed within the ZIM because this is not necessary. As with pictures, the format's native compression (WebP, for example) is good enough.
  • Since videos can be large files, decompressing them (if they were compressed) would take a large amount of CPU time and memory anyway. This would be a usability problem at the very least, and would probably not be possible at all on low-end devices.
  • For the same reason, it's not recommended to ask libzim to read the cluster, extract the video and pass it to the Kiwix reader as one big blob.

The recommended way is to ask libzim exactly where the video starts and how long it is. The reader should then open its own file/video handle on that portion of the data and read it directly.
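
A minimal sketch of that approach for a browser-based reader, assuming the reader already knows the byte offset and size of the video inside the archive file (how those values are obtained from libzim is not shown here). `entrySlice` and its parameters are hypothetical names. `Blob.slice` returns a view on part of a file without reading the rest of it:

```typescript
// Returns a Blob covering only the bytes [offset, offset + size) of the archive.
// No data is copied or read until the returned Blob is actually consumed.
function entrySlice(archive: Blob, offset: number, size: number, mimeType: string): Blob {
  if (offset < 0 || size < 0 || offset + size > archive.size) {
    throw new RangeError("requested range is outside the archive");
  }
  return archive.slice(offset, offset + size, mimeType);
}

// Demo with a 100-byte stand-in for an archive file.
const demoArchive = new Blob(["x".repeat(100)]);
const demoVideo = entrySlice(demoArchive, 10, 20, "video/webm");
```

In a reader, the resulting Blob could then be handed to a video element (for example through an object URL) instead of copying the whole video through the wasm heap.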

kelson42 avatar Nov 13 '22 15:11 kelson42

That's interesting, so we need custom support in the reader for videos. We currently do this of course, but we would need to do it with the correct libzim API. I imagine there could be some difficulty with split ZIM archives, or rather, the reader would need to handle the offsets taking into account the split.

Jaifroid avatar Nov 13 '22 15:11 Jaifroid

There is an API for that. I can't tell you exactly which one, but browsing through the API doc should help answer that question. There is no problem with split ZIM files, because a split should never happen in the middle of a cluster (it is exactly for this use case that we no longer allow arbitrary cut positions).
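
To illustrate the reader-side offset handling for split archives discussed in this thread: a minimal sketch, assuming the sizes of the parts are known in order and that a video's data never straddles two parts (consistent with splits never falling inside a cluster). `locateInParts` and `PartLocation` are hypothetical names, not a libzim API:

```typescript
interface PartLocation {
  partIndex: number;   // which part file holds the offset
  localOffset: number; // offset relative to the start of that part
}

// Maps an offset within the whole (logical) archive to a part and a local offset.
function locateInParts(partSizes: number[], globalOffset: number): PartLocation {
  if (globalOffset < 0) {
    throw new RangeError("offset must be non-negative");
  }
  let partStart = 0;
  let partIndex = 0;
  for (const size of partSizes) {
    const partEnd = partStart + size;
    if (globalOffset < partEnd) {
      return { partIndex, localOffset: globalOffset - partStart };
    }
    partStart = partEnd;
    partIndex++;
  }
  throw new RangeError("offset is beyond the end of the archive");
}

// Demo: three parts of 100, 100 and 50 bytes.
const demoLocation = locateInParts([100, 100, 50], 150);
```

Here `demoLocation` points at the second part (index 1) with a local offset of 50.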

kelson42 avatar Nov 13 '22 16:11 kelson42

From what I remember, this memory leak was not specifically related to videos. The fact that videos take up a lot of space simply made the problem more obvious. I suspect the problem is in our C/emscripten glue code that calls libzim.

Regarding videos, I had opened #869 to extract them in small chunks. It's probably enough, and I currently don't see how to do something more in SW mode.
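
A minimal sketch of what reading in small chunks can look like, assuming the offset and size of the video are known. `chunkRanges`, `ByteRange` and the chunk size are hypothetical and are not taken from #869:

```typescript
interface ByteRange {
  start: number; // inclusive
  end: number;   // exclusive
}

// Splits the range [offset, offset + size) into consecutive chunks of at most chunkSize bytes.
function chunkRanges(offset: number, size: number, chunkSize: number): ByteRange[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const ranges: ByteRange[] = [];
  const limit = offset + size;
  for (let start = offset; start < limit; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize, limit) });
  }
  return ranges;
}

// Demo: a 25-byte entry starting at offset 10, read 10 bytes at a time.
const demoRanges = chunkRanges(10, 25, 10);
```

Each range can then be requested and released in turn, so only one chunk needs to be held in memory at a time.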

mossroy avatar Nov 13 '22 17:11 mossroy

Maybe this ticket should be transferred to https://github.com/openzim/javascript-libzim/, because that is probably where the problem is (though I'm not 100% sure).

mossroy avatar Nov 19 '22 11:11 mossroy

I will try to do so without scrambling everything...

kelson42 avatar Nov 19 '22 11:11 kelson42

I couldn't work out how to transfer this issue to a different namespace (openzim). So instead, I posted https://github.com/openzim/javascript-libzim/issues/34 with a link back to this discussion.

Jaifroid avatar Dec 11 '22 21:12 Jaifroid