kiwix-js icon indicating copy to clipboard operation
kiwix-js copied to clipboard

Extremely slow binary search in Chromium for Android

Open Jaifroid opened this issue 4 years ago • 12 comments

Although this is not an official target of Kiwix JS, I think this might uncover a problem in the coding of binary search. If you open https://kiwix.github.io/kiwix-js/www/index.html on an Android phone in Chrome or Edge-Chromium with a reasonably large ZIM (e.g. WikiMed) on a microSD card, the binary search is ridiculously slow and bugs out when backspacing in the search field (a race condition, I believe).

The system is generally pretty slow, and I believe it is because ASM is not optimized in the Android browser. However, slow binary search seems to be another issue because binary search does not decompress the title index (and therefore should not use xzdec), it merely accesses it on-disk repeatedly using FileReader(). The same ZIM accessed with the official Android Kiwix client does not have this issue.

I think this could be allieviated by reading more of the title index into memory (up to a predefined limit) in order to reduce storage/disk access, but I'm guessing wildly.

I note that it is claimed that WASM is supported on mobile Chrome, and therefore it might be worth experimenting with this to speed up access in mobile contexts generally, though it wouldn't make a difference to binary search.

NB the aim is not to challenge the official Kiwix Android app, but to optimize support for a PWA version of our app. An installable PWA of Kiwix JS Windows can be experimented with by going to https://kiwix.github.io/kiwix-js-windows/www/ in a browser that can install PWAs (Chrome or Edge-Chromium, on PC or Android). It has the limitation that the user must pick the ZIM each time s/he launches the app. However, this can be solved on Windows 10, because WinRT APIs can be accessed by a PWA if it is packaged for the Store or sideloaded (but not if it is installed from the Web).

Jaifroid avatar Jan 01 '20 11:01 Jaifroid

I confirm that issue on my device. It's very slow on Firefox Mobile too. On Firefox Preview, it looks much faster (except the first access, right after choosing the ZIM file, where the screen remains black for a very long time). It appears to be Android-specific. I suspect it might come from the way the file is "given" to the browser and/or how it is accessed by the browser itself. On Android, these browsers don't actually browse the filesystem by themselves. They ask for an application to provide the file(s), certainly via an "intent". I tested by choosing the File Browser application of LineageOS after clicking on the file input of our app.

mossroy avatar Jan 04 '20 12:01 mossroy

On Firefox Preview, it looks much faster (except the first access, right after choosing the ZIM file, where the screen remains black for a very long time).

Interesting. I tried that, and Firefox Preview is indeed much faster apart from bugging out on selecting the file (I actually had to restart the app a few times). I wonder if it is copying the ZIM to memory, which would then explain why file access is so fast.

I guess the solution to this is going to be a WASM version of libzim #514, if WASM has more direct file access. We could test that by compiling our current xzdec in WASM to see if it offers speed improvements (not in binary search, but in accessing articles).

Regarding binary search, I tested increasing the latency of the search entry field to 500ms. This reduces the number of searches launched for part of the result. But I'm wondering if we should set search-as-you-type as a user-selectable option rather than it being the default. This would be a quick and easy PR.

Jaifroid avatar Jan 05 '20 13:01 Jaifroid

It actually works even better on Firefox (non-preview) on Android. There is still the long wait after picking the ZIM archive for it to be available to the system, but then it really flies: both binary search and extracting files. It also offers to install Kiwix JS Windows as a full-screen app to the Home Screen (PWA), which works well in both jQuery and SW modes.

I'm still suspecting that it's copying the file to memory or a cache, however.

Jaifroid avatar Jan 05 '20 16:01 Jaifroid

I doubt a WASM version of xzdec would make a significant change on this issue. xzdec does not access any file, it runs directly on memory strings. This issue seems to be related to file access from browsers on Android platform.

mossroy avatar Jan 06 '20 20:01 mossroy

What I'm really trying to say is that Web Assembly (the language, not our version of xzdec) may be optimized in such a way as to allow fast/direct low-level access to a ZIM file (once picked), just as C++ or C# clearly is once the standard Android app has been given access to a folder or file. You're right that producing a WASM version of xzdec wouldn't be a good test of that.

Jaifroid avatar Jan 06 '20 22:01 Jaifroid

I think I can confirm that Firefox for Android attempts to copy the ZIM file to memory. I tested by attaching an OTG thumb drive to an Android phone, with several sizes of ZIM on it. Format is exFAT, and it is loaded by using the Paragon exFAT/NTFS tool. I used Kiwix JS Windows, installed as a PWA (Firefox recognizes and offers to install/pin it to the Home Screen, as do Chromium browsers).

Firefox can open ZIMs of under 2GB from this thumb drive, but is unable to open an 18GB ZIM or the full 70GB English Wikipedia ZIM. It simply fails to access the file, but also then the filepicker becomes unresponsive. I left it for several minutes, to no avail.

Using Chrome or Edge-Chromium PWA, the large files, including the full 70GB ZIM, can be opened from the attached drive, with the same symptoms as opening WikiMed, i.e. slow search and slow image extraction, though pages load "relatively" quickly.

Finally, the standard Kiwix app is apparently unable to open files from an attached OTG thumb drive. When tapping on the file in the File Manager, the Kiwix app opens, but says it cannot find the ZIM. There is no way to "pick" the file from within the app, since it only scans internal storage or an inserted microSD card (which must be FAT32, and therefore cannot hold ZIMs larger than 4GB)…

image

Jaifroid avatar Jan 07 '20 14:01 Jaifroid

Interesting. I suppose Chromium goes through some extra layers when we read the file in javascript, that explain the slowness. And you're probably right about how Firefox handles that. Unfortunately, I don't see an easy way to solve that on our side.

BTW Android does recognize microSD cards formatted as exFAT (at least not-too-old versions)

mossroy avatar Jan 08 '20 19:01 mossroy

I think this issue is the same as that reported here: https://stackoverflow.com/questions/48214601/readasarraybuffer-is-extremely-slow-on-mobile We have in common that we are using FileReader and readAsArrayBuffer in util.js readFileSlice(). If we could read more data per instance of FileReader, instead of opening a new one for very small slices, it could make quite a big saving in some types of read. I suspect Binary Search would be one of those.

Incidentally, my Android is very new, but cannot read exFAT formatted microSD cards, probably because it is Android One, uncustomized by the OEM. I'm stuck splitting ZIM files into 4GB chunks like it's 1990... 😀

Jaifroid avatar Jan 12 '20 15:01 Jaifroid

I've experimentally added @peter-x's least-recently-used cache implementation (https://github.com/kiwix/kiwix-js/pull/183/files) to the kiwix-js-windows environment found here: https://kiwix.github.io/kiwix-js-windows/ . Visiting that page in Chrome browser for Android allows testing this. (To test this, it may be necessary to reload the Service Worker's cached app files if the page has been visited before in the same browser: simply visit the page, wait a few seconds, exit the browser, and visit the page again.)

It produces a significant increase in binary search speed (and image load speed) on the latest WikiMed unsplit maxi file (wikipedia_en_medicine_maxi_2019-10.zim 1.5GB). Search is usable, compared to virtually unusable without the cache. This proves, I think, that the issue is that binary search opens hundreds of instances of FileReader for each keypress during the multiple binary search "hops". The cache cuts FileReader access during binary search by something like 90%, since it reads larger chunks of data and intercepts read requests to the file offset to return the request from memory instead of opening another FileReader. Once a FileReader is open, it reads data relatively quickly: the problem seems to be access.

I'm convinced there must be a more efficient way to search through the directory entries without having to resort to caching data reads.

Jaifroid avatar Jan 31 '20 01:01 Jaifroid

A further significant improvement in file access speeds on Android (and possibly other platforms) can be gained by testing for and using, if available, the new Blob.arrayBuffer() method of reading a file (instead of FileReader). This is supported in Chromium 79+ on Desktop and Android, and in Firefox Desktop (but not yet on Android). The method uses a native Promise instead of callbacks, so it avoids setting up multiple Q promises (potentially thousands for a binary search in a large index). Much of the extra speed may come from this (speculation).

The revised readFileSlice function would look like this (it only uses the new method if it is available):

    function readFileSlice(file, begin, size) {
        if ('arrayBuffer' in Blob.prototype) {
            return file.slice(begin, begin + size).arrayBuffer().then(function (buffer) {
                return new Uint8Array(buffer);
            });
        } else {
            return Q.Promise(function (resolve, reject) {
                var reader = new FileReader();
                reader.onload = function (e) {
                    resolve(new Uint8Array(e.target.result));
                };
                reader.onerror = reader.onabort = reject;
                reader.readAsArrayBuffer(file.slice(begin, begin + size));
            });
        }
    }

This can be tested on Android by visiting (in Chrome):

https://kiwix.github.io/kiwix-js-windows

If you have visited this page before, then you may need to update the Service Worker cache by completely exiting the browser once after visiting that page, and re-visiting the page. If Configuration -> About does not show 0.9.9.98 Beta-dev, then the new code is not yet being used.

Jaifroid avatar Mar 01 '20 12:03 Jaifroid

A more widely supported alternative to the first if block above is:

if ('Response' in window) {
    return new Response(file.slice(begin, begin + size)).arrayBuffer().then(function (buffer) {
        return new Uint8Array(buffer);
    });
}

This works in Chromium (desktop + mobile), Edge Legacy (+ mobile), Firefox Desktop (but not Firefox Android). See https://developer.mozilla.org/en-US/docs/Web/API/Body/arrayBuffer .

EDIT: Just from observation, this does not seem to be a particularly speedy way of reading a file. Maybe the Response function is not optimized for this. The previous Blob.arrayBuffer() method seems faster than the FileReader method (probably because of the native Promise), but Response().arrayBuffer() does not seem to offer any advantage over a Q-wrapped FileReader.

Jaifroid avatar Mar 02 '20 10:03 Jaifroid

Fun fact I discovered today. On Andoid, the "Samsung Internet" browser (available more widely than Samsung phones, I believe) has significantly faster read speeds with Kiwix JS (Windows) than Chrome for Android. It makes accessing the full English Wikipedia quite usable, at least on a reasonably well-specified phone. Tested on the PWA version: pwa.kiwix.org. It also allows installation of the PWA in the list of apps. Still requires picking the ZIM on each restart, though (no Android browsers support the File System Access API yet).

Jaifroid avatar May 31 '21 20:05 Jaifroid