Speed up Xapian searches by preloading indexes

Open kelson42 opened this issue 4 years ago • 4 comments

#418 has shown that the typical steps for a search are:

  1. Read the ZIM file (to be able to locate the Xapian index in it): Cold: 7.44 s | Warm: 0.12 s
  2. Open the Xapian database (internal Xapian code): Cold: 0.09 s | Warm: 0.003 s
  3. Set up the enquire on the database: Cold: 0.02 s | Warm: 0.0004 s
  4. Run the enquire and get a (ranged) set of results from it (internal Xapian code): Cold: 3.74 s | Warm: 1.5 s
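Summing the four steps from the list above gives the end-to-end cost of a search with a cold and with a warm cache (plain arithmetic on the reported figures):

$$
\begin{aligned}
T_{\text{cold}} &= 7.44 + 0.09 + 0.02 + 3.74 = 11.29\ \text{s} \\
T_{\text{warm}} &= 0.12 + 0.003 + 0.0004 + 1.5 \approx 1.62\ \text{s}
\end{aligned}
$$

Step 1 accounts for about 66% of the cold total, and step 4 for about 92% of the warm total.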

Here is when each step currently happens:

  1. Once at file opening
  2. At the first search request, then cached
  3. At each search
  4. At each search
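To make the current lifecycle of step 2 concrete, here is a minimal C++ sketch of a lazily opened, cached index. `IndexHandle`, `openIndex` and `LazyIndex` are hypothetical names, not libzim's API, and the 50 ms sleep only simulates a slow open; the sketch merely models the "at first search, then cached" behaviour listed above.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for an opened Xapian database (step 2).
struct IndexHandle {
    std::string path;
};

// Hypothetical slow step 2; the sleep simulates I/O on a cold cache.
inline std::shared_ptr<IndexHandle> openIndex(const std::string& path) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return std::make_shared<IndexHandle>(IndexHandle{path});
}

// Models the current behaviour: the first search opens the index and every
// later search reuses the cached handle. std::call_once guarantees a single
// open even if several threads issue their first search concurrently.
class LazyIndex {
public:
    explicit LazyIndex(std::string path) : path_(std::move(path)) {}

    std::shared_ptr<IndexHandle> get() {
        std::call_once(once_, [this] {
            handle_ = openIndex(path_);
            ++opens_;
        });
        assert(handle_ && "index must be open once call_once returns");
        return handle_;
    }

    // Number of times step 2 actually ran (1 after the first search).
    int opens() const { return opens_; }

private:
    std::string path_;
    std::once_flag once_;
    std::shared_ptr<IndexHandle> handle_;
    int opens_ = 0;
};
```

With this pattern the first search pays the full cost of step 2, which is what the proposed workflow tries to move out of the search path.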

In an attempt to speed up searches (in particular the first one), the idea would be to have the following workflow:

  1. Once at file opening
  2. Once at file opening (optional?) and then cached
  3. At file opening, then keep one (or more?) ready to use at all times
  4. At each search
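A minimal C++ sketch of the proposed step 2, assuming the index can be opened on a thread other than the caller's. `PreloadedArchive`, `IndexHandle` and `openIndex` are hypothetical stand-ins, not libzim classes: the open is started with `std::async` when the file is opened, and a search waits on the resulting `std::shared_future` only if the preload has not finished yet.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <string>
#include <thread>

// Hypothetical stand-in for an opened Xapian database (step 2).
struct IndexHandle {
    std::string path;
};

// Hypothetical slow step 2; the sleep simulates I/O on a cold cache.
inline std::shared_ptr<IndexHandle> openIndex(const std::string& path) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return std::make_shared<IndexHandle>(IndexHandle{path});
}

// Models the proposed behaviour: step 2 is launched on a background thread
// when the file is opened, so the constructor returns without waiting for it.
class PreloadedArchive {
public:
    explicit PreloadedArchive(const std::string& path)
        : index_(std::async(std::launch::async, openIndex, path).share()) {}

    // Non-blocking check, e.g. to report "search not ready yet" in a UI.
    bool indexReady() const {
        return index_.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

    // Called by a search: blocks only if the preload is still running,
    // and rethrows any exception raised while opening the index.
    std::shared_ptr<IndexHandle> index() const {
        auto handle = index_.get();
        assert(handle);
        return handle;
    }

private:
    std::shared_future<std::shared_ptr<IndexHandle>> index_;
};
```

With this design, opening the file no longer waits for step 2. One caveat: because releasing the last reference to a `std::async` shared state waits for the task to complete, destroying a `PreloadedArchive` while the preload is still running blocks until it finishes.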

Here are the related questions on my side:

  • Can we ensure that step 2 does not slow down the opening of the file (so it should run in another thread)?
  • Can we ensure that step 3 does not slow down the searches (so it should run in another thread)?
  • I guess the whole search system is protected against two search requests running at the same time, to keep it safe in a multithreaded context. If so, this will cause massive slowdowns when many search requests happen at the same time. Would it be possible/reasonable to have a pool of "searchers"?
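On the pool question: Xapian's documentation states that its objects should not be shared between threads without external locking, so giving each concurrent search its own prepared searcher is a natural alternative to a single global lock. Here is a minimal C++ sketch of a fixed-size, blocking object pool; `Searcher` and `SearcherPool` are hypothetical placeholders for the per-search state (opened database plus ready enquire, i.e. steps 2 and 3), and whether libzim's internals can be duplicated this way is an assumption.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical placeholder for one ready-to-use searcher (steps 2 and 3).
struct Searcher {
    int id;
};

// Fixed-size pool: searchers are prepared up front (e.g. at file opening)
// and each search borrows one, so concurrent searches run in parallel
// instead of queueing behind a single lock. The pool must outlive its leases.
class SearcherPool {
public:
    explicit SearcherPool(std::size_t size) {
        for (std::size_t i = 0; i < size; ++i)
            free_.push_back(std::make_unique<Searcher>(Searcher{static_cast<int>(i)}));
    }

    // RAII handle: returns the searcher to the pool when it goes out of scope.
    class Lease {
    public:
        Lease(SearcherPool& pool, std::unique_ptr<Searcher> s)
            : pool_(&pool), searcher_(std::move(s)) {}
        Lease(Lease&&) = default;
        Lease& operator=(Lease&&) = delete;
        ~Lease() {
            if (searcher_) pool_->release(std::move(searcher_));
        }
        Searcher* operator->() const { return searcher_.get(); }

    private:
        SearcherPool* pool_;
        std::unique_ptr<Searcher> searcher_;
    };

    // Blocks only when every searcher is currently in use.
    Lease acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        auto s = std::move(free_.back());
        free_.pop_back();
        return Lease(*this, std::move(s));
    }

    std::size_t available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    void release(std::unique_ptr<Searcher> s) {
        assert(s);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(s));
        }
        cv_.notify_one();  // wake one search waiting in acquire()
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::unique_ptr<Searcher>> free_;
};
```

Pre-creating the searchers at file opening corresponds to keeping step 3 "ready to go"; a search only blocks when every searcher is busy, and the pool size bounds both memory use and the number of searches running in parallel.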

kelson42 · Aug 29 '21

@mgautierfr @maneeshpm What do you think? Is that a proper approach?

kelson42 · Aug 29 '21

Any update here?

kelson42 · Sep 27 '21

@mgautierfr @maneeshpm We have started the dev of 7.2.0. Do we agree on this approach?

kelson42 · Jan 02 '22

Just to add to the documented issue here, Xapian-based search in the WASM version of libzim is basically unusable on Android, due to excessive I/O generated by libzim on startup. See https://github.com/openzim/javascript-libzim/issues/42.

Jaifroid · Jul 30 '23