If you have any stack traces from the elasticsearch node, that would also be very helpful.
Hi @mikemccand, thanks for the reply. As a side note, I've found many of your articles very helpful!

> Hmm, why is `Self time` so high in your profiler output?...
Some more notes-to-self for when I get back to this: Here are the VisualVM hotspots from running the SIFT benchmark (1M stored vectors, 10k total queries) on a local ES...
Thanks again for digging into this a bit.

> The countHits method looks fine, though you should not iterate to docs.cost()

Good to know, I'll fix that.

> You might...
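
Note-to-self on the countHits fix: the point, as I understand it, is to consume each postings list until `DocIdSetIterator.NO_MORE_DOCS` rather than looping up to `docs.cost()`, since `cost()` is only an estimate. A minimal sketch of that loop (field name, counter type, etc. are placeholders, not the actual Elastiknn code):

```java
import java.io.IOException;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

class CountHitsSketch {
  // For every doc in the segment, count how many of the query's hash terms it matches.
  // Each postings list is consumed until NO_MORE_DOCS; cost() is never used as a bound.
  static short[] countHits(LeafReader reader, String field, BytesRef[] queryTerms) throws IOException {
    short[] counts = new short[reader.maxDoc()];
    Terms terms = reader.terms(field);
    if (terms == null) return counts;
    TermsEnum termsEnum = terms.iterator();
    PostingsEnum postings = null;
    for (BytesRef term : queryTerms) {
      if (termsEnum.seekExact(term)) {
        postings = termsEnum.postings(postings, PostingsEnum.NONE);
        for (int doc = postings.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = postings.nextDoc()) {
          counts[doc]++;
        }
      }
    }
    return counts;
  }
}
```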
Maybe there's a clever stopping criterion to avoid visiting all of the terms? I started reading about MaxScore and WAND scoring. Maybe that's a dead end here?
Put together a working early-stopping solution today. Roughly the idea is:

- Compute the total number of term hits up front by iterating over the terms enum once.
- Iterate...
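
The first step, computing the total number of term hits up front, is just one pass over the terms enum summing `docFreq` for the query's terms. Rough sketch (names are illustrative):

```java
import java.io.IOException;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

class TotalHitsSketch {
  // One pass over the terms enum: the total number of term hits for this query
  // is the sum of docFreq over the query's hash terms.
  static int totalTermHits(TermsEnum termsEnum, BytesRef[] queryTerms) throws IOException {
    int total = 0;
    for (BytesRef term : queryTerms) {
      if (termsEnum.seekExact(term)) {
        total += termsEnum.docFreq();
      }
    }
    return total;
  }
}
```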
> That sounds promising! Do you take the docFreq (or maybe totalTermFreq) of terms into account? E.g., collecting all term + docFreq from the TermsEnum, then sort them in increasing...
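
If I'm reading the suggestion right, it would look roughly like this: collect a (term, docFreq) pair for each query term in one pass over the `TermsEnum`, then sort ascending by docFreq so the rarest, most selective terms are processed first. Sketch only, with names of my own choosing:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

class SortTermsByDocFreq {
  // A query term paired with its docFreq from the terms enum.
  static final class TermAndFreq {
    final BytesRef term;
    final int docFreq;
    TermAndFreq(BytesRef term, int docFreq) { this.term = term; this.docFreq = docFreq; }
  }

  // Collect docFreq for each query term, then sort ascending so the rarest
  // (most selective) terms come first.
  static List<TermAndFreq> sortedByDocFreq(TermsEnum termsEnum, BytesRef[] queryTerms) throws IOException {
    List<TermAndFreq> collected = new ArrayList<>();
    for (BytesRef term : queryTerms) {
      if (termsEnum.seekExact(term)) {
        collected.add(new TermAndFreq(BytesRef.deepCopyOf(term), termsEnum.docFreq()));
      }
    }
    collected.sort(Comparator.comparingInt((TermAndFreq tf) -> tf.docFreq));
    return collected;
  }
}
```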
I'm afraid the early-stopping method as I described it isn't going to work. Specifically, it's pretty easy to find a case where a single vector matches for multiple consecutive hash...
> OK, sigh :)
>
> I still think you should explore indexing "sets of commonly co-occurring hashes" if possible. If your query-time vectors are such that the same sets...
I'm trying to make precise what advantage indexing co-occurring hashes would have. If we assume that Lucene's retrieval speed is maxed out, the next obvious speedup is to somehow decrease...
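
To keep a concrete picture of the co-occurring-hashes idea in mind (the grouping and term encoding below are my own assumptions, not a worked-out scheme): if a set of hashes is known to commonly co-occur, it could be indexed as a single combined term, so a query producing that whole set would traverse one postings list for the combined term rather than one per individual hash.

```java
import java.util.Arrays;

class CoOccurringHashTerm {
  // Illustration only: encode a set of co-occurring hash values as one indexed term.
  // A query that produces the same set then traverses a single postings list for the
  // combined term instead of one postings list per individual hash. How the
  // co-occurring sets would be chosen is the open question.
  static String combinedTerm(int[] coOccurringHashes) {
    int[] sorted = coOccurringHashes.clone();
    Arrays.sort(sorted); // order-independent encoding of the set
    return Arrays.toString(sorted);
  }

  public static void main(String[] args) {
    // e.g. hashes 7, 42, and 99 tend to show up together across query vectors
    System.out.println(combinedTerm(new int[] {99, 7, 42})); // -> [7, 42, 99]
  }
}
```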