lucene
Speed up ScalarQuantization by selecting quantiles together
Description
Currently in `ScalarQuantizer`, `ScalarQuantizer.fromVectorsAutoInterval()` issues 4 calls to `Selector.select()` (per scratch batch, basically `len(vector)/20`) and `ScalarQuantizer.fromVectors()` issues 2 calls. All of these 4/2 calls use the same vectors and only ask for different `k` values. If we use a multi-select algorithm instead of separate select calls, we can speed these calls up, especially in `ScalarQuantizer.fromVectorsAutoInterval()`, which repeats a lot of logic.
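To illustrate the idea, here is a minimal sketch of a quickselect-style multi-select over a `float[]`. It is not Lucene's `Selector` API: the class name `MultiSelectSketch` and the method `multiSelect` are hypothetical, and the sketch assumes the requested `k` values are sorted ascending, lie within the array bounds, and that the data contains no NaN values. Each partition pass is shared by every `k` that falls in the same sub-range, which is where the saving over separate select calls would come from.

```java
/** Hypothetical illustration of multi-select; not part of Lucene. */
public final class MultiSelectSketch {
  private MultiSelectSketch() {}

  /**
   * Rearranges {@code a} in place so that, for every k in {@code sortedKs},
   * a[k] holds the value it would hold if {@code a} were fully sorted.
   * Assumes sortedKs is ascending and every k is in [0, a.length).
   */
  public static void multiSelect(float[] a, int[] sortedKs) {
    multiSelect(a, 0, a.length, sortedKs, 0, sortedKs.length);
  }

  // Works on the value range [from, to) and the k range [kFrom, kTo).
  private static void multiSelect(float[] a, int from, int to, int[] ks, int kFrom, int kTo) {
    while (kFrom < kTo && to - from > 1) {
      float pivot = medianOfThree(a[from], a[from + (to - from) / 2], a[to - 1]);

      // Three-way partition: [from, lt) < pivot, [lt, gt) == pivot, [gt, to) > pivot.
      int lt = from, i = from, gt = to;
      while (i < gt) {
        if (a[i] < pivot) {
          swap(a, lt++, i++);
        } else if (a[i] > pivot) {
          swap(a, i, --gt);
        } else {
          i++;
        }
      }

      // Split the requested ranks: those left of the pivot block, those inside
      // it (already in final position), and those to the right of it.
      int leftEnd = kFrom;
      while (leftEnd < kTo && ks[leftEnd] < lt) {
        leftEnd++;
      }
      int rightStart = leftEnd;
      while (rightStart < kTo && ks[rightStart] < gt) {
        rightStart++;
      }

      // Recurse into the left part, then continue the loop on the right part.
      multiSelect(a, from, lt, ks, kFrom, leftEnd);
      from = gt;
      kFrom = rightStart;
    }
  }

  private static float medianOfThree(float x, float y, float z) {
    return Math.max(Math.min(x, y), Math.min(Math.max(x, y), z));
  }

  private static void swap(float[] a, int i, int j) {
    float tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

With this shape, a caller that needs several quantiles of the same scratch buffer would pass all of their ranks in one call rather than re-partitioning the buffer once per rank. A production version would likely also bound the recursion depth and fall back to another strategy on bad pivots, which this sketch does not do.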
The size of the list to select from is practically `20*vector_dimensions`, so greater speedups will be observed with larger dimensionality (or if `ScalarQuantizer.SCRATCH_SIZE` is ever increased).