Speed up ScalarQuantization by selecting quantiles together

Open HoustonPutman opened this issue 4 months ago • 0 comments

Description

Currently in ScalarQuantizer, ScalarQuantizer.fromVectorsAutoInterval() issues 4 calls to Selector.select() (per scratch batch, basically len(vector)/20), and ScalarQuantizer.fromVectors() issues 2 calls. All of these 4/2 calls operate on the same vectors and differ only in the k value they ask for. If we use a single multi-select algorithm instead of separate select calls, we can speed these up, especially ScalarQuantizer.fromVectorsAutoInterval(), which repeats a lot of the same partitioning work.

The size of the list to select from is practically 20*vector_dimensions, so greater speedups will be observed with larger dimensionality (or if ScalarQuantizer.SCRATCH_SIZE is ever increased).
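
To make the idea concrete, here is a minimal sketch of a multi-select based on quickselect with a three-way partition. It is only an illustration under assumptions: the class name MultiSelectSketch and the method multiSelect are hypothetical, it works on a plain float[] rather than through Lucene's Selector API, and it does not handle NaN specially. The point is that one partitioning pass is shared by every requested k, and recursion only continues into sub-ranges that still contain a requested index.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene code: selects several order statistics in one pass.
public class MultiSelectSketch {

  /**
   * Rearranges a in place so that for every k in ks, a[k] holds the value
   * it would hold if a were fully sorted in ascending order.
   */
  public static void multiSelect(float[] a, int[] ks) {
    int[] sortedKs = ks.clone();
    Arrays.sort(sortedKs); // requested indices must be ordered to split them by range
    select(a, 0, a.length - 1, sortedKs, 0, sortedKs.length - 1);
  }

  // Handles a[lo..hi] for the requested indices ks[kLo..kHi] (both ranges inclusive).
  private static void select(float[] a, int lo, int hi, int[] ks, int kLo, int kHi) {
    if (kLo > kHi || lo >= hi) {
      return; // nothing requested in this range, or the range is trivially sorted
    }
    float pivot = a[lo + (hi - lo) / 2];

    // Three-way partition: [lo, lt) < pivot, [lt, gt] == pivot, (gt, hi] > pivot.
    int lt = lo;
    int i = lo;
    int gt = hi;
    while (i <= gt) {
      if (a[i] < pivot) {
        swap(a, lt++, i++);
      } else if (a[i] > pivot) {
        swap(a, i, gt--);
      } else {
        i++;
      }
    }

    // Requested indices inside [lt, gt] are already final; split the rest by side.
    int leftEnd = kLo - 1;
    while (leftEnd + 1 <= kHi && ks[leftEnd + 1] < lt) {
      leftEnd++;
    }
    int rightStart = leftEnd + 1;
    while (rightStart <= kHi && ks[rightStart] <= gt) {
      rightStart++;
    }

    select(a, lo, lt - 1, ks, kLo, leftEnd);
    select(a, gt + 1, hi, ks, rightStart, kHi);
  }

  private static void swap(float[] a, int i, int j) {
    float tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

With separate select calls, each k re-partitions the full array from scratch. Here the top-level partition is done once, and each deeper partition is done once per sub-range that still holds a requested index, which is where the savings for the 4-quantile case would come from. A production version would likely want a better pivot choice and an iterative loop on the larger side to bound recursion depth.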

HoustonPutman avatar Oct 15 '24 19:10 HoustonPutman