lance perf: PQ performance


Open · BubbleCal opened this issue 1 year ago · 2 comments

We are optimizing PQ performance. This issue tracks all potential bottlenecks:

[ ] 256-bit/512-bit SIMD for 4-bit PQ
[x] transforming: we currently handle the 4-bit PQ case during the transform, which requires collecting the intermediate results
[x] cache locality: distance calculation can be sped up with a better access pattern into the distance table
[x] constructing the distance table: currently 4x slower than computing distances
[ ] finding partitions: significant when nprobes is small
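To make the distance-table items concrete, here is a minimal Rust sketch of asymmetric distance computation for 4-bit PQ. It assumes squared L2 distance, a codebook flattened as `[sub_vector][centroid][sub_dim]`, and two 4-bit codes packed per byte (low nibble = even sub-vector, high nibble = odd sub-vector). The function names and layouts are hypothetical and are not Lance's internal API. In this sketch, building the table costs `num_sub_vectors * 16 * sub_dim` multiply-adds per query, while scoring each vector needs only `num_sub_vectors` table lookups.

```rust
/// 4-bit PQ: each sub-quantizer has 2^4 = 16 centroids.
const NUM_CENTROIDS: usize = 16;

/// Build the per-query distance table, laid out as [sub_vector][centroid].
/// Assumed (hypothetical) codebook layout: [sub_vector][centroid][sub_dim], flattened.
fn build_distance_table(query: &[f32], codebook: &[f32], num_sub_vectors: usize) -> Vec<f32> {
    assert_eq!(query.len() % num_sub_vectors, 0, "dim must divide evenly");
    let sub_dim = query.len() / num_sub_vectors;
    assert_eq!(codebook.len(), num_sub_vectors * NUM_CENTROIDS * sub_dim);

    // Presize the table: exactly num_sub_vectors * 16 entries.
    let mut table = Vec::with_capacity(num_sub_vectors * NUM_CENTROIDS);
    for m in 0..num_sub_vectors {
        let q = &query[m * sub_dim..(m + 1) * sub_dim];
        for k in 0..NUM_CENTROIDS {
            let offset = (m * NUM_CENTROIDS + k) * sub_dim;
            let c = &codebook[offset..offset + sub_dim];
            // Squared L2 distance between query sub-vector m and centroid k.
            let d: f32 = q.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
            table.push(d);
        }
    }
    table
}

/// Score one vector: sum one table entry per sub-vector.
/// Assumed packing: low nibble = sub-vector 2i, high nibble = sub-vector 2i + 1.
fn distance_4bit(table: &[f32], packed: &[u8]) -> f32 {
    let mut dist = 0.0f32;
    for (i, &byte) in packed.iter().enumerate() {
        let m = 2 * i;
        dist += table[m * NUM_CENTROIDS + (byte & 0x0F) as usize];
        dist += table[(m + 1) * NUM_CENTROIDS + (byte >> 4) as usize];
    }
    dist
}

/// Batch variant with codes stored "transposed" as [byte_column][vector]:
/// the outer loop fixes one pair of 16-entry table rows, and the inner loop
/// streams over every vector's byte for that pair.
fn distances_4bit_transposed(table: &[f32], codes: &[u8], num_vectors: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; num_vectors];
    if num_vectors == 0 {
        return out;
    }
    for (i, column) in codes.chunks_exact(num_vectors).enumerate() {
        let lo_row = &table[2 * i * NUM_CENTROIDS..(2 * i + 1) * NUM_CENTROIDS];
        let hi_row = &table[(2 * i + 1) * NUM_CENTROIDS..(2 * i + 2) * NUM_CENTROIDS];
        for (d, &byte) in out.iter_mut().zip(column) {
            *d += lo_row[(byte & 0x0F) as usize] + hi_row[(byte >> 4) as usize];
        }
    }
    out
}
```

The transposed variant is one possible way to approach the cache-locality item. It reads only two 16-entry rows (128 bytes of `f32`) at a time while scanning all codes sequentially, instead of jumping across the whole table for each vector. Because each row has just 16 entries, it could also be quantized to `u8` and held in a single 128-bit register for shuffle-based SIMD lookups, but that step is not shown here.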

Nov 21 '24 01:11 BubbleCal

Have we fixed either of these yet?

Nov 21 '24 01:11 wjones127


[x] perf: presize KeepFiniteVectors transform #2838

[x] perf: PQ transform re-allocates a lot #2837

not yet

Nov 21 '24 01:11 BubbleCal
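The two linked issues (#2838, #2837) concern allocation in the transform path. As a generic illustration only (not Lance's code; `transform_naive`, `transform_presized`, and the `encode` closure are hypothetical), the sketch below contrasts building one `Vec` per row and then flattening it with computing the output length up front and filling a single presized buffer.

```rust
/// Allocation-heavy pattern: one Vec per row, then `concat` copies everything again.
fn transform_naive<F: Fn(&[f32]) -> u8>(
    data: &[f32],
    dim: usize,
    num_sub_vectors: usize,
    encode: F,
) -> Vec<u8> {
    let sub_dim = dim / num_sub_vectors;
    let per_row: Vec<Vec<u8>> = data
        .chunks_exact(dim)
        .map(|row| row.chunks_exact(sub_dim).map(&encode).collect())
        .collect();
    per_row.concat()
}

/// Presized pattern: the output length (rows * sub-vectors) is known in advance,
/// so reserve it once and append each row's codes in place.
fn transform_presized<F: Fn(&[f32]) -> u8>(
    data: &[f32],
    dim: usize,
    num_sub_vectors: usize,
    encode: F,
) -> Vec<u8> {
    let sub_dim = dim / num_sub_vectors;
    let num_rows = data.len() / dim;
    let mut out = Vec::with_capacity(num_rows * num_sub_vectors);
    for row in data.chunks_exact(dim) {
        out.extend(row.chunks_exact(sub_dim).map(&encode));
    }
    out
}
```

Both functions return the same codes. The presized version performs a single allocation for the output, while the naive version allocates once per row plus once more for the flattened result.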