perf: PQ performance
We are currently optimizing PQ performance. This issue tracks all potential bottlenecks:
- [ ] 256bit/512bit SIMD for 4bit PQ
- [x] transforming: we now handle the 4bit PQ case during the transform, which requires collecting the intermediate results
- [x] cache locality: the distance calculation can be optimized with a better access pattern to the distance table
- [x] constructing the distance table: it is currently 4x slower than computing the distances
- [ ] find partitions: significant when `nprobes` is small
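
For readers unfamiliar with the "distance table" and "cache locality" items, here is a minimal sketch of the general technique, not Lance's actual implementation. It assumes squared L2 distance and 16 centroids per sub-vector (the 4bit case); all function and variable names are hypothetical. It builds the per-query distance table, then computes distances two ways: walking the codes row by row, which touches every table row for each vector, and walking transposed codes, which reuses a single table row across all vectors.

```python
import random


def build_distance_table(query, codebook):
    """table[m][k] = squared L2 distance between the m-th query sub-vector
    and centroid k of sub-vector m. codebook[m][k] is a list of floats."""
    num_sub = len(codebook)
    sub_dim = len(query) // num_sub
    table = []
    for m in range(num_sub):
        q_sub = query[m * sub_dim:(m + 1) * sub_dim]
        table.append([
            sum((q - c) ** 2 for q, c in zip(q_sub, centroid))
            for centroid in codebook[m]
        ])
    return table


def distances_row_major(table, codes):
    """codes[i][m] is the code of vector i for sub-vector m.
    Each vector jumps across every row of the table."""
    return [sum(table[m][c] for m, c in enumerate(row)) for row in codes]


def distances_transposed(table, codes_t):
    """codes_t[m][i] is the same data transposed.
    The inner loop stays on one table row for all vectors."""
    out = [0.0] * len(codes_t[0])
    for m, column in enumerate(codes_t):
        row = table[m]
        for i, c in enumerate(column):
            out[i] += row[c]
    return out


# Small synthetic example (illustrative sizes only).
random.seed(0)
NUM_SUB, SUB_DIM, NUM_CENTROIDS, NUM_VECTORS = 4, 2, 16, 100
codebook = [
    [[random.random() for _ in range(SUB_DIM)] for _ in range(NUM_CENTROIDS)]
    for _ in range(NUM_SUB)
]
codes = [
    [random.randrange(NUM_CENTROIDS) for _ in range(NUM_SUB)]
    for _ in range(NUM_VECTORS)
]
codes_t = [list(col) for col in zip(*codes)]
query = [random.random() for _ in range(NUM_SUB * SUB_DIM)]

table = build_distance_table(query, codebook)
d_row = distances_row_major(table, codes)
d_col = distances_transposed(table, codes_t)
```

Both layouts return the same distances; only the memory access pattern differs. In a native implementation the transposed layout is also what typically makes SIMD table lookups practical, though that is not demonstrated here.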
Have we fixed either of these yet?
- [x] https://github.com/lancedb/lance/issues/2838
- [x] https://github.com/lancedb/lance/issues/2837
> Have we fixed either of these yet?
not yet