infinity
[Feature Request]: Reducing tensor storage overhead through token pooling for any ColBERT-like late interaction models
Is there an existing issue for the same feature request?
- [X] I have checked the existing issues.
Is your feature request related to a problem?
No response
Describe the feature you'd like
Paper: "Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling" (https://arxiv.org/abs/2409.14683)
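Token pooling and binary quantization are orthogonal: pooling reduces the number of vectors stored per document, while binary quantization reduces the size of each vector, so the two can be combined.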
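To make the request concrete, here is a rough sketch of what index-time token pooling could look like, assuming the hierarchical (Ward) clustering variant with a pool factor. The names `pool_tokens`, `binarize`, and `pool_factor` are hypothetical and are not part of Infinity's API. Renormalizing the pooled vectors and binarizing by sign are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pool_tokens(embeddings: np.ndarray, pool_factor: int = 2) -> np.ndarray:
    """Pool an (n_tokens, dim) matrix down to at most n_tokens // pool_factor rows."""
    n_tokens = embeddings.shape[0]
    target = max(1, n_tokens // pool_factor)
    if pool_factor <= 1 or n_tokens <= target:
        return embeddings.copy()

    # Build a Ward-linkage tree over the token vectors (Euclidean distance),
    # then cut it so that at most `target` clusters remain.
    tree = linkage(embeddings, method="ward")
    labels = fcluster(tree, t=target, criterion="maxclust")

    # Replace every cluster of tokens with its mean vector.
    pooled = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in np.unique(labels)]
    )

    # Assumption: renormalize so pooled vectors stay unit-length like the inputs.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Assumed sign-based binary quantization: 1 bit per dimension, packed into bytes."""
    return np.packbits(embeddings > 0, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    doc = rng.normal(size=(32, 128)).astype(np.float32)
    doc /= np.linalg.norm(doc, axis=1, keepdims=True)

    pooled = pool_tokens(doc, pool_factor=2)   # fewer vectors per document
    packed = binarize(pooled)                  # fewer bytes per vector
    print(doc.nbytes, pooled.shape, packed.nbytes)
```

Pooling happens only at indexing time, so queries and the MaxSim scoring step would stay unchanged. Because pooling shrinks the row count and binarization shrinks each row, the two savings multiply.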
Describe implementation you've considered
No response
Documentation, adoption, use case
No response
Additional information
No response
Another token pooling strategy: https://www.answer.ai/posts/colbert-pooling.html