[Feature Request]: Reducing tensor storage overhead through token pooling for any ColBERT-like late interaction models

Open yingfeng opened this issue 1 year ago • 1 comment

https://arxiv.org/abs/2409.14683

Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling

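Token pooling and binary quantization are orthogonal: pooling reduces the number of vectors stored per document, while binary quantization reduces the number of bits stored per vector, so the two can be combined.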
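To illustrate why the two techniques compose, here is a minimal sketch in plain Python. It is not the paper's implementation: it pools consecutive tokens by simple averaging, whereas a clustering-based grouping of similar tokens could be swapped in at that step. The names `pool_tokens`, `binarize`, and `pool_factor` are hypothetical and chosen only for this example.

```python
from typing import List

Vector = List[float]


def pool_tokens(token_vectors: List[Vector], pool_factor: int) -> List[Vector]:
    """Mean-pool consecutive groups of `pool_factor` token vectors.

    Assumption: consecutive grouping is used here for simplicity; grouping
    by similarity (e.g. clustering) would replace the slicing step below.
    """
    if pool_factor < 1:
        raise ValueError("pool_factor must be >= 1")
    pooled: List[Vector] = []
    for start in range(0, len(token_vectors), pool_factor):
        group = token_vectors[start:start + pool_factor]
        dim = len(group[0])
        # Average each dimension across the vectors in this group.
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled


def binarize(vectors: List[Vector]) -> List[List[int]]:
    """Keep one bit per dimension: 1 if the value is positive, else 0."""
    return [[1 if x > 0 else 0 for x in v] for v in vectors]


# Toy document: 4 token vectors of dimension 2.
doc = [[0.5, -0.5], [1.5, -1.5], [-1.0, 2.0], [-3.0, 4.0]]

pooled = pool_tokens(doc, pool_factor=2)  # fewer vectors per document
codes = binarize(pooled)                  # fewer bits per vector
```

Pooling only changes how many vectors are kept, and binarization only changes how each kept vector is encoded, so either step can be applied without the other.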

Sep 24 '24 08:09 yingfeng

Another token pooling strategy: https://www.answer.ai/posts/colbert-pooling.html

Sep 25 '24 07:09 yingfeng