feature: support bm42 embeddings
What does this PR do?
This PR adds support for Qdrant/all_miniLM_L6_v2_with_attentions. I followed https://github.com/Anush008/bm42-rs as a reference and added a new `bm42` pooling method.
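For reviewers unfamiliar with BM42, the rough idea behind the pooling step is: take the attention the [CLS] token pays to each input token (averaged over heads), merge WordPiece subwords back into whole words, and key a sparse vector by a hash of each word. A minimal sketch of that idea (the function and its signature are made up for illustration, and std's `DefaultHasher` only stands in for the murmur3 hash so the snippet compiles without extra crates; the full BM42 pipeline additionally filters stopwords and punctuation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Sketch of BM42-style pooling: `cls_attention[i]` is the attention the
/// [CLS] token pays to `tokens[i]`, already averaged over heads.
fn bm42_pool(tokens: &[String], cls_attention: &[f32]) -> HashMap<u64, f32> {
    // Merge WordPiece continuation tokens ("##...") back into whole words,
    // summing their attention weights.
    let mut words: Vec<(String, f32)> = Vec::new();
    for (token, &weight) in tokens.iter().zip(cls_attention) {
        if matches!(token.as_str(), "[CLS]" | "[SEP]" | "[PAD]") {
            continue;
        }
        match token.strip_prefix("##") {
            Some(rest) if !words.is_empty() => {
                let last = words.last_mut().unwrap();
                last.0.push_str(rest);
                last.1 += weight;
            }
            _ => words.push((token.clone(), weight)),
        }
    }

    // Key the sparse vector by a hash of the word. The real model uses
    // murmur3; DefaultHasher is only a stand-in here.
    let mut sparse = HashMap::new();
    for (word, weight) in words {
        let mut hasher = DefaultHasher::new();
        word.hash(&mut hasher);
        *sparse.entry(hasher.finish()).or_insert(0.0f32) += weight;
    }
    sparse
}
```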
Tested by running the following command:
HF_TOKEN=hf_********************************** cargo run --no-default-features -F http -F ort -- --model-id Qdrant/all_miniLM_L6_v2_with_attentions --pooling bm42 --port 5888
Fixes #337
Before submitting
- [X] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. Issue #337
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
BM42 is super controversial. See https://x.com/Nils_Reimers/status/1809334134088622217. I'm on the side of waiting a bit to see how this evolves.
To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.
> To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.
Fair. We really just want support for it so we can test on our own datasets and verify how good or bad it actually is. I agree a lot of that launch was shady, especially the Quickwit bench portion.
Any feedback on the actual implementation? Because the router function `async fn embed_sparse` expects a dense vector and then sparsifies it, it is super expensive to serve a lot of sparse embedding models. In the case of this model, it uses a murmur3 hash, which has a very large key space.
Which means that Infer has to create and send a very large dense vector across a channel. https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/backends/ort/src/bm42.rs#L293-L296
https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/core/src/infer.rs#L193-L204
This will affect adding support for other sparse models as well as this one. It would be a lot better if `Infer::embed_sparse` either:
- doesn't call the `embed` function, and each backend instead provides its own `embed_sparse` function, or
- sparsifies the vector before it gets sent back across the channel.
I don't have enough context on what the best approach for this is, but it would open the opportunity to support other sparse embedding models that have larger key spaces.
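To make those options concrete, here is a rough sketch of what a sparse path could look like at the backend trait level. Every type name and signature below is a placeholder rather than the actual crate's API; the point is just that a backend can hand back (index, value) pairs, so the full dense vector never has to be materialized or sent across the channel:

```rust
// Placeholder types so the sketch is self-contained; the real crate's
// batch and error types would be used instead.
pub struct Batch;
#[derive(Debug)]
pub struct BackendError;

/// A sparse embedding as parallel (index, value) arrays, so nothing
/// proportional to the hash key space ever has to be allocated.
pub struct SparseEmbedding {
    pub indices: Vec<u32>,
    pub values: Vec<f32>,
}

/// Hypothetical shape of a backend trait with a sparse entry point:
/// natively sparse models (BM42, SPLADE-style, ...) override
/// `embed_sparse`, while dense backends keep the default, which
/// sparsifies before anything is sent back across the channel.
pub trait Backend {
    fn embed(&self, batch: Batch) -> Result<Vec<Vec<f32>>, BackendError>;

    fn embed_sparse(&self, batch: Batch) -> Result<Vec<SparseEmbedding>, BackendError> {
        let dense = self.embed(batch)?;
        Ok(dense
            .into_iter()
            .map(|embedding| {
                // Keep only the non-zero entries of the dense output.
                let (indices, values) = embedding
                    .into_iter()
                    .enumerate()
                    .filter(|(_, v)| *v != 0.0)
                    .map(|(i, v)| (i as u32, v))
                    .unzip();
                SparseEmbedding { indices, values }
            })
            .collect())
    }
}
```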
> Which means that Infer has to create and send a very large dense vector across a channel.
Why is that a problem? Moving a `Vec` shouldn't be a concern; it's not expensive as far as I know.
But yes, having a dedicated `embed_sparse` function in the backend trait could be interesting.
Creating a vector with up to 9223372036854776000 `f32`s is pretty expensive in this case, even if they are all zero-filled.
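For scale: even a 32-bit key space would mean 2^32 × 4 bytes ≈ 16 GiB for a single dense f32 vector, and a key space on the order of 2^63, like the number above, works out to roughly 32 EiB, which cannot be allocated at all.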
> Creating a vector with up to 9223372036854776000 `f32`s is pretty expensive in this case, even if they are all zero-filled.
Ok, so when you say "Which means that Infer has to create and send a very large dense vector across a channel", you are only referring to the creation part? As I said, the moving part ("send across a channel") is not an issue.