Adding support for additional distance/similarity metrics for Weaviate
Related Issues
- fixes #3000
Proposed Changes:
Since v1.14, Weaviate supports additional distance/similarity metrics beyond the original cosine.
See https://weaviate.io/developers/weaviate/current/vector-index-plugins/distances.html
This PR extends Haystack's Weaviate distance metric support to l2-squared and dot, in line with what other document stores, such as Elasticsearch, already support.
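For reviewers less familiar with the three metrics, here is a minimal pure-Python sketch of how they are commonly defined. The function names are hypothetical and not part of this PR. The sign convention for dot (negated, so that a smaller value means a closer match) is my understanding of how Weaviate reports it and should be checked against the linked docs.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means the vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def dot_distance(a, b):
    # Assumption: expressed as the negative dot product so that lower = closer.
    return -sum(x * y for x, y in zip(a, b))

def l2_squared_distance(a, b):
    # Squared Euclidean distance (no square root taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Unlike cosine, the dot and l2-squared values are unbounded, which is relevant to the certainty vs. distance note further down.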
How did you test it?
I wrote additional unit tests.
Notes for the reviewer
Please note that Weaviate handles distance metrics differently from Elasticsearch: the distance metric can only be set at index creation time and NOT at query time! (See this being discussed with the Weaviate devs at https://weaviate.slack.com/archives/C017EG2SL3H/p1659959441677119)
For this reason, I had to add logic that verifies, when the user initializes the WeaviateDocumentStore with a certain similarity metric, that the metric matches the one the existing index in Weaviate was created with. If it does not match, an error is raised recommending that the user either recreate the index or change the similarity metric to match the one the index has already been set up with.
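The check described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper name `check_similarity_matches_index` and the Haystack-to-Weaviate name mapping are assumptions, and the schema is modelled as a plain dict with the distance stored under `vectorIndexConfig.distance`, defaulting to cosine when absent.

```python
# Assumed mapping from Haystack similarity names to Weaviate distance names.
HAYSTACK_TO_WEAVIATE = {"cosine": "cosine", "dot_product": "dot", "l2": "l2-squared"}

def check_similarity_matches_index(similarity, class_schema):
    """Raise ValueError if the requested metric differs from the index's metric."""
    if similarity not in HAYSTACK_TO_WEAVIATE:
        raise ValueError(f"Unsupported similarity metric: {similarity!r}")
    requested = HAYSTACK_TO_WEAVIATE[similarity]
    # Assumption: an index created without an explicit distance uses cosine.
    existing = class_schema.get("vectorIndexConfig", {}).get("distance", "cosine")
    if requested != existing:
        raise ValueError(
            f"Index was created with distance '{existing}' but '{requested}' "
            "was requested. Recreate the index or change the similarity "
            "metric to match the existing index."
        )
    return existing
```

Running the check once at store initialization surfaces a mismatch immediately, rather than letting queries silently return scores computed with a different metric than the user expects.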
Please also note the difference between certainty and distance. Weaviate returns certainty only for the cosine similarity. For the other ones, only distance is available. See https://weaviate.io/developers/weaviate/current/vector-index-plugins/distances.html#distance-vs-certainty
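To illustrate the consequence for result handling, here is a small sketch of choosing which field to read from a result's `_additional` properties. The helper name `raw_score` and the dict shape are assumptions for illustration, not the implementation in this PR.

```python
def raw_score(additional, similarity):
    """Return (field_name, value) for the score field usable with this metric."""
    if similarity == "cosine":
        # Certainty is only available for cosine; higher means closer.
        return "certainty", additional["certainty"]
    # For the other metrics only distance is available; lower means closer.
    return "distance", additional["distance"]
```

Callers that rank or threshold results need to account for the opposite ordering of the two fields.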
Please note that the Haystack tutorial mentions as a disadvantage of the Weaviate datastore its lack of dot product similarity support (see Choosing the Right Document Store at https://haystack.deepset.ai/components/document-store).
I guess once this PR is merged, that will no longer be valid.
Also, BM25 support is still in experimental mode in Weaviate and is already supported by Haystack, so I guess that can stay for now (due to the experimental mode).
Checklist
- [ x ] I have read the contributors guidelines and the code of conduct
- [ x ] I have updated the related issue with new insights and changes
- [ x ] I added tests that demonstrate the correct behavior of the change
- [ x ] I've used the conventional commit convention for my PR title
- [ x ] I documented my code
- [ x ] I ran pre-commit hooks and fixed any issue
Thanks @agnieszka-m for all the language corrections. I have applied them all. If I have missed anything, feel free to add it directly to the files. Thanks!