haystack
`OpenSearchDocumentStore` has wrong `embeddings_field_supports_similarity` when creating the index
Describe the bug
If a query pipeline gets deployed before the indexing pipeline and has set `create_index` to `False`, then the `OpenSearchDocumentStore` doesn't set `embeddings_field_supports_similarity` to `True`.
The issue is the missing `embeddings_field_supports_similarity = True` assignment here, in the case where the index doesn't exist.
Related issues:
- https://github.com/deepset-ai/haystack/issues/2802
- https://github.com/deepset-ai/haystack-hub-api/issues/1121
Expected behavior
`embeddings_field_supports_similarity` is set correctly even if the index didn't exist before.
Additional context
Happened with a customer on Deepset Cloud.
To Reproduce
- Deploy a dense query pipeline with `create_index=True`
- Deploy an indexing pipeline
- The dense query pipeline has `embeddings_field_supports_similarity = False`
@wochinge, if there is no index yet, how should the code infer whether the embedding field supports the requested similarity type?
In your case, you know that the embedding field will support it because you have some external context: there will definitely be an indexing pipeline that creates the index such that the embedding field supports the similarity function. Hence, you would need to pass that additional piece of information explicitly.
So I'd suggest simply setting `embeddings_field_supports_similarity` after the constructor call.
Adding an additional param for that wouldn't do anything else, so I guess it wouldn't be worth the overhead. But maybe I'm missing something. WDYT?
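A minimal sketch of that workaround, assuming Python. `SketchDocumentStore` is a hypothetical stand-in that models only the flag, not the real `OpenSearchDocumentStore` (which needs a running OpenSearch cluster); with a real store, the last line would be the same attribute assignment on the store instance.

```python
class SketchDocumentStore:
    """Hypothetical stand-in that models only the flag, not OpenSearchDocumentStore."""

    def __init__(self, index_exists: bool) -> None:
        # Simplifying assumption: an existing index is taken to have a compatible
        # embedding field; with no index there is nothing to inspect, so the flag stays False.
        self.embeddings_field_supports_similarity = index_exists


# Query pipeline deployed before the indexing pipeline: no index exists yet.
store = SketchDocumentStore(index_exists=False)

# Workaround: the caller knows an indexing pipeline will create a compatible
# index, so it sets the flag explicitly after the constructor call.
store.embeddings_field_supports_similarity = True
```

This keeps the constructor signature unchanged, which matches the argument against adding a dedicated parameter just to carry this one piece of external knowledge.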
Ah sorry, I missed that it's not even set when creating the index. Will fix that; however, supporting `custom_mappings` will be slightly more complicated. It probably makes sense to separate the check from the index creation logic.
> Ah sorry, I missed it's not even set when creating the index
Yes, that was my fault. Our initial theory was somewhat reversed. The issue is indeed that we only set `embeddings_field_supports_similarity` when we're not creating an index.
> Probably it makes sense to separate the check from the index creation logic.
Agree 👍🏻
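One way to picture separating the check from index creation is sketched below, assuming Python. Everything here is hypothetical and for illustration only: the function names, the similarity-to-space-type table, and the mapping shape (an OpenSearch k-NN `knn_vector` field with `method.space_type`) are assumptions, not the actual Haystack code or the fix that was merged. The point is that the mapping is chosen first (existing, custom, or default) and a single check then runs on whichever mapping results, so the "index was just created" branch and `custom_mappings` both go through the same logic.

```python
from typing import Any, Dict, Optional

# Assumption: Haystack similarity names mapped to OpenSearch k-NN space types.
SIMILARITY_TO_SPACE_TYPE = {"dot_product": "innerproduct", "cosine": "cosinesimil", "l2": "l2"}


def build_default_mapping(embedding_field: str, dim: int, similarity: str) -> Dict[str, Any]:
    """Hypothetical: the mapping the store would create when no custom mapping is given."""
    return {
        "properties": {
            embedding_field: {
                "type": "knn_vector",
                "dimension": dim,
                "method": {
                    "name": "hnsw",
                    "engine": "nmslib",
                    "space_type": SIMILARITY_TO_SPACE_TYPE[similarity],
                },
            }
        }
    }


def field_supports_similarity(mapping: Dict[str, Any], embedding_field: str, similarity: str) -> bool:
    """Hypothetical check that only reads a mapping, however that mapping came to exist."""
    field = mapping.get("properties", {}).get(embedding_field, {})
    if field.get("type") != "knn_vector":
        return False
    space_type = field.get("method", {}).get("space_type")
    return space_type == SIMILARITY_TO_SPACE_TYPE.get(similarity)


def resolve_flag(
    existing_mapping: Optional[Dict[str, Any]],
    create_index: bool,
    embedding_field: str = "embedding",
    dim: int = 768,
    similarity: str = "dot_product",
    custom_mapping: Optional[Dict[str, Any]] = None,
) -> bool:
    """Hypothetical constructor step: pick the mapping first, then run the same check."""
    if existing_mapping is not None:
        mapping = existing_mapping
    elif create_index:
        # The index would be created here; the check below no longer depends on this branch.
        mapping = (
            custom_mapping
            if custom_mapping is not None
            else build_default_mapping(embedding_field, dim, similarity)
        )
    else:
        # No index exists and none will be created: nothing to inspect.
        return False
    return field_supports_similarity(mapping, embedding_field, similarity)


# Index did not exist and is created with the default mapping: the flag is now set.
print(resolve_flag(existing_mapping=None, create_index=True))
```

With this split, a custom mapping whose space type doesn't match the requested similarity simply yields `False` from the same check, instead of needing its own branch in the creation code.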