`OpenSearchDocumentStore` has wrong `embeddings_field_supports_similarity` if creating index

Open wochinge opened this issue 2 years ago • 3 comments

Describe the bug: If a query pipeline gets deployed before the index pipeline and has set create_index to false, then the OpenSearchDocumentStore doesn't set embeddings_field_supports_similarity to True. The issue is the lack of an embeddings_field_supports_similarity=True here in case the index doesn't exist.

Related issues:

  • https://github.com/deepset-ai/haystack/issues/2802
  • https://github.com/deepset-ai/haystack-hub-api/issues/1121

Error message: Error that was thrown (if available)

Expected behavior: embeddings_field_supports_similarity is set correctly even if the index didn't exist before.

Additional context: Happened with a customer on Deepset Cloud.

To Reproduce

  1. Deploy a dense query pipeline with create_index=True
  2. Deploy an index pipeline
  3. Dense query pipeline has embeddings_field_supports_similarity = False

wochinge • Jul 14 '22 15:07

@wochinge if there is no index yet, how should the code infer whether the embedding field supports the requested similarity type? In your case you know that the embedding field will support it, because you have external knowledge that an indexing pipeline will definitely create the index such that the embedding field supports the similarity function. Hence you would need to pass that additional piece of information explicitly. So I'd suggest simply setting embeddings_field_supports_similarity after the constructor call. Adding an additional param for that wouldn't do anything else, so it wouldn't be worth the overhead, I guess. But maybe I'm missing something. WDYT?
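A minimal sketch of that suggestion, for illustration only. `StubDocumentStore` is a hypothetical stand-in, not Haystack code; it only mimics a store whose constructor leaves the flag at False. With a real OpenSearchDocumentStore the same assignment would presumably follow the constructor call.

```python
class StubDocumentStore:
    """Hypothetical stand-in for OpenSearchDocumentStore (not Haystack code)."""

    def __init__(self, index: str, create_index: bool = True):
        self.index = index
        self.create_index = create_index
        # Assumption: the store cannot verify the embedding field on its own,
        # so the flag starts out as False.
        self.embeddings_field_supports_similarity = False


# The deployer knows an indexing pipeline will create a compatible index,
# so that knowledge is passed on explicitly after construction.
document_store = StubDocumentStore(index="document", create_index=False)
document_store.embeddings_field_supports_similarity = True
```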

tstadel • Jul 15 '22 11:07

Ah sorry, I missed it's not even set when creating the index. Will fix that; however, supporting custom_mappings will be slightly more complicated. Probably it makes sense to separate the check from the index creation logic.
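One way such a separation could look, as a rough sketch rather than the actual fix. Everything here is hypothetical: `SeparatedStore`, `embedding_field_supports`, the in-memory `cluster` dict, and the `"similarity"` mapping key are illustrative stand-ins for the real store, check, OpenSearch cluster, and index mapping.

```python
from typing import Dict, Optional


def embedding_field_supports(mapping: Optional[dict], similarity: str) -> bool:
    """Pure check: inspects a mapping and never creates anything."""
    return mapping is not None and mapping.get("similarity") == similarity


class SeparatedStore:
    """Hypothetical store that keeps index creation and the check apart."""

    def __init__(
        self,
        cluster: Dict[str, dict],
        index: str,
        similarity: str = "dot_product",
        create_index: bool = True,
        custom_mapping: Optional[dict] = None,
    ):
        # Step 1: create the index if it is missing and creation is allowed.
        if index not in cluster and create_index:
            cluster[index] = custom_mapping or {"similarity": similarity}
        # Step 2: run the check on whatever mapping exists now,
        # regardless of who created it or whether it was custom.
        self.embeddings_field_supports_similarity = embedding_field_supports(
            cluster.get(index), similarity
        )


cluster: Dict[str, dict] = {}
created = SeparatedStore(cluster, "document")                     # creates the index
mismatched = SeparatedStore(cluster, "document", similarity="cosine")
missing = SeparatedStore({}, "document", create_index=False)      # no index at all
custom = SeparatedStore({}, "document", custom_mapping={"similarity": "cosine"})
```

Because the check only reads the resulting mapping, a custom mapping goes through the same path as a default one in this sketch; whether that carries over to real custom_mappings is an open question.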

tstadel • Jul 15 '22 11:07

> Ah sorry, I missed it's not even set when creating the index

Yes, that was my fault. Our initial theory was somewhat reversed. The issue is indeed that we only set embeddings_field_supports_similarity when we're not creating an index.
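The behaviour described here can be modelled with a small toy, shown only to make the diagnosis concrete. `ToyDocumentStore`, the in-memory `cluster` dict, and the `"similarity"` mapping key are hypothetical and do not reproduce Haystack's actual code.

```python
from typing import Dict


class ToyDocumentStore:
    """Hypothetical model of the reported behaviour (not Haystack code)."""

    def __init__(
        self,
        cluster: Dict[str, dict],
        index: str,
        similarity: str = "dot_product",
        create_index: bool = True,
    ):
        self.embeddings_field_supports_similarity = False
        if index in cluster:
            # Existing index: the mapping is inspected and the flag is set.
            self.embeddings_field_supports_similarity = (
                cluster[index]["similarity"] == similarity
            )
        elif create_index:
            # New index: created with a matching mapping, but the flag is
            # never updated. This branch models the reported bug.
            cluster[index] = {"similarity": similarity}


cluster: Dict[str, dict] = {}
query_store = ToyDocumentStore(cluster, "document")  # deployed first, creates the index
index_store = ToyDocumentStore(cluster, "document")  # deployed second, finds the index
```

In this model the store that created the index ends up with the flag unset, while the store that merely found the index gets it set, which matches the reproduction steps in the issue.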

> Probably it makes sense to separate the check from the index creation logic.

Agree 👍🏻

wochinge • Jul 15 '22 11:07