Stefano Fiorucci

Results 89 comments of Stefano Fiorucci

The problem seems clear: we are trying to store unsuitable data, expressed in a nested format, in SQL-based DocumentStores (as analyzed in depth in #2792). I see two possible solutions,...

While thinking about this issue, I discovered a problem with inconsistent output for `TransformersDocumentClassifier`: https://github.com/deepset-ai/haystack/issues/3167.

Here is a useful resource for understanding this topic better: [The Missing WHERE Clause in Vector Search](https://www.pinecone.io/learn/vector-search-filtering/) by @jamescalam.

👋 Please share your code, so we can have a reproducible example and help you.

Probably related to #1634.

@bogdankostic Your intuition points in the right direction.

https://github.com/deepset-ai/haystack/blob/97a8d305129968dcbd5d483c849c1a778047f948/haystack/document_stores/elasticsearch.py#L225-L229

`document` is mapped as a [`flattened` field type](https://www.elastic.co/guide/en/elasticsearch/reference/7.9/flattened.html#flattened):

> Given an object, the flattened mapping will parse out its leaf values...
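To make "parse out its leaf values" concrete, here is a minimal Python sketch of what collecting the leaf values of a nested object looks like. It is only an illustration of the idea, not Elasticsearch's or Haystack's implementation; the helper name `leaf_values` and the sample document are hypothetical.

```python
from typing import Any, Iterator


def leaf_values(obj: Any) -> Iterator[Any]:
    """Yield every leaf (non-dict, non-list) value found in a nested object.

    Hypothetical helper for illustration only: it mimics the idea of
    reducing a nested object to its leaf values, nothing more.
    """
    if isinstance(obj, dict):
        for value in obj.values():
            yield from leaf_values(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from leaf_values(item)
    else:
        yield obj


# Made-up nested document, used only to show the traversal.
doc = {"name": "report", "meta": {"year": 2022, "tags": ["a", "b"]}}

# Convert each leaf to a string, since the result is meant to resemble keywords.
leaves = [str(value) for value in leaf_values(doc)]
print(leaves)  # ['report', '2022', 'a', 'b']
```

Note that the nesting structure is lost in the result: only the leaf values survive, which is why the distinction between `flattened` and other field types matters for filtering.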

### Related

In `OpensearchDocumentStore` we use `nested` instead of `flattened`, since `flattened` is not supported (see also https://github.com/deepset-ai/haystack/pull/1609):

https://github.com/deepset-ai/haystack/blob/21aedc644f701ea57463694c01fb980188ff0e45/haystack/document_stores/opensearch.py#L557-L564

**To solve the current bug, it may be reasonable to use...

Honestly, I saw the issue and simply submitted this PR. I only verified that, with this PR, the documents can actually be retrieved in batches. I haven't asked myself many...

I ran some tests on my branch (you can find them in this [Colab notebook](https://colab.research.google.com/drive/1P4x8XG4Doxi1L5HFZuz5ANm-oO3HUB9T?usp=sharing)). I used ~17k short documents. I tested some `batch_size` values between 10000 (max) and...
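For readers unfamiliar with how a `batch_size` parameter typically splits a collection, here is a minimal sketch in plain Python. It is not the code from the PR or from Haystack; `iter_batches` is a hypothetical helper, and the 17,000 placeholder strings only approximate the "~17k short documents" mentioned above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements.

    Hypothetical helper for illustration; not part of Haystack.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder documents standing in for the real corpus.
docs = [f"doc-{i}" for i in range(17_000)]

# With the maximum value mentioned above, two requests would be needed.
sizes = [len(batch) for batch in iter_batches(docs, batch_size=10_000)]
print(sizes)  # [10000, 7000]
```

Smaller `batch_size` values mean more round trips with less data each; larger values mean fewer, heavier requests. Which trade-off wins depends on the backend, which is what the tests above set out to measure.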

After the usual git mess :smile:, I have added some details about the `batch_size` parameter to the docstrings.