haystack-core-integrations
ElasticSearch Retriever is not performing well
Hello,
I am using Elasticsearch as the DocumentStore, with the Elasticsearch embedding retriever configured as follows:
embedding_retriever:
  init_parameters:
    document_store:
      embedding_similarity_function: l2_norm
      init_parameters:
        hosts: http://elasticsearch:9200
      type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
    num_candidates: 10
    top_k: 10
  type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
Even when the question is out of context, the retriever still returns documents with a high score. Below is an example:
{ "AnswerBuilder": { "answers": [ { "data": " The context provided does not contain information about Langchain.", "query": "WHat is langchain ?", "documents": [ { "id": "b0b39b5c34c63991019b566e34b1ccfb784cf96a461cebc3711611fd5d9b8b38", "content": "general-purpose speech toolkit. arXiv preprint\narXiv:2106.04624 .\nRebai, I., Benhamiche, S., Thompson, K., Sellami, Z.,\nLaine, D., and Lorr ´e, J.-P. (2020). Linto platform: A\nsmart open voice assistant for business environments.\nInProceedings of the 1st International Workshop on\nLanguage Technology Platforms , pages 89–95.\nRNNoise (2023). Github RNNoise. https://github.com/\nxiph/rnnoise.\nSpiller, T. R., Ben-Zion, Z., Korem, N., Harpaz-Rotem, I.,\nand Duek, O. (2023). Efficient and accurate transcrip-\ntion in mental health research-a tutorial on using whis-\nper ai for sound file transcription.Suznjevic, M. and Saldana, J. (2016). Delay limits for real-\ntime services. IETF draft .\nTrabelsi, A., Warichet, S., Aajaoun, Y ., and Soussilane, S.\n(2022). Evaluation of the efficiency of state-of-the-\nart speech recognition engines. Procedia Computer\nScience , 207:2242–2252.\nUnion, I. T. (2016). Mean opinion score interpretation and\nreporting. Standard, International Telecommunication\nUnion, Geneva, CH.\nValin, J.-M. (2018). A hybrid dsp/deep learning approach\nto real-time full-band speech enhancement. In 2018\nIEEE 20th international workshop on multimedia sig-\nnal processing (MMSP) , pages 1–5. IEEE.\nVaseghi, S. V . (2008). Advanced digital ", "dataframe": null, "blob": null, "meta": { "source": "default/ICAART24.pdf", "page": 7, "source_id": "74d29100e8daffb446d9d6e1c7185e096e3a51cf9332fc6c421cd9ca467648d6" }, "score": 0.67131597,
Best regards
Elasticsearch uses the BM25 algorithm. Why do you think a score of 0.67 is high?
@DemirTonchev I am using the ES embedding retriever. For queries that do match the retrieved documents, I also get scores between 0.60 and 0.82. So for me, if the query does not match the retrieved documents, the scores should be very small.
> So for me, if the query does not match the retrieved documents, the scores should be very small.
A score of 0.6 to 0.82 is usually (in my experience) negligibly small. What is the length of your corpus and the average idf? Looking at the query "WHat is langchain ?" and the output document, I would expect the score to be small, since there is no "langchain" in the returned text. How many documents in the corpus contain at least one occurrence of "langchain"? I also suspect that " " (white space) is in your ES document store, which is not ideal.
@DemirTonchev In my document store I have just one document, which talks about Vosk and Kaldi! There is no occurrence of "langchain". I did this on purpose to see how the model behaves.
When I ask a question about Vosk, I get the correct answer with a score of 0.67. Below is a screenshot.
I note that the score is between 0 and 1.
So my conclusion is that when we ask an out-of-context question, the retriever still returns results with a fairly high score.
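One possible explanation for the bounded range, sketched below: for a dense_vector field with l2_norm similarity, the Elasticsearch documentation gives the kNN score as 1 / (1 + d^2), where d is the L2 distance between query and document vectors. The toy 2-d vectors and the helper names are made up for illustration, and the unit-length assumption may not hold for the embedding model used in this thread.

```python
def l2_norm_score(query, doc):
    """Score documented by Elasticsearch for l2_norm: 1 / (1 + d^2)."""
    sq_dist = sum((q - d) ** 2 for q, d in zip(query, doc))
    return 1.0 / (1.0 + sq_dist)


def squared_distance_from_score(score):
    """Invert the mapping above to recover the squared L2 distance."""
    return 1.0 / score - 1.0


# Toy unit-length vectors (hypothetical, not real embeddings).
query = [1.0, 0.0]
same = [1.0, 0.0]        # identical direction, squared distance 0
orthogonal = [0.0, 1.0]  # unrelated direction, squared distance 2
opposite = [-1.0, 0.0]   # opposite direction, squared distance 4

identical_score = l2_norm_score(query, same)
orthogonal_score = l2_norm_score(query, orthogonal)
opposite_score = l2_norm_score(query, opposite)

# Squared distance implied by the 0.67131597 score reported in this thread,
# assuming that score came from the l2_norm formula.
reported_sq_dist = squared_distance_from_score(0.67131597)

print(identical_score, orthogonal_score, opposite_score, reported_sq_dist)
```

Under these assumptions the score can never drop below 0.2 for unit-length vectors, so an absolute value such as 0.67 says little on its own. Comparing it against the scores of known-relevant queries is more informative.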
Can you please explain the whitespace problem in more detail? I don't understand it.
Should be investigated.
- which embedding model are you using?
- have you tried other embedding_similarity_function values?
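To make the second question concrete, here is a minimal sketch of how the same pair of vectors scores under three similarity settings. The formulas are the ones the Elasticsearch documentation gives for float dense_vector fields (cosine, dot_product on unit-length vectors, l2_norm). The 2-d vectors and the 20 and 90 degree angles are invented for illustration and say nothing about the actual embedding model.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_score(q, d):
    # Elasticsearch cosine: (1 + cosine(q, d)) / 2
    cos = dot(q, d) / (math.sqrt(dot(q, q)) * math.sqrt(dot(d, d)))
    return (1.0 + cos) / 2.0


def dot_product_score(q, d):
    # Elasticsearch dot_product for unit-length float vectors: (1 + q.d) / 2
    return (1.0 + dot(q, d)) / 2.0


def l2_norm_score(q, d):
    # Elasticsearch l2_norm: 1 / (1 + squared L2 distance)
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(q, d)))


def unit(angle_deg):
    """Hypothetical helper: a 2-d unit vector at the given angle."""
    rad = math.radians(angle_deg)
    return [math.cos(rad), math.sin(rad)]


query = unit(0)
related = unit(20)    # stands in for a close match
unrelated = unit(90)  # stands in for an unrelated document

functions = {
    "cosine": cosine_score,
    "dot_product": dot_product_score,
    "l2_norm": l2_norm_score,
}
scores = {
    name: (fn(query, related), fn(query, unrelated))
    for name, fn in functions.items()
}
for name, (rel, unrel) in scores.items():
    print(f"{name}: related={rel:.3f} unrelated={unrel:.3f}")
```

In this toy setup an orthogonal vector still scores about 0.5 under cosine and dot_product and about 0.33 under l2_norm, so none of the three settings maps an unrelated document to a score near zero. With a single document in the store, the retriever also has nothing else to return for top_k.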