
TransformersSimilarityRanker (transformers_similarity.py) runtime error

Open kristapsdz-saic opened this issue 1 year ago • 2 comments

Describe the bug The ranker raises a TypeError at runtime regardless of its input.

Error message The following code triggers the error, although any invocation will do:

from haystack.components.rankers import TransformersSimilarityRanker
ranker = TransformersSimilarityRanker(model="sentence-transformers/all-MiniLM-L6-v2")
ranker.warm_up()

# retriever_output["documents"] contains a list of Document types
# question is a string with the query question

ranker.run(query=question, documents=retriever_output["documents"])

When executed:

Traceback (most recent call last):
  File "xxxxxx", line 130, in <module>
    ranked_output = ranker.run(query=question, documents=retriever_output["documents"])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxx/lib/python3.12/site-packages/haystack/components/rankers/transformers_similarity.py", line 268, in run
    documents[i].score = similarity_scores[i]
                         ~~~~~~~~~~~~~~~~~^^^
TypeError: list indices must be integers or slices, not list

On inspection, the value of i in this code is a list: sorted_indices, from which i is drawn, is a list of lists.

Expected behavior i should be a scalar index, so that each document can be assigned its score.
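The failure mode can be sketched without any model (a minimal illustration in plain Python, no Haystack; the variable names mirror the traceback but the numbers are invented): when the model returns one scalar score per document, sorting yields integer indices, but when it returns a vector per document, the "indices" end up as lists, and indexing a Python list with a list raises exactly this TypeError.

```python
# Cross-encoder style: one scalar score per document, so argsort-style
# sorting produces plain integer indices that are safe to index with.
similarity_scores = [0.9, 0.1, 0.5]
sorted_indices = sorted(range(len(similarity_scores)),
                        key=lambda i: similarity_scores[i], reverse=True)
assert sorted_indices == [0, 2, 1]

# Embedding-model style: one VECTOR per document. Each "index" drawn
# from such a nested structure is itself a list, and list[list] fails.
vector_scores = [[0.9, 0.2], [0.1, 0.8]]
documents = ["doc A", "doc B"]
try:
    documents[vector_scores[0]]  # a list used as an index
except TypeError as exc:
    print(exc)  # "list indices must be integers or slices, not list"
```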


To Reproduce

Run an invocation of the ranker with the following Pipfile:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
haystack-ai = "*"
sentence-transformers = ">=2.2.0"
pypdf = "*"
mdit-plain = "*"
llama-cpp-python = "==0.2.56"
llama-cpp-haystack = "*"
accelerate = "*"

[dev-packages]

[requires]
python_version = "3.12"


System:

  • OS: Mac OS X Ventura 13.6.1
  • GPU/CPU: CPU
  • Haystack version (commit or version number): current
  • DocumentStore: InMemoryDocumentStore (but does not matter)
  • Reader: none
  • Retriever: InMemoryEmbeddingRetriever (but does not matter)

kristapsdz-saic avatar Apr 09 '24 17:04 kristapsdz-saic

Hi @kristapsdz-saic , I encountered the same issue; it seems to be caused by the model's output structure: the model produces a 2D array. I tried a reranker model instead (see below), and it worked without errors. However, further debugging is needed to resolve the issue completely.

ranker = TransformersSimilarityRanker(model="BAAI/bge-reranker-large")

nvenkat94 avatar Apr 17 '24 11:04 nvenkat94

Hey @kristapsdz-saic , @nvenkat94 is correct, this component only supports models with a Cross-Encoder architecture (which is the same as SequenceClassification in HuggingFace terms). Typically, models with reranker or cross-encoder in their name use this architecture and will be supported by this component.

The model provided in your original example "sentence-transformers/all-MiniLM-L6-v2" is an embedding model (or Bi-Encoder), which is not supported by this component. There is a nice explanation of the difference between Bi-Encoders and Cross-Encoders from Sentence Transformers here.
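To make the Bi-Encoder vs Cross-Encoder distinction concrete, here is a toy sketch in plain Python (all numbers are made up, no real models): a bi-encoder embeds each text independently into a vector and similarity is computed afterwards, while a cross-encoder consumes each (query, document) pair jointly and emits a single scalar, which is the shape the ranker's sorting loop expects.

```python
# Bi-encoder (e.g. sentence-transformers/all-MiniLM-L6-v2 style):
# each text is embedded INDEPENDENTLY; one vector per document.
query_emb = [0.1, 0.9]
doc_embs = [[0.2, 0.8], [0.9, 0.1]]

# Similarity is computed afterwards, e.g. a dot product per document.
bi_scores = [sum(q * d for q, d in zip(query_emb, emb)) for emb in doc_embs]

# Cross-encoder (e.g. BAAI/bge-reranker-large style): the model sees
# each (query, document) PAIR jointly and returns one scalar per pair.
cross_scores = [0.93, 0.12]  # pretend model outputs

# The ranker needs exactly this flat list of scalars to sort documents:
order = sorted(range(len(cross_scores)),
               key=lambda i: cross_scores[i], reverse=True)
assert all(isinstance(i, int) for i in order)
```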

sjrl avatar Jun 05 '24 04:06 sjrl