
Unable to Provide Max Seq Len to Query Classifier

Open jstremme opened this issue 3 years ago • 0 comments

Describe the bug I get the following warning when classifying sentences as statements or queries with TransformersQueryClassifier: "Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation."

Error message This results in a shape error (the model expects sequences of <= 512 tokens) when passing texts longer than the base model's max seq len to clf.run(query=sent).

Expected behavior Ideally, clf.run would truncate the text itself. Alternatively, the user could specify the max seq len when calling .run.

Additional context I am passing a list of sentences represented as strings; some are fairly long and need to be truncated.

To Reproduce

!pip install -q farm-haystack==1.8.0
from haystack.nodes import TransformersQueryClassifier  # in farm-haystack 1.8.0 the node lives in haystack.nodes

clf = TransformersQueryClassifier(model_name_or_path='shahrukhx01/question-vs-statement-classifier')

def extract_questions_with_bert_mini(sent):
    # clf.run returns (output_dict, edge_name); the edge name
    # ("output_1" / "output_2") indicates question vs. statement
    return clf.run(query=sent)[1]

Run on sentences with very long texts.
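As a stopgap until the node handles truncation, the input can be pre-truncated before calling clf.run. The sketch below is a rough workaround, not part of Haystack: it truncates by whitespace tokens, which only approximates the model's subword count, so in practice you would truncate with the model's own tokenizer (e.g. via its encode with truncation enabled).

```python
def truncate_text(text, max_tokens=512):
    """Rough pre-truncation by whitespace tokens.

    NOTE: a subword tokenizer usually produces MORE tokens than
    whitespace splitting, so a safety margin (e.g. max_tokens=400)
    is advisable; the model's own tokenizer is the reliable fix.
    """
    tokens = text.split()
    return " ".join(tokens[:max_tokens])
```

Then the repro function would call clf.run(query=truncate_text(sent)) instead of passing the raw sentence.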

FAQ Check

System:

  • OS: Mac
  • GPU/CPU: CPU
  • Haystack version (commit or version number): 1.8.0
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A

jstremme · Sep 07 '22