haystack
Unable to Provide Max Seq Len to Query Classifier
Describe the bug
I get the following warning when classifying sentences as statements or queries using TransformersQueryClassifier: `Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.`
Error message
This results in a shape error (the model expects inputs of at most 512 tokens) whenever a sequence longer than the base model's max seq len is passed to `run(query=sent)`.
Expected behavior
Ideally, `run` would truncate the text automatically. Alternatively, the user could specify the max seq len when calling `.run`.
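Until truncation is handled inside `run`, a crude pre-truncation step works around the crash. This is only a sketch of my own (the function name and the 300-word cap are not part of Haystack); it trims by whitespace tokens so the wordpiece count usually stays under the model's 512-token limit:

```python
def pretruncate(sent, max_words=300):
    """Crude workaround: cap the whitespace-token count before calling
    clf.run(query=sent). For typical English text, 300 words usually maps
    to fewer than 512 wordpieces, though heavily subworded inputs (rare
    words, URLs) can still exceed the limit."""
    words = sent.split()
    return " ".join(words[:max_words])
```

A tokenizer-based truncation (encoding with `truncation=True` and decoding back) would be more exact, but this avoids an extra tokenization pass per sentence.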
Additional context
Passing a list of sentences represented as strings. Some are fairly long and need to be truncated.
To Reproduce
```python
!pip install -q farm-haystack==1.8.0

# In haystack 1.8.0 the node lives in haystack.nodes, not haystack.pipeline
from haystack.nodes import TransformersQueryClassifier

clf = TransformersQueryClassifier(model_name_or_path='shahrukhx01/question-vs-statement-classifier')

def extract_questions_with_bert_mini(sent):
    # run() returns (output_dict, edge_name); the edge name encodes the class
    return clf.run(query=sent)[1]
```
Run on sentences with very long texts.
FAQ Check
- [X] Have you had a look at our new FAQ page?
System:
- OS: Mac
- GPU/CPU: CPU
- Haystack version (commit or version number): 1.8.0
- DocumentStore: N/A
- Reader: N/A
- Retriever: N/A