BlenderbotSmall incorrect usage of start and end tokens
System Info
- `transformers` version: 4.27.2
- Platform: Windows-10-10.0.19041-SP0
- Python version: 3.8.3
- Huggingface_hub version: 0.12.0
- PyTorch version (GPU?): 1.13.0+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help?
@ArthurZucker @younesbelkada @Narsil
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
As stated in the documentation (https://huggingface.co/docs/transformers/model_doc/blenderbot-small#transformers.BlenderbotSmallForConditionalGeneration.forward.example), the model should use `</s>` and `<s>` to separate the user input and the response:
```python
from transformers import AutoTokenizer, BlenderbotSmallForConditionalGeneration

mname = "facebook/blenderbot_small-90M"
model = BlenderbotSmallForConditionalGeneration.from_pretrained(mname)
tokenizer = AutoTokenizer.from_pretrained(mname)

UTTERANCE = "My friends are cool but they eat too many carbs."
print("Human: ", UTTERANCE)
inputs = tokenizer([UTTERANCE], return_tensors="pt")
reply_ids = model.generate(**inputs)
print("Bot: ", tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])

REPLY = "I'm not sure"
print("Human: ", REPLY)
NEXT_UTTERANCE = (
    "My friends are cool but they eat too many carbs.</s> <s>what kind of carbs do they eat? "
    "i don't know much about carbs</s> "
    "<s> I'm not sure."
)
inputs = tokenizer([NEXT_UTTERANCE], return_tensors="pt")
next_reply_ids = model.generate(**inputs)
print("Bot: ", tokenizer.batch_decode(next_reply_ids, skip_special_tokens=True)[0])
```
However, these tokens are not present in the tokenizer's vocabulary or its special tokens map.
I assume they should be replaced with `__start__` and `__end__`?
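A quick way to see the mismatch is to check whether the documented separators exist in the tokenizer's vocabulary. The sketch below uses a toy vocabulary standing in for the real one (with the actual tokenizer you would inspect `tokenizer.get_vocab()` and `tokenizer.special_tokens_map`); the `__start__`/`__end__`/`__unk__`/`__null__` names are my reading of the checkpoint's tokenizer config, not something the docs confirm:

```python
# Toy stand-in for tokenizer.get_vocab() on facebook/blenderbot_small-90M;
# the real tokenizer appears to define __start__/__end__/__unk__/__null__
# as bos/eos/unk/pad, with no <s> or </s> entries at all.
toy_vocab = {"__null__": 0, "__start__": 1, "__end__": 2, "__unk__": 3}

for tok in ["<s>", "</s>", "__start__", "__end__"]:
    status = "present" if tok in toy_vocab else "MISSING"
    print(f"{tok}: {status}")

# If that assumption is right, the docs example would presumably need its
# separators rewritten with the model's own tokens, e.g. (hypothetical):
NEXT_UTTERANCE = (
    "My friends are cool but they eat too many carbs.__end__ __start__"
    "what kind of carbs do they eat? i don't know much about carbs__end__ "
    "__start__ I'm not sure."
)
```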
I have also tried the `ConversationalPipeline`, following the steps outlined here, but I always get nonsensical results.
Even when trying the hosted inference API for the model (https://huggingface.co/facebook/blenderbot_small-90M), it either repeats itself or doesn't follow the conversation.
Expected behavior
The tokens in the documentation example should match the model's actual special tokens, and the chatbot should engage in more coherent conversation.