transfer-learning-conv-ai
add_prefix_space missing from tokenization
In your docs, you mention that whenever the GPT2 tokenizer is used, there should be a space prefixed to the input string: https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.PreTrainedTokenizer
However, the get_dataset() function in utils.py does not pass add_prefix_space=True when it calls the tokenizer. Is this intentional, and if so, why?
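To illustrate why I think this matters, here is a minimal sketch of the effect. It does not call the transformers library: the pattern is a simplified, ASCII-only approximation of GPT-2's pre-tokenization regex, and pre_tokenize is a hypothetical helper written only for this example.

```python
import re

# Simplified, ASCII-only approximation of GPT-2's pre-tokenization pattern.
# The real pattern uses Unicode classes (\p{L}, \p{N}); this is an assumption
# made so the example runs with the standard library only.
PRETOKEN_PATTERN = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+"
)


def pre_tokenize(text, add_prefix_space=False):
    """Hypothetical helper: split text into word-level chunks.

    When add_prefix_space is True, a single space is prepended first,
    so the first word is treated like a word in the middle of a sentence.
    """
    if add_prefix_space:
        text = " " + text
    return PRETOKEN_PATTERN.findall(text)


without_space = pre_tokenize("hello world hello")
with_space = pre_tokenize("hello world hello", add_prefix_space=True)

print(without_space)
print(with_space)
```

Without the prefix space, the first "hello" is split off without a leading space while the later one keeps it, so the same word would map to different tokens depending on its position. With the prefix space, both occurrences are split identically.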
Thanks for this great toolkit!