transfer-learning-conv-ai

add_prefix_space missing from tokenization

Open ddemszky opened this issue 5 years ago • 0 comments

Your docs mention that a space should be prefixed to the input string whenever the GPT-2 tokenizer is used: https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.PreTrainedTokenizer

However, in the get_dataset() function in utils.py, the tokenizer call does not pass add_prefix_space=True. Why is this the case?
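To make the concern concrete, here is a minimal sketch of why the leading space matters. It is a toy, standard-library-only stand-in and not the transformers implementation: the `pretokenize` function and its simplified regex are hypothetical, and the only behavior borrowed from GPT-2 is that a leading space is kept as part of the token (rendered as `Ġ`). Without the prefix, the first word of a string is split off differently from the same word appearing mid-sentence.

```python
import re

# Simplified stand-in for GPT-2's pre-tokenization pattern. The real pattern
# also treats contractions, digits and punctuation separately; this one only
# keeps an optional leading space attached to each run of non-space characters.
_PRETOKEN = re.compile(r" ?\S+|\s+")


def pretokenize(text, add_prefix_space=False):
    """Split text into pieces, marking a leading space as 'Ġ' (toy version)."""
    # Assumption for this sketch: only prepend a space if one is not already there.
    if add_prefix_space and not text.startswith(" "):
        text = " " + text
    return [piece.replace(" ", "Ġ") for piece in _PRETOKEN.findall(text)]


if __name__ == "__main__":
    # The first word has no space marker, so it differs from a mid-sentence "Hello".
    print(pretokenize("Hello world"))
    # With the prefix, every word carries the same leading-space marker.
    print(pretokenize("Hello world", add_prefix_space=True))
```

In this sketch, "Hello" at the start of the string and "Hello" after another word become different pieces unless the prefix is added, which is the mismatch I am asking about for get_dataset().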

Thanks for this great toolkit!

ddemszky, Mar 21 '20 23:03