cotk
Dataloader for HuggingFace GPT/GPT-2 and our Chinese GPT
Description: Added a dataloader for the Chinese GPT model implemented with pytorch-transformers.
Reference Issues: Dataloader for huggingface transformers #1300
1. Added two classes, HGFSingleTurnDialog and HGFCleanWB, which only add inputs formatted for pytorch-transformers; otherwise they are the same as BERTSingleTurnDialog and BERTOpenSubtitles.
2. The tokenizer is hard to change to fit the model; a general base class for pytorch-transformers may be needed.
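One possible shape for such a general base class is sketched below. It is a hypothetical sketch, not code from this PR: `HGFInputFormatter`, `WhitespaceTokenizer`, the `<sep>` separator and the response-only loss mask are all assumptions for illustration. The formatter only depends on `tokenize` and `convert_tokens_to_ids`, two methods that pytorch-transformers tokenizers provide, so swapping tokenizers would not require changing the dataloader.

```python
from typing import Dict, List, Protocol, Tuple


class TokenizerLike(Protocol):
    """The two tokenizer methods this sketch relies on."""

    def tokenize(self, text: str) -> List[str]: ...

    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]: ...


class HGFInputFormatter:
    """Hypothetical base class: turns a (post, response) pair into model inputs.

    Subclasses could override `format_pair` to match a specific model's layout.
    """

    def __init__(self, tokenizer: TokenizerLike, sep_token: str, max_len: int = 512):
        self.tokenizer = tokenizer
        self.sep_token = sep_token
        self.max_len = max_len

    def format_pair(self, post: str, resp: str) -> Tuple[List[int], List[int]]:
        # Assumed GPT-style layout: post tokens, separator, response tokens.
        post_tokens = self.tokenizer.tokenize(post)
        resp_tokens = self.tokenizer.tokenize(resp)
        tokens = (post_tokens + [self.sep_token] + resp_tokens)[: self.max_len]
        input_ids = self.tokenizer.convert_tokens_to_ids(tokens)
        # Assumed loss mask: 1 only on response positions, 0 on post and separator.
        n_ctx = len(post_tokens) + 1
        mask = [0] * min(n_ctx, len(tokens)) + [1] * max(0, len(tokens) - n_ctx)
        return input_ids, mask


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer, for demonstration only."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def tokenize(self, text: str) -> List[str]:
        return text.split()

    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]:
        # Assign ids on first sight so the example needs no vocabulary file.
        return [self.vocab.setdefault(t, len(self.vocab)) for t in tokens]


formatter = HGFInputFormatter(WhitespaceTokenizer(), sep_token="<sep>")
ids, mask = formatter.format_pair("how are you", "fine thanks")
print(ids, mask)  # [0, 1, 2, 3, 4, 5] [0, 0, 0, 0, 1, 1]
```

With a real model, the toy tokenizer would be replaced by the pytorch-transformers tokenizer matching that model, and only the subclass deciding the input layout would differ between GPT and the Chinese GPT.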