Dataloader for HuggingFace gpt/gpt-2 and our Chinese gpt

Open lemon234071 opened this issue 5 years ago • 0 comments

Description: Added a dataloader for Chinese-gpt, implemented with pytorch-transformers.

Reference Issues: Dataloader for huggingface transformers #1300

1. Added two classes, HGFSingleTurnDialog and HGFCleanWB, which only add formatted inputs for pytorch-transformers. Otherwise they are the same as BERTSingleTurnDialog and BERTOpenSubtitles.
2. The tokenizer is hard to change to fit the model; a general base class for pytorch-transformers may be needed.
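To make the tokenizer point more concrete, here is a minimal sketch of what a general base class could look like. Everything in it is hypothetical: `TokenizerAdapter`, `WhitespaceAdapter`, and their methods are illustrative names, not part of cotk or pytorch-transformers, and the whitespace tokenizer is only a toy stand-in for a real model tokenizer.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class TokenizerAdapter(ABC):
    """Hypothetical base class: a dataloader would depend only on this
    interface, so swapping the underlying tokenizer needs no dataloader changes."""

    @abstractmethod
    def tokenize(self, text: str) -> List[str]:
        """Split raw text into tokens."""

    @abstractmethod
    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]:
        """Map tokens to integer ids."""

    def encode(self, text: str) -> List[int]:
        # Shared logic lives in the base class; subclasses supply the two steps.
        return self.convert_tokens_to_ids(self.tokenize(text))


class WhitespaceAdapter(TokenizerAdapter):
    """Toy adapter (assumption): splits on whitespace and grows its vocab on the fly."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def tokenize(self, text: str) -> List[str]:
        return text.split()

    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]:
        # Assign the next free id to each unseen token.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in tokens]


adapter = WhitespaceAdapter()
ids = adapter.encode("hello world hello")
```

Under this design, an adapter wrapping a model-specific tokenizer would implement the same two methods, and classes such as HGFSingleTurnDialog could accept any `TokenizerAdapter` instead of being tied to one tokenizer.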

lemon234071 · Nov 26 '19 08:11