Yige Xu

Results: 22 comments of Yige Xu

Thank you for your issue! We have shown some hyperparameter settings in our paper (see Section 5.2). For BERT checkpoints after further pre-training, we share a link in our README (see...

Thank you for your issue! I have just uploaded our code for fine-tuning the model on multiple tasks. Multi-task fine-tuning simply adds additional softmax layers for the other tasks. In...
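For illustration, here is a minimal sketch of that idea: one shared BERT encoder with a separate linear (softmax) head per task. It assumes the Hugging Face `transformers` API and made-up class and variable names, not the repository's actual code.

```python
import torch.nn as nn
from transformers import BertModel

class MultiTaskBert(nn.Module):
    """Hypothetical multi-task model: shared encoder, one classification head per task."""
    def __init__(self, num_labels_per_task):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # one softmax classification head per task; the encoder is shared
        self.heads = nn.ModuleList(
            nn.Linear(self.bert.config.hidden_size, n) for n in num_labels_per_task
        )

    def forward(self, input_ids, attention_mask, task_id):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.heads[task_id](out.pooler_output)  # logits for the chosen task
```

During training, each batch would select the head for its task via `task_id`, and the per-task cross-entropy losses are summed or alternated.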

Regarding saving models: we did not save checkpoints during fine-tuning. If you need to save your models, we suggest using torch.save.
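A minimal sketch of that suggestion; the file name and the `model` variable below are placeholders:

```python
import torch

# save only the fine-tuned weights
torch.save(model.state_dict(), "finetuned_bert.pt")

# restore them later into a model with the same architecture
model.load_state_dict(torch.load("finetuned_bert.pt"))
```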

Sorry for the late answer. 1. We also use warm-up together with the layer-wise decreasing learning rate, which means they are used simultaneously (a rough sketch is shown below). 2. We did not conduct experiments about learning...
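For illustration, a rough sketch (not the repository's exact code) of combining warm-up with a layer-wise decreasing learning rate in PyTorch, assuming the Hugging Face `transformers` API; the base learning rate and decay factor are placeholder values.

```python
import torch
from transformers import BertModel, get_linear_schedule_with_warmup

model = BertModel.from_pretrained("bert-base-uncased")
base_lr, decay_factor = 2e-5, 2.6  # placeholder values; see the paper for actual settings

# each lower layer gets the learning rate of the layer above it divided by decay_factor
groups = []
num_layers = model.config.num_hidden_layers
for i, layer in enumerate(model.encoder.layer):
    lr = base_lr / (decay_factor ** (num_layers - 1 - i))
    groups.append({"params": layer.parameters(), "lr": lr})
groups.append({"params": model.embeddings.parameters(),
               "lr": base_lr / (decay_factor ** num_layers)})

optimizer = torch.optim.AdamW(groups, lr=base_lr)
# warm-up is applied on top of every group's layer-specific learning rate,
# so the two techniques act simultaneously
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1000
)
```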

Hi, thank you for your interest in our work. The config and the vocab file are the same as the original ones, therefore our code does not automatically output the...

Thank you for your issue! 1. The number 2.6 was set for the initial experiments; after that, we use run_classifier_discriminative.py for discriminative fine-tuning. 2. The link to run_classifier_discriminative.py is https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier_discriminative.py...

The further pre-training task is masked language modeling, not (left-to-right) language modeling, so I think perplexity may not be a good metric. Can you set your batch size larger or...

We ran on a single 1080 Ti GPU for about 8-10 hours for 100k steps.

Sorry for the late answer. As shown above, a 960M may have very limited memory. A GPU with 12 GB of memory can only hold batch size = 6 if max_seq_len=512, so please reduce your...

> Hi,
> I tried run_pretraining.py recently; it works fine for me.
> I'm using tensorflow-gpu=1.15.0, cudatoolkit=10.0.
> First, I think that the 960M has very limited VRAM, which could cause your...