DiscoEval Clarification between your codes and the paper on NSP

Open serenayj opened this issue 4 years ago • 3 comments

Hi,

Thank you for releasing your code! I am trying to train an encoder with the sentence position task, and I noticed that your paper lists neighbor sentence prediction (NSP) as one of the training objectives. Could you point me to the NSP implementation in your code?

Also, I know that DiscoEval is designed as an evaluation toolkit for pre-trained encoders, but do you think it is possible to train a model from scratch using the DiscoEval framework?

Feb 19 '21 20:02 serenayj

Hi, thanks for your interest. The NSP implementation is at this line: https://github.com/ZeweiChu/DiscoEval/blob/master/train/models.py#L213 where we take the encoded vector "sent_vec" and reconstruct the two neighboring sentences, "tgt" and "tgt2".
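
To make the idea of reconstructing two neighbors from one sentence vector concrete, here is a minimal, dependency-free sketch. It is not the code in models.py. It assumes a bag-of-words decoder (each word of a neighboring sentence is predicted independently from "sent_vec") and a separate output matrix per neighbor; both choices, and the names `bow_nll`, `neighbor_loss`, `weights_a`, and `weights_b`, are hypothetical. Which of "tgt" and "tgt2" is the previous or the next sentence is not stated in the thread, so the sketch stays neutral on that.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bow_nll(sent_vec, out_weights, target_ids):
    """Negative log-likelihood of the target word ids under a
    bag-of-words decoder (an assumption, not confirmed by the thread)."""
    # One logit per vocabulary entry: dot product of its weight row with sent_vec.
    logits = [sum(w * v for w, v in zip(row, sent_vec)) for row in out_weights]
    log_probs = log_softmax(logits)
    return -sum(log_probs[i] for i in target_ids)


def neighbor_loss(sent_vec, weights_a, weights_b, tgt, tgt2):
    """Sum of the reconstruction losses for the two neighboring sentences."""
    return bow_nll(sent_vec, weights_a, tgt) + bow_nll(sent_vec, weights_b, tgt2)


# Toy usage: 3-dimensional sentence vector, vocabulary of 4 words.
sent_vec = [0.5, -1.0, 2.0]
weights_a = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.0], [0.2, 0.1, 0.1], [0.0, 0.0, 0.0]]
weights_b = [[0.0, 0.1, 0.0], [0.3, 0.0, 0.1], [0.0, 0.0, 0.2], [0.1, 0.1, 0.1]]
loss = neighbor_loss(sent_vec, weights_a, weights_b, tgt=[0, 2], tgt2=[1, 3, 3])
print(f"toy neighbor-prediction loss: {loss:.4f}")
```

In a real training loop the weight matrices and the encoder that produces the sentence vector would be learned jointly by minimizing this loss; the lists of floats here only stand in for those tensors.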

Regarding training from scratch on DiscoEval: we didn't experiment with randomly initialized models. I think it is possible for tasks like PDTB and RST-DT, since people have worked on those tasks without using extra unlabeled data. For the other datasets, which were automatically constructed, I'm not sure, but I think models trained from scratch are unlikely to perform on par with the pretrained models.

Feb 19 '21 22:02 mingdachen

Thanks for your reply! I am particularly interested in the coherence tasks, such as Sentence Position. Just to confirm: is the small training set the reason you don't expect a model trained from scratch to match the pre-trained models?

Feb 20 '21 15:02 serenayj

Right, that's my speculation.

Feb 21 '21 05:02 mingdachen
