
About full mask

Open Martine307 opened this issue 4 years ago • 2 comments

If the full mask equals 1, the decoder can be seen as BERT, right?

Martine307 avatar Dec 25 '21 08:12 Martine307

From the viewpoint of model structure, I think yes. But there are differences between the training paradigms of BERT and the BART decoder (BERT is pre-trained with next sentence prediction and masked token prediction, while the BART decoder uses cross-attention over the BART encoder's outputs and is pre-trained autoregressively).

ImKeTT avatar Apr 15 '22 14:04 ImKeTT
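
To make the structural point above concrete, here is a minimal sketch (plain PyTorch, not taken from the CBART code; the variable names are illustrative) of what a "full mask of ones" means for decoder self-attention: with a causal mask each position only attends to earlier positions, while a mask of all ones leaves nothing masked, giving the bidirectional attention pattern of BERT's encoder.

```python
import torch
import torch.nn.functional as F

seq_len, d = 5, 8
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)
scores = q @ k.T / d ** 0.5  # raw attention scores

# Causal mask: position i attends only to positions <= i (autoregressive decoder).
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
causal_attn = F.softmax(scores.masked_fill(~causal_mask, float("-inf")), dim=-1)

# "Full" mask of all ones: nothing is masked, so every position attends to every
# other position -- the bidirectional pattern BERT uses.
full_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)
full_attn = F.softmax(scores.masked_fill(~full_mask, float("-inf")), dim=-1)

print(causal_attn)  # strictly lower-triangular attention weights
print(full_attn)    # dense (bidirectional) attention weights
```

As noted above, this only covers the attention pattern; the pre-training objectives of BERT and the BART decoder still differ.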

Are there any lexical constraints set for training, such as a T5-style prefix? Is it correct that the keywords are included in the training data? You don't specify a keyword prefix like {c1, c2, c3, c4}?

tldtldcpfl avatar Nov 12 '22 06:11 tldtldcpfl