unilm
About Pretraining on Diff Attention
Hello Team,
I would like to ask about your recommendation for the dataset used for pretraining the Diff Attention model.
Thank you.
Hi, our training corpus follows that of StableLM-3B-4E1T (https://aka.ms/StableLM-3B-4E1T). You can also use any dataset you like to train and compare Diff with a baseline Transformer; the results should be similar.