donut
donut copied to clipboard
The article mentions the use of the first four layers of BART as the decoder, but why is Mbart's decoder used in the official code?