composable-sft

[Q] LT-SFT and enc-dec models?

Open · adamwawrzynski opened this issue 2 years ago · 3 comments

I'm wondering whether this method should (theoretically) work with encoder-decoder models. Have you tried training such models with the code from this repository? I'm interested in applying this approach to a T5 model.

adamwawrzynski · Jun 28 '22 11:06

Hi Adam, I've done some experiments on BART with LT-SFT and I can confirm that it works, so I'm pretty sure T5 should work as well. You should be able to use LotteryTicketSparseFineTuner without modification, although the boilerplate code in the example scripts will likely need some adjustment for generative models. Note that, as with BERT-style models, you should generally decouple the input and output embedding matrices and freeze the output embeddings to achieve good performance.

AlanAnsell · Jun 28 '22 13:06
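Decoupling tied input/output embeddings and freezing the output side can be sketched in PyTorch as below. This is a minimal illustration under stated assumptions, not code from composable-sft: the model is assumed to expose Hugging Face-style `get_input_embeddings()` / `get_output_embeddings()` methods, and `untie_and_freeze_output_embeddings` and the toy `TinySeq2Seq` model are hypothetical names. Whether freezing via `requires_grad` is sufficient alongside LotteryTicketSparseFineTuner's own choice of trainable parameters is also an assumption worth checking.

```python
import torch
from torch import nn


def untie_and_freeze_output_embeddings(model: nn.Module) -> None:
    """Hypothetical helper: give the output projection its own copy of the
    (possibly tied) embedding matrix, then freeze that copy.

    Assumes HF-style get_input_embeddings()/get_output_embeddings() accessors.
    """
    inp = model.get_input_embeddings()
    out = model.get_output_embeddings()
    if out.weight is inp.weight:
        # Replace the shared Parameter with an independent copy, so updates
        # to the input embeddings no longer change the output layer.
        out.weight = nn.Parameter(inp.weight.detach().clone())
    # Keep the output embeddings fixed during fine-tuning.
    out.weight.requires_grad_(False)


class TinySeq2Seq(nn.Module):
    """Hypothetical toy model with tied embeddings, for illustration only."""

    def __init__(self, vocab_size: int = 10, dim: int = 4):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)
        self.lm_head.weight = self.shared.weight  # tie, as many pretrained LMs do

    def get_input_embeddings(self) -> nn.Module:
        return self.shared

    def get_output_embeddings(self) -> nn.Module:
        return self.lm_head


model = TinySeq2Seq()
untie_and_freeze_output_embeddings(model)
```

With a real transformers model, a later call to `tie_weights()` (for example when resizing token embeddings, or when reloading a checkpoint whose config still has `tie_word_embeddings=True`) may re-tie the matrices, so it is worth verifying that the two weights are distinct objects before training.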

@AlanAnsell thank you for the quick reply. Could you share the scripts from your BART experiments? They would be a great starting point for further experimentation and adaptation to the T5 architecture.

adamwawrzynski · Jun 28 '22 16:06

Unfortunately I can't share those experiments right now, but I expect the adaptation shouldn't be too difficult. For example, for BART I replaced DataCollatorForLanguageModeling with the DataCollatorForDenoisingTasks I found here: https://github.com/morganmcg1/rotobart/blob/main/data_collator.py.

AlanAnsell · Jun 29 '22 13:06
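For T5, the counterpart of BART's denoising collator would implement T5's span-corruption objective: each masked span in the input is replaced by a unique sentinel token, and the target lists each sentinel followed by the tokens it replaced, ending with one final sentinel. The sketch below is a hypothetical helper (not from composable-sft or rotobart) that assumes the spans to mask are already chosen. A real collator would sample spans randomly (the T5 paper's baseline corrupts about 15% of tokens with a mean span length of 3) and work on token IDs using the tokenizer's `<extra_id_N>` sentinels.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Hypothetical T5-style span corruption on a token list.

    `spans` are sorted, non-overlapping half-open (start, end) index pairs.
    Returns (encoder_input_tokens, decoder_target_tokens).
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        if not (cursor <= start < end <= len(tokens)):
            raise ValueError("spans must be sorted, non-overlapping and in range")
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the unmasked prefix
        inputs.append(sentinel)              # one sentinel per masked span
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # target recovers the masked span
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```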