composable-sft
[Q] LT-SFT and enc-dec models?
I'm wondering whether this method should (theoretically) work with encoder-decoder models. Have you tried training such models with the code from this repository? I'm interested in using this approach with the T5 model.
Hi Adam, I've done some experiments on BART with LT-SFT and I can confirm that it works, so I'm pretty sure T5 should work as well. I think you should be able to use LotteryTicketSparseFineTuner without modification, although the boilerplate code in the example scripts will likely require some adjustment for generative models. Note that, as with BERT-style models, you should generally decouple the input and output embedding matrices and freeze the output embeddings to achieve good performance.
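To make the embedding advice concrete, here is a minimal PyTorch sketch of decoupling and freezing. It uses a toy model, not T5 or BART; `TinyTiedLM`, `embed`, `lm_head`, and `decouple_and_freeze_output_embeddings` are hypothetical names made up for illustration and are not part of composable-sft. In a Hugging Face model the corresponding modules are usually reachable via `get_input_embeddings()` and `get_output_embeddings()`, and `config.tie_word_embeddings` would probably need to be `False` so the weights are not tied again, but treat that as an assumption to verify for your model.

```python
import torch
from torch import nn


class TinyTiedLM(nn.Module):
    """Toy stand-in for a model whose output projection shares its input embedding matrix."""

    def __init__(self, vocab_size=10, hidden=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # tied: one shared Parameter

    def forward(self, ids):
        return self.lm_head(self.embed(ids))


def decouple_and_freeze_output_embeddings(model):
    """Give the output projection its own copy of the weights, then freeze that copy."""
    model.lm_head.weight = nn.Parameter(model.embed.weight.detach().clone())
    model.lm_head.weight.requires_grad = False
    return model


model = decouple_and_freeze_output_embeddings(TinyTiedLM())

# One backward pass: only the input embeddings should receive a gradient.
model(torch.tensor([[1, 2, 3]])).sum().backward()
```

After this, the input embeddings stay trainable (and so can be picked up by sparse fine-tuning), while the output embeddings keep their pretrained values.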
@AlanAnsell thank you for the quick reply. Could you share the scripts from your BART experiments? They would be a great starting point for further experimentation and for adapting the approach to the T5 architecture.
Unfortunately I can't share those experiments with you right now, but I expect the adaptation shouldn't be too difficult. For example, for BART I replaced DataCollatorForLanguageModeling with the DataCollatorForDenoisingTasks I found here: https://github.com/morganmcg1/rotobart/blob/main/data_collator.py
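For T5, the analogous swap would presumably be a span-corruption collator rather than BART-style denoising. That is an assumption, not something confirmed in this thread. The sketch below shows only the core input/target transformation in plain Python; `span_corrupt` is a hypothetical helper, and the default `first_sentinel_id=32099` assumes the standard T5 vocabulary, where `<extra_id_0>` has the highest id and later sentinels count down. A real collator would also sample the spans randomly and handle padding, EOS tokens, and batching.

```python
def span_corrupt(token_ids, spans, first_sentinel_id=32099):
    """Replace each span with a sentinel token and build the matching target sequence.

    spans: sorted, non-overlapping, half-open (start, end) index pairs.
    Sentinel ids count down from first_sentinel_id (an assumption about the vocabulary).
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = first_sentinel_id - i
        inputs.extend(token_ids[cursor:start])  # keep the text before the span
        inputs.append(sentinel)                 # the span collapses to one sentinel
        targets.append(sentinel)                # the target repeats the sentinel...
        targets.extend(token_ids[start:end])    # ...followed by the dropped tokens
        cursor = end
    inputs.extend(token_ids[cursor:])           # keep the text after the last span
    targets.append(first_sentinel_id - len(spans))  # closing sentinel
    return inputs, targets


tokens = [10, 11, 12, 13, 14, 15, 16, 17]
inputs, targets = span_corrupt(tokens, [(1, 3), (5, 6)])
# inputs  -> [10, 32099, 13, 14, 32098, 16, 17]
# targets -> [32099, 11, 12, 32098, 15, 32097]
```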