Parcollet Titouan
@mravanelli should have a look
Thanks @Adel-Moumen, is this copy-pasted from the main SB repo? (just to know if we will get some merge conflicts in the future).
@shucongzhang do you remember making a change for your CTC only recipe? The change looks good to me though.
Hello there, we would need much more information about what the model/trainer/data/task is to give you an answer. SummaryMixing does not, in itself, induce more instability during training than MHSA....
Hi, I am afraid we need much more information to help you here. This could be due to many reasons, most likely none of them connected to SummaryMixing. Please...
Hello, I've had a quick look at your code, but I am way too unfamiliar with this codebase to make any meaningful comment. My only comment would be that we...
It took us a long time due to many high priority things, and mostly because it does not seem to affect the results, but this is an important fix. Maybe...
Hey Peter, I think that using peft could be a nice addition to this PR. Not critical, but a nice addition! I will have a look at the code,...
@pplantinga are the checkpointing features also working with this easy peft adaptation? I believe we should make sure it works with Pretrainer too, not just checkpointing.
@Adel-Moumen @mravanelli I think we will want this in v1.0.1. And it looks ready to me?