Todd Morrill
I fixed the typo you pointed out (thanks!) and noticed a couple of other things along the way. It turns out that I was running out of GPU memory because of...
Thanks for all the feedback so far. The example is coming together. I implemented the `collate_fn` that we discussed above and things are now working well in Skorch (significantly faster...
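For anyone following along, here is a minimal sketch of what a batch-level padding `collate_fn` can look like. It is not the exact function from the example: the `(token_ids, label)` item layout, the `PAD_ID` value, and the name `pad_collate` are assumptions made for illustration, and it returns plain Python lists so that it runs without any dependencies.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token id; use your tokenizer's actual value


def pad_collate(
    batch: Sequence[Tuple[List[int], int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Pad each sequence to the longest sequence in this batch only."""
    # The target length comes from the batch, not from the whole dataset.
    max_len = max(len(tokens) for tokens, _ in batch)
    padded = [tokens + [pad_id] * (max_len - len(tokens)) for tokens, _ in batch]
    # 1 marks a real token, 0 marks padding.
    mask = [[1] * len(tokens) + [0] * (max_len - len(tokens)) for tokens, _ in batch]
    labels = [label for _, label in batch]
    return padded, mask, labels


# Two items of different length: (token_ids, label)
batch = [([5, 6, 7], 1), ([8], 0)]
padded, mask, labels = pad_collate(batch)
```

In a real pipeline the three lists would typically be converted to tensors before being returned, and the function would be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`. Because the padded width follows the longest item in each batch, short batches avoid carrying padding sized for the longest example in the dataset.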
@stsievert, my apologies for the delay in picking this back up. The good news: we’re up and running! After working with `Skorch` a bit more, it became quite obvious to...
> * Why does model convergence depend on "pad[ding] at the batch level) vs. padding to the longest example in the dataset"? Is "convergence" in terms of optimization iterations? It's...
Sure, it's Todd Morrill.
My fix was the following in `model.py`. ``` # attn.bias isn't in the hugging face state dict, so we can't check for it assert len(keys) == len([k for k in...
I've got the same issue. Has anyone been able to pin down the root cause of this? I had no issues with saving/reloading PEFT models for the 7b chat model...
Thanks so much @newsbreakDuadua9, this was a huge help, and clearly a nice solution that makes use of the FSDP saving facilities. I don't mind the CPU usage for now...