Benjamin Bossan

Results: 795 comments by Benjamin Bossan

> I like the renaming and docstring suggestion.
> I can open a PR for the same.
That would be great, thanks.
> I'm running it on a macbook with...

Hmm, this is super strange. I double-checked: my graphs look identical with and without sleep (tested on both CUDA and CPU). Maybe this is an MPS-specific issue? Perhaps we should...

Interesting. It would be great if anyone else could try this on their machine, so that we can collect more data on this issue. Anyway, for now I guess the best...

> do you know when will be the next release of PEFT? I'd like to present this feature during the KDD conference on 28th August
There is no concrete...

As a general remark: Gemma is unusual in that it uses a relatively big vocabulary. Therefore, the embedding layer is particularly large. This is especially noticeable with the smaller Gemma...
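To make the vocabulary-size remark concrete, here is a minimal sketch of how the size of an embedding matrix scales with vocabulary size. The numbers are illustrative assumptions chosen to resemble a large-vocabulary model; they are not taken from the comment, and the helper names are hypothetical.

```python
def embedding_param_count(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in an embedding matrix of shape (vocab_size, hidden_size)."""
    return vocab_size * hidden_size


def embedding_share(vocab_size: int, hidden_size: int, total_params: int) -> float:
    """Fraction of a model's parameters that sit in the embedding matrix."""
    return embedding_param_count(vocab_size, hidden_size) / total_params


# Assumed, illustrative values (not quoted from the comment above).
VOCAB_SIZE = 256_000
HIDDEN_SIZE = 2_048
BYTES_PER_WEIGHT = 2  # e.g. a 16-bit dtype

params = embedding_param_count(VOCAB_SIZE, HIDDEN_SIZE)
print(f"embedding parameters: {params:,}")
print(f"approx. size on disk: {params * BYTES_PER_WEIGHT / 1e9:.2f} GB")
```

Since the hidden size is usually smaller in smaller models while the vocabulary stays the same, the embedding takes up a larger fraction of the total, which is why a checkpoint that includes it can be much larger than one containing adapter weights alone.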

As for whether the embedding is saved as part of the checkpoint, setting `ensure_weight_tying` makes no difference. Note that if you have `modules_to_save=["embed_tokens"]`, it is required to save the...

> The code I wrote here should enable all PEFT checkpoints that are linked to models in the VLMs list (explicit mapping dictionary and remapping is specified) to function.
I'm...

Thanks for the pointer. I ran my own tests to check all directions, using transformers `v4.49.0` as the old state, which should be from before the mentioned PR, and transformers...

> I'm guessing you could take the same list (called VLMs), and apply the mapping from the base model in the same way automatically?
Yes, this is true, I...
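As a rough illustration of what applying such a mapping to checkpoint keys could look like, here is a minimal, library-independent sketch. The function name `remap_keys` and the prefixes in the example are hypothetical; this is not the actual mapping used by transformers or PEFT.

```python
def remap_keys(state_dict: dict, prefix_mapping: dict) -> dict:
    """Return a copy of state_dict with old key prefixes replaced by new ones.

    Keys that match no prefix are kept unchanged. Longer prefixes are tried
    first so that the most specific rule wins.
    """
    ordered = sorted(prefix_mapping.items(), key=lambda kv: len(kv[0]), reverse=True)
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in ordered:
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in remapped:
            # Two different old keys must never collapse onto the same new key.
            raise ValueError(f"key collision after remapping: {new_key}")
        remapped[new_key] = value
    return remapped


# Hypothetical prefixes, for illustration only.
mapping = {"old_root.layers.": "new_root.text_model.layers."}
old_checkpoint = {
    "old_root.layers.0.q_proj.lora_A.weight": "tensor-0",
    "other.bias": "tensor-1",
}
new_checkpoint = remap_keys(old_checkpoint, mapping)
print(sorted(new_checkpoint))
```

Keeping the mapping as plain data means the same table could, in principle, be applied to both base-model and adapter keys, which is the kind of reuse the quoted suggestion describes.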

I created a PR to address this in PEFT: https://github.com/huggingface/peft/pull/2574. I used fake mini models to mimic the old and new model architectures for testing. The tests are still failing...