Matt comments

Results: 203 comments of Matt

Port Gemma to TF

In general, I find `pip install transformers[dev]` isn't really necessary! `pip install transformers[quality]` should be sufficient for most of what you need for a PR.

Hi @a8nova, when you say you're not implementing caching, does that mean `past_key_values` just isn't implemented at all, or we're not implementing the `PyTorch` `StaticCache`? Not implementing `StaticCache` is totally...

Port Gemma to TF

Yeah - rather than implementing `StaticCache`, maybe we can just return tensors with variable shapes, like the other TF models do? You can probably copy the relevant code from another...
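To illustrate the variable-shape idea from the comment above, here is a minimal pure-Python sketch of a cache that simply grows along the sequence dimension at each step, instead of being preallocated to a fixed length. The `update_cache` helper and the list-of-vectors layout are hypothetical and only for illustration; a real TF port would presumably hold tensors and join them with something like `tf.concat` along the sequence axis, and this is not taken from any existing model's code.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical layout: one layer's cache is (keys, values),
# each a list holding one vector per cached token.
LayerCache = Tuple[List[Vector], List[Vector]]


def update_cache(
    past: Optional[LayerCache],
    new_keys: Sequence[Vector],
    new_values: Sequence[Vector],
) -> LayerCache:
    """Return a cache extended with the new tokens' keys and values.

    The sequence length is not fixed: it grows by len(new_keys) per call.
    """
    if past is None:
        # First forward pass: the cache is just the prompt's keys/values.
        return list(new_keys), list(new_values)
    past_keys, past_values = past
    # Later passes: append along the sequence dimension.
    return past_keys + list(new_keys), past_values + list(new_values)


cache = None
# Prompt of two tokens.
cache = update_cache(cache, [[0.1, 0.2], [0.3, 0.4]], [[1.0, 1.1], [1.2, 1.3]])
# One newly generated token.
cache = update_cache(cache, [[0.5, 0.6]], [[1.4, 1.5]])
print(len(cache[0]), len(cache[1]))
```

The trade-off is that the cache shape changes on every generation step, which is simple to write but gives up the fixed shapes that a static cache provides.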

Moving in a folder & `push_to_hub` for a `trust_remote_code=True` model

Hm, this is challenging! I'm not sure how the model could autodetect which files should be pushed. I guess it would need to 1) Inspect the Python code being pushed...

Moving in a folder & `push_to_hub` for a `trust_remote_code=True` model

Some backstory: a lot of the dynamic module loading code was written by Sylvain Gugger, who has since left HF, and so no-one really owns it right now! I'm probably...

Moving in a folder & `push_to_hub` for a `trust_remote_code=True` model

Cool, I'll see if I can make it work! And don't stress about the extra work - we need to have someone take ownership of it again, so this is...

[Port] TensorFlow implementation of Mistral

Hi @ariG23498, the cause there is most likely that NumPy doesn't support `bfloat16` dtypes, and so the code fails because there is no direct conversion from Torch -> TF, it...
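As background on why `bfloat16` is awkward here: a `bfloat16` value is just the upper 16 bits of an IEEE float32, so one common workaround (an assumption on my part, not necessarily what the linked PR does) is to upcast to float32 before passing through NumPy, which loses no information. The sketch below shows that relationship using only the standard library; both helper names are hypothetical.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the bfloat16 bit pattern for x by truncating its float32 bits.

    This drops the low 16 mantissa bits (truncation, not round-to-nearest).
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bfloat16_bits_to_float32(bits: int) -> float:
    """Upcast a bfloat16 bit pattern to float32 by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


bf16 = float32_to_bfloat16_bits(1.5)
print(hex(bf16))
print(bfloat16_bits_to_float32(bf16))
```

Because the upcast only appends zero bits, every `bfloat16` value has an exact float32 representation, which is why converting to float32 first is a safe intermediate step.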

[Port] TensorFlow implementation of Mistral

PR is open at #29755

[Port] TensorFlow implementation of Mistral

PR merged! If you rebase, those loads should now work.

[Port] TensorFlow implementation of Mistral

@ariG23498 yes, errors like these almost always indicate that weights haven't been built! My guess is that since a lot of weights in each layer are missing, the problem is...