Arthur
On it 🤗
Replaces (#16875)
Okay, the `1b lyrics` and `5b lyrics` match the original code. They just need refactoring for better variable names, and the sampling kwargs should be wrapped for easier use.
Hi, awesome work, thank you very much! I am also wondering whether you have any plans to release the Effective Receptive Field visualization code? There's pretty much none out there...
Hey, before diving a bit deeper, sorry for the long delay, and thanks for the PR. Would you mind adding a test? 🤗 I can take care of it otherwise!
I am having a look RN, will tell you when I know more 👍🏻
Hey @miguelwon, it seems that you are right about the training not converging at all with the current version. However, since loading a trained model in the new versions does not...
Hey! Little update on this: the problem comes from the previously introduced "hack": ```python return tf.Variable(emb, trainable=False, name="model.embed_positions.weights") ``` This appears [here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/xglm/modeling_tf_xglm.py#L86). This hack can also be seen...
It should still be deterministic, from my intuition; let me have a look.
Just a small comment: in terms of performance, I think the decorator could be improved a little to run only on the model's forward and not on every single...
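To illustrate the idea (a minimal sketch, not the actual PR code — the decorator name `trace_call` and the model classes are made up here): wrapping only the top-level `forward` means the decorator's overhead is paid once per model call instead of once per submodule call.

```python
import functools

def trace_call(fn):
    """Hypothetical decorator standing in for the one discussed above."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Whatever bookkeeping the decorator does happens here,
        # once per wrapped call.
        return fn(*args, **kwargs)
    return wrapper

class Submodule:
    # Left undecorated on purpose: its forward runs without overhead.
    def forward(self, x):
        return x + 1

class MyModel:
    def __init__(self):
        self.sub = Submodule()

    @trace_call  # only the model's top-level forward is wrapped
    def forward(self, x):
        return self.sub.forward(x) * 2
```

With this layout the decorator fires once per `MyModel.forward` invocation, however many submodules the model contains.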