Sylvain Gugger
Sorry, I didn't see this until now. The blog post you're referring to is a bit old now, and it was from when we were just grappling with super-convergence. Now we can...
This is a flaky test, don't worry. Thanks again for your contribution!
That might be because your new job did not run ;-) You need to add it at the end [here](https://github.com/huggingface/transformers/blob/d7e2d7b40b1070cddfe878e13705725f49a2cf1f/.circleci/config.yml#L1000) for the one at each commit and [there](https://github.com/huggingface/transformers/blob/d7e2d7b40b1070cddfe878e13705725f49a2cf1f/.circleci/config.yml#L1024) for the nightly one...
I think it's preferable to use the more formal option, but in most cases, I'd prefer to reformulate the sentences to use the first person plural (wir) unless the sentence...
Let me know when you're done, and thanks a lot for diving into the German translation! (Sorry, I should have begun with that!)
Merged the PR in Accelerate! Don't forget to add some documentation and also set up some tests for this so it doesn't get broken by future PRs :-)
The PR is #18534, but nothing will beat using offline mode with the model cached, since you are then doing 0 calls to the API.
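To illustrate the offline mode mentioned above, here is a minimal sketch of how it is enabled via environment variables (these are the documented `transformers` / `huggingface_hub` switches; the actual model load is only described in comments, since it assumes the weights are already in the local cache from a previous run):

```python
import os

# Both switches must be set BEFORE transformers / huggingface_hub are imported,
# since they are read at import time.
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers-level offline switch
os.environ["HF_HUB_OFFLINE"] = "1"        # hub-client-level offline switch

# With these set and the model already downloaded in a previous run,
# something like AutoModel.from_pretrained("bert-base-uncased") would then
# resolve entirely from the local cache: zero calls to the Hub API.
```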
If the revision is specified as a commit sha, then yes, the cache should be used. This is not implemented by #18534 however, but could be some follow up work....
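To make the distinction concrete, here is a toy sketch (not the real `huggingface_hub` cache layout; the function and directory names are hypothetical) of why a commit sha can be served from the cache while a branch name cannot: a branch like `main` can move, so it needs an API call to resolve, whereas a full sha is immutable and can be looked up locally:

```python
import os
from typing import Optional

def resolve_from_cache(cache_dir: str, repo: str, revision: str) -> Optional[str]:
    """Toy illustration: return the cached snapshot path if `revision` is a
    full commit sha already present locally; None means an API call is needed."""
    # A branch or tag name may have moved since the last download, so it
    # always requires resolving against the Hub API first.
    is_sha = len(revision) == 40 and all(c in "0123456789abcdef" for c in revision)
    if not is_sha:
        return None
    # A commit sha is immutable: if a snapshot for it exists locally, reuse it.
    snapshot = os.path.join(cache_dir, repo.replace("/", "--"), "snapshots", revision)
    return snapshot if os.path.isdir(snapshot) else None
```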
Yes there is probably some way to cache that the file does not exist for a given commit sha. Pinged a few people internally to see if they like that...
Ideally it should mean that all models work/all tests pass, apart from functionality that explicitly has version-specific tests (like CUDA bfloat16 or torch FX, where we test against a specific PyTorch...