Sylvain Gugger comments

Results: 633 comments of Sylvain Gugger

Accelerate a non-HF model, like detectron2

The PR linked above should fix the two last issues if you want to give it a try.

Use torch.TensorDicts: The output of tokenizers.batch_encode_plus/call could be made to inherit from torch TensorDicts

The result of the tokenizer calls can already interact with the `to` method (note that batch_encode_plus will be deprecated sometime soon) but I agree it could be interesting to look...

device_map='auto' gives bad results

Mmmm there is no reason for the script to give different results for different GPUs, especially since removing the device_map="auto" gives the same results. I also can't reproduce on my...

OSError: Token is required (`token=True`), but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens.

You need to make sure to execute the cell `notebook_login()` at the beginning and pass it your token (it provides a direct link to your token page on hf.co).

[In progress] Add warning padding attention mask

cc @gante

Trainer failing during _save_checkpoint "cannot pickle '_thread.lock' object" with skip_memory_metrics=True

Your code example leaves several objects undefined, so I can't really tell what's wrong. Please give us a minimal reproducer we can execute.

Trainer failing during _save_checkpoint "cannot pickle '_thread.lock' object" with skip_memory_metrics=True

Could you also print `trainer.state` for us? The error comes from the fact that it is not JSON-serializable, so it would help to know which object in it is not serializable. Thanks!
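One way to narrow down which object is not serializable is to walk the state as a nested dict and try `json.dumps` on each leaf. The sketch below is only an illustration: `find_unserializable` is a hypothetical helper (not part of Transformers), and `example_state` is a made-up stand-in for the real contents of `trainer.state`, with a `threading.Lock` planted to mimic the `_thread.lock` error.

```python
import json
import threading


def find_unserializable(obj, path="state"):
    """Return (path, type name) pairs for every leaf that json.dumps rejects."""
    if isinstance(obj, dict):
        found = []
        for key, value in obj.items():
            found.extend(find_unserializable(value, f"{path}[{key!r}]"))
        return found
    if isinstance(obj, (list, tuple)):
        found = []
        for index, value in enumerate(obj):
            found.extend(find_unserializable(value, f"{path}[{index}]"))
        return found
    try:
        json.dumps(obj)
    except (TypeError, ValueError):
        return [(path, type(obj).__name__)]
    return []


# Hypothetical stand-in for a trainer state converted to a plain dict.
example_state = {
    "global_step": 10,
    "log_history": [{"eval_accuracy": 0.9}, {"eval_f1": threading.Lock()}],
}

problems = find_unserializable(example_state)
for location, type_name in problems:
    print(location, type_name)
```

On a real run, you would pass in a dict view of the state (for example via `dataclasses.asdict`, assuming the state object is a dataclass) instead of `example_state`.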

Trainer failing during _save_checkpoint "cannot pickle '_thread.lock' object" with skip_memory_metrics=True

So your metrics are not all floats: one ends up being a whole scikit-learn module, which is why you have the issue. The code you pasted is actually super weird:...
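A quick sanity check along these lines can catch the problem before the metrics reach the checkpointing code. This is a minimal sketch, not anything from the original thread: `non_numeric_metrics` is a hypothetical helper, and the standard-library `math` module stands in for the scikit-learn module that ended up in the metrics.

```python
import math
import numbers


def non_numeric_metrics(metrics):
    """Map each metric name whose value is not a real number to its type name."""
    return {
        name: type(value).__name__
        for name, value in metrics.items()
        if not isinstance(value, numbers.Real)
    }


# Hypothetical metrics dict: the second entry is a module rather than a score.
metrics = {"eval_accuracy": 0.91, "eval_f1": math}

bad = non_numeric_metrics(metrics)
print(bad)
```

An empty result means every metric is a plain number; anything else names the entries to fix in the metrics function.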

Accelerate support for GLM

You just need to add the proper attribute to `GLMPreTrainedModel` so that it knows which layers should not be split across GPUs and then test it works properly. Since this...

Accelerate support for GLM

Does it work without the load_in_8bit part? Also what is your version of Accelerate?