Sylvain Gugger
While I understand the idea of grouping related arguments together, the proposed approach is very functional in style, which is not something we use anywhere in the Transformers library. So this API...
Hi @apohllo Sorry for the delay on this. Would something like the approach in the PR linked above work for you?
Could you please share a snippet of code that fails on such an env with `device_map="auto"` passed to `from_pretrained`? This loads the model directly on the GPU (as long as...
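For illustration, a minimal load with `device_map="auto"` might look like this (the checkpoint name is just a placeholder, and `accelerate` needs to be installed):

```py
from transformers import AutoModelForCausalLM

# device_map="auto" dispatches the weights across the available GPUs (and the
# CPU if needed) while loading, instead of materializing the whole model on
# CPU first. The model name below is only a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b",
    device_map="auto",
)
```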
I think you are missing a `torch_dtype=torch.float16` or `torch_dtype=torch.bfloat16` to get down to 12GB of memory use. Otherwise the model will need 24GB of memory if it has 6B parameters (the default...
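As a quick sketch of what that looks like (the checkpoint name is again just a placeholder):

```py
import torch
from transformers import AutoModelForCausalLM

# 6B parameters x 4 bytes (float32) is roughly 24GB; loading in
# float16/bfloat16 (2 bytes per parameter) halves that to roughly 12GB.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b",
    torch_dtype=torch.float16,  # or torch.bfloat16
    device_map="auto",
)
```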
Can you try to see if adding a garbage collection call helps?

```py
import gc

gc.collect()
```

There is no reason for the CPU RAM to be used once...
Mmm, diving into the reproducer @muellerzr, it looks like memory is not released by PyTorch when moving the model to a device:

```py
import psutil, torch
from transformers import AutoModelForCausalLM
...
```
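The full reproducer is truncated above; a sketch of what such a memory check might look like (model name and details here are my own assumptions, not the original snippet):

```py
import gc
import psutil
import torch
from transformers import AutoModelForCausalLM

def cpu_ram_gb():
    # Resident set size of the current process, in GB.
    return psutil.Process().memory_info().rss / 1024**3

print(f"before load:  {cpu_ram_gb():.2f} GB")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # small placeholder model
print(f"after load:   {cpu_ram_gb():.2f} GB")

# Move the weights to the GPU; ideally the CPU copies would be freed here.
model = model.to("cuda")
gc.collect()
print(f"after .to():  {cpu_ram_gb():.2f} GB")
```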
Please make sure to run `make style` on your branch so that the quality tests pass. cc @gante for review.
Please keep each of your PRs focused on one thing. We don't want to group changes that are not linked to each other in the same PR :-)
It does look like the model code is exactly the same at first glance (I saw everything is copied from ConvNext). If that is the case, yes to re-using the...
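For reference, the `# Copied from` convention in Transformers looks roughly like this (the class names below are hypothetical, purely for illustration):

```py
import torch.nn as nn

# The repository's consistency check keeps a class marked this way in sync
# with the referenced source, applying the name substitution after "with".
# Copied from transformers.models.convnext.modeling_convnext.ConvNextLayerNorm with ConvNext->NewModel
class NewModelLayerNorm(nn.LayerNorm):
    pass
```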
The main commit message is the title of the PR.