Sylvain Gugger
@overvalidated It is FP8 **mixed** precision training. The actual memory usage will be higher than in regular training since you have two copies of the model. One in FP8 and...
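For context, here is a minimal sketch of what enabling FP8 mixed precision in Accelerate looks like. It assumes a recent Accelerate with transformer-engine installed and an FP8-capable GPU; the model and optimizer are placeholders, not from the original thread:

```py
import torch
from accelerate import Accelerator

# Ask Accelerate for FP8 mixed precision (needs transformer-engine or MS-AMP).
accelerator = Accelerator(mixed_precision="fp8")

model = torch.nn.Linear(1024, 1024)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# prepare() keeps a higher-precision copy of the weights alongside the FP8
# compute copy, which is why memory usage is higher than in regular training.
model, optimizer = accelerator.prepare(model, optimizer)
```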
@overvalidated I have no code I can use to reproduce the issue, so I can't really explain what goes wrong for you.
What is your version of Accelerate? Also note that `decapoda-research/llama-7b-hf` is not usable at all as they converted the model in the middle of the PR adding Llama and is...
@philip30 This is because the initialization under `init_empty_weights` breaks the tied weights. You need to add a `model.tie_weights()` to re-tie them afterward: ```py whisper_model = "openai/whisper-tiny" weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')...
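Since the snippet above is truncated, here is a minimal sketch of the full pattern, assuming the usual `init_empty_weights` / `load_checkpoint_and_dispatch` flow; the exact arguments may differ from the original example:

```py
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import hf_hub_download
from transformers import WhisperConfig, WhisperForConditionalGeneration

whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, "pytorch_model.bin")

config = WhisperConfig.from_pretrained(whisper_model)
with init_empty_weights():
    # Instantiating on the meta device breaks the tied weights...
    model = WhisperForConditionalGeneration(config)

# ...so re-tie them before loading the checkpoint.
model.tie_weights()

model = load_checkpoint_and_dispatch(model, weights_location, device_map="auto")
```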
@huide9 Like for everyone else, there is nothing we can do without code that reproduces the error.
cc @younesbelkada since this is using 8-bit loading
I don't think it is possible to use `load_in_4bit` without at least `low_cpu_mem_usage=True` (and normally you need `device_map="auto"`). In any case this is not because the error message is the...
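A minimal sketch of the combination described above, with an assumed placeholder checkpoint (not the one from the original issue):

```py
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",     # placeholder checkpoint
    load_in_4bit=True,       # 4-bit quantization via bitsandbytes
    low_cpu_mem_usage=True,  # implied by load_in_4bit, made explicit here
    device_map="auto",       # let Accelerate place the layers on available devices
)
```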
If you are not offloading anything (e.g. the device map only contains GPUs), it works for training as well.
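As an illustration of that condition, a device map with no offloading maps every module to a GPU index and contains no `"cpu"` or `"disk"` entries (module names below are placeholders):

```py
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": 1,
}

# Training is only supported when nothing is offloaded to CPU or disk.
assert all(device not in ("cpu", "disk") for device in device_map.values())
```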
Yes, Accelerate does not support DDP with model parallelism. I'm not sure your proposed fix would work, as DDP will all-reduce the gradients across GPUs, except all GPUs don't have...