Sylvain Gugger
@overvalidated It is FP8 **mixed** precision training. The actual memory usage will be higher than in regular training since you have two copies of the model. One in FP8 and...
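For context, here is a minimal sketch of what enabling FP8 mixed precision in Accelerate looks like. It assumes a recent Accelerate with transformer-engine installed and an FP8-capable GPU; the model and optimizer are placeholders, not from the original thread:

```py
import torch
from accelerate import Accelerator

# Ask Accelerate for FP8 mixed precision (needs transformer-engine or MS-AMP).
accelerator = Accelerator(mixed_precision="fp8")

model = torch.nn.Linear(1024, 1024)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# prepare() keeps a higher-precision copy of the weights alongside the FP8
# compute copy, which is why memory usage is higher than in regular training.
model, optimizer = accelerator.prepare(model, optimizer)
```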
@overvalidated I have no code I can use to reproduce the issue, so I can't really explain what goes wrong for you.
What is your version of Accelerate? Also note that `decapoda-research/llama-7b-hf` is not usable at all as they converted the model in the middle of the PR adding Llama and is...
@philip30 This is because the initialization under `init_empty_weights` breaks the tied weights. You need to add a `model.tie_weights()` to re-tie them afterward: ```py whisper_model = "openai/whisper-tiny" weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')...
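Since the snippet above is truncated, here is a minimal sketch of the full pattern, assuming the usual `init_empty_weights` / `load_checkpoint_and_dispatch` flow; the exact arguments may differ from the original example:

```py
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import hf_hub_download
from transformers import WhisperConfig, WhisperForConditionalGeneration

whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, "pytorch_model.bin")

config = WhisperConfig.from_pretrained(whisper_model)
with init_empty_weights():
    # Instantiating on the meta device breaks the tied weights...
    model = WhisperForConditionalGeneration(config)

# ...so re-tie them before loading the checkpoint.
model.tie_weights()

model = load_checkpoint_and_dispatch(model, weights_location, device_map="auto")
```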
@huide9 Like for everyone else, there is nothing we can do without code that reproduces the error.
cc @younesbelkada since this is using 8-bit loading
I don't think it is possible to use `load_in_4bit` without at least `low_cpu_mem_usage=True` (and normally you need `device_map="auto"`). In any case this is not because the error message is the...
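A minimal sketch of the combination described above, with an assumed placeholder checkpoint (not the one from the original issue):

```py
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",     # placeholder checkpoint
    load_in_4bit=True,       # 4-bit quantization via bitsandbytes
    low_cpu_mem_usage=True,  # implied by load_in_4bit, made explicit here
    device_map="auto",       # let Accelerate place the layers on available devices
)
```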
If you are not offloading anything (e.g. the device map only contains GPUs), it works for training as well.
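As an illustration of that condition, a device map with no offloading maps every module to a GPU index and contains no `"cpu"` or `"disk"` entries (module names below are placeholders):

```py
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": 1,
}

# Training is only supported when nothing is offloaded to CPU or disk.
assert all(device not in ("cpu", "disk") for device in device_map.values())
```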
Yes, Accelerate does not support DDP with model parallelism. I'm not sure your proposed fix would work, as DDP will all-reduce the gradients across GPUs, except all GPUs don't have...