Marc Sun


Hi @amrothemich, you need to pass your custom `device_map` when you load your model:

```python
# Load 4-bit quantized model
model = AutoModelForCausalLM.from_pretrained(
    modelpath,
    device_map=custom_device_map,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```

LMK...
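For reference, a custom `device_map` is a plain dict mapping submodule names to devices. A minimal sketch, assuming a typical decoder-style architecture (the module names below are hypothetical; inspect `model.named_modules()` for the real ones):

```python
# Hypothetical layout: embeddings and the first block on GPU 0,
# the next block on GPU 1, and the LM head offloaded to CPU.
custom_device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": "cpu",
}
```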

Hi @hackyon, thanks for still working on this PR!

> Yea, I won't mind helping to add more no_split_modules if that is something you think is relatively high priority....

Hi @hackyon, I can indeed merge, but the code needs to be validated by a core maintainer. Could you have a look, @ArthurZucker? Thanks again @hackyon for your patience.

Check this [doc](https://huggingface.co/docs/accelerate/concept_guides/big_model_inference) from the accelerate library. You can use big model inference directly by passing `device_map` to `from_pretrained` if you are using the transformers library!
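A minimal sketch of that workflow (the checkpoint name is just an example):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" triggers accelerate's big model inference:
# layers are dispatched across available GPUs, CPU RAM, and disk.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",  # example checkpoint
    device_map="auto",
)
```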

Hi @MrRobot2211, could you try running it without setting `os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"`? I want to check if this is the line that causes the issue. Thanks!

Thanks! I will investigate why this is happening. If you could share a minimal reproducer, that would help me a lot in fixing this issue.

Hi @kevinknights29, I see that in your script you are trying to load in 8-bit and in 4-bit at the same time. Please select only one option.

```py
return...
```
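A minimal sketch of picking a single quantization mode via `BitsAndBytesConfig` (the checkpoint and dtype are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Choose exactly one mode: load_in_4bit OR load_in_8bit, never both.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # example checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```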