adrienchaton
Hello Zheng, and thanks for your reply. That sounds very good, and I would indeed appreciate it if you could share more details, please. Are you using the HuggingFace model and...
Hi, I think we need to combine FSDP + CPU offloading with LoRA to fit the model on, e.g., an 80GB GPU. LoRA was easy to set up for me with the...
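For reference, a minimal sketch of the LoRA setup I mean, assuming the ```peft``` library; the checkpoint name and ```target_modules``` are placeholders and depend on your architecture:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-base-model",               # placeholder checkpoint
    torch_dtype=torch.bfloat16,
)
lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # adjust to your model's layer names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # sanity check: only adapters should train
```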
@xiyang-aads-lilly Could you share more details, please? Are you using the HuggingFace AutoModelForCausalLM wrapper and the HF trainer? For me, setting ```gradient_checkpointing=True``` in the ```TrainingArguments``` passed to the HF...
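To be concrete, this is roughly what I tried; a minimal sketch assuming the HF Trainer, where ```model``` and ```train_dataset``` are placeholders:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,   # the flag discussed above
    bf16=True,
)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```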
Thank you both @xiyang-aads-lilly @kawabata-tomoko for sharing implementation details on gradient checkpointing. I hadn't looked closely enough to notice that activating gradient checkpointing was doing nothing at all, since...
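In case it helps others, here is a hedged sketch of what made checkpointing actually take effect for me in a similar PEFT setup (the exact cause above is truncated, so treat this as an assumption): with a frozen base model, the checkpointed segments may see no inputs requiring grad, so checkpointing is silently skipped.

```python
# Non-reentrant checkpointing; the kwargs argument requires a recent transformers version.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
model.enable_input_require_grads()  # make embedding outputs require grad despite frozen weights
model.config.use_cache = False      # the KV cache is incompatible with checkpointing
```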
Thanks, if I can properly set up ZeRO-2/3 I will try both. How did you set it up? Did you use ```accelerate config``` and ```accelerate launch``` as described in https://huggingface.co/docs/accelerate/usage_guides/deepspeed#accelerate-deepspeed-plugin?
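As an alternative to the accelerate plugin, I was considering passing a DeepSpeed config directly through ```TrainingArguments```; a minimal sketch, with illustrative (not tuned) values:

```python
from transformers import TrainingArguments

# ZeRO-3 with optimizer/parameter CPU offloading
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./checkpoints",
    bf16=True,
    deepspeed=ds_config,   # accepts a dict or a path to a JSON file
)
```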
Thanks a lot, I really appreciate you sharing all these details! Somehow just setting ```fsdp``` in the ```TrainingArguments``` did not work, so I imagine there must be some default params...
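For comparison, here is roughly what I had tried; a hedged sketch, since the ```fsdp_config``` key names have changed across transformers versions, and the wrap class below assumes a Llama-style model:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    bf16=True,
    fsdp="full_shard auto_wrap offload",   # shard params/grads/optimizer state, CPU offload
    fsdp_config={
        # wrap at the decoder-layer boundary; the class name is model-specific
        # (older transformers versions prefix this key with "fsdp_")
        "transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
    },
)
```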
Thanks everyone for sharing your insights, this was really helpful! I managed to put everything in place and am getting nice training results, so I will share the setup, which is...