autotrain-advanced
[BUG] Remote computer (4 GPUs, 10 GiB of VRAM each) crashes as soon as I launch local AutoTrain fine-tuning of Mistral 7B
Prerequisites
- [X] I have read the documentation.
- [X] I have checked other issues for similar problems.
Backend
Local
Interface Used
CLI
CLI Command
autotrain --config config.yml
UI Screenshots & Parameters
This is my CLI output when the remote computer crashes:
This is my config.yml file:
Error Logs
This is the error shown in Bitvise when the computer crashes:
Additional Information
Hello everyone, I'm new to AutoTrain. I'm trying to fine-tune a Mistral 7B model with AutoTrain locally on a remote computer (it has 4 GPUs with 10 GiB of VRAM each) that I access over SSH using Bitvise. I was previously using Hugging Face's SFT Trainer and ran into many CUDA memory issues, so I moved to AutoTrain since it's easier.

N.B.: I have applied every optimization I know of (LoRA, 4-bit model loading, a lower batch size, etc.), and I have also tried DDP (Distributed Data Parallel) before. I found a memory-usage estimate for fine-tuning this same model in BF16 precision which says the Adam peak is 56 GB of VRAM, which I don't have at the moment. Do you think renting some GPUs from RunPod or Hugging Face would fix the issue for me? Thank you for your guidance.
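For context, the 56 GB figure quoted above is consistent with a standard back-of-the-envelope estimate: Adam keeps two FP32 moment buffers per parameter, so for 7B parameters the optimizer states alone need 2 × 4 bytes × 7e9 = 56 GB. A rough sketch of the arithmetic (all byte sizes are common assumptions, not measured values; activations and CUDA overhead are ignored):

```python
# Back-of-the-envelope VRAM estimate for full fine-tuning of a 7B
# model in BF16 with Adam. Byte counts per parameter are the usual
# assumptions: 2 bytes for BF16 weights/gradients, 4 bytes for each
# of Adam's two FP32 moment buffers.

params = 7e9

weights_bf16 = params * 2        # model weights, BF16
grads_bf16   = params * 2        # gradients, BF16
adam_states  = params * 4 * 2    # Adam m and v, typically FP32

total = weights_bf16 + grads_bf16 + adam_states
print(f"Adam states alone:           {adam_states / 1e9:.0f} GB")  # 56 GB
print(f"Weights + grads + optimizer: {total / 1e9:.0f} GB")        # 84 GB

# With QLoRA-style training (frozen 4-bit base weights plus a small
# trainable LoRA adapter), the dominant cost is roughly 0.5 byte per
# frozen base parameter:
qlora_base = params * 0.5
print(f"4-bit base weights:          {qlora_base / 1e9:.1f} GB")   # 3.5 GB
```

This suggests full fine-tuning is far beyond 4 × 10 GiB, while a 4-bit LoRA setup should at least fit the frozen base model, so the crash is likely caused by something other than the base-weight footprint (e.g. activations, sequence length, or per-GPU batch size).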