FastChat
Can I finetune the 7B model using 8*3090 GPUs?
The answer is yes, but training is very slow, and the max length should be less than 512 (512 with fp16 and LoRA r=8 will OOM).
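For reference, here is a minimal sketch (not the exact setup used in this thread) of the kind of fp16 + LoRA r=8 configuration being discussed, using Hugging Face transformers and peft; the checkpoint name, `lora_alpha`, and target modules are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "huggyllama/llama-7b"  # placeholder 7B checkpoint, not from the thread

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the base model in fp16, as mentioned above.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=8,                                   # LoRA rank mentioned in the thread
    lora_alpha=16,                         # assumed value, not stated above
    target_modules=["q_proj", "v_proj"],   # common choice for LLaMA-style models
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Wrap the base model so only the LoRA adapters are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```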
Based on my own testing, I recommend disabling flash_attention, since the sm86 bug on the RTX 3090 (in the backward pass with hidden dim 128) has not been fixed. Additionally, I found that a max length of 256 worked better for me than the previously recommended 512. I hope this information is helpful to others experiencing similar issues.
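As a small illustration of the max-length point, this sketch caps tokenization at 256 tokens and loads the model without applying any flash-attention patch, so the stock attention kernels are used; the checkpoint name and prompt are placeholders, not values taken from this thread.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder 7B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Truncate every example to 256 tokens, the length that fit on the 3090s above.
batch = tokenizer(
    ["An example training prompt ..."],
    truncation=True,
    max_length=256,
    return_tensors="pt",
)

# Loading the model without any flash-attention monkey patch keeps the standard
# attention implementation, sidestepping the sm86 backward issue mentioned above.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
```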
Thank you for your answer.
@ericzhou571 I tried to finetune the 7B model using 4*3090 GPUs, with batch size 1 and max length 256, but I still got OOM. Could you please share more details about how you finetuned the 7B model (parameters or other optimizations)?