FastChat
Out of GPU memory using 4x A100 40GB
Hi, I used the training script in the README without changing the data or parameters, but I still run out of GPU memory. Have you tested it on 4x A100 40GB? How much GPU memory does it use for you?
Yeah, same situation here. Even after downsizing `--per_device_train_batch_size` from the original 2 to 1, it still OOMs.
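If anyone wants to try squeezing under 40 GB on 4 cards first, two standard HF Trainer knobs are worth a shot. This is just a sketch I haven't verified on this repo: it assumes the README script forwards standard `TrainingArguments`, the accumulation value is illustrative, and the README command may already enable some of these:

```shell
torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
    ... \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 32 \
    --gradient_checkpointing True \
    --fsdp "full_shard auto_wrap offload"
```

Trading per-device batch size for `--gradient_accumulation_steps` keeps the effective batch size the same, and the `offload` FSDP option moves parameters and gradients to CPU at a real speed cost.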
Maybe some heroes can solve this using DeepSpeed?
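Not a hero, but here is a minimal DeepSpeed sketch in case anyone wants to try it, untested on this repo. It assumes the training script exposes HF Trainer's `--deepspeed` argument; the config keys are standard DeepSpeed ZeRO-3 with CPU offload, with the `"auto"` values filled in by the HF integration. Don't combine it with the `--fsdp` flags:

```shell
cat > ds_zero3_offload.json <<'EOF'
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto"
}
EOF

# Use the deepspeed launcher instead of torchrun; other flags as in the README.
deepspeed fastchat/train/train_mem.py \
    ... \
    --deepspeed ds_zero3_offload.json
```

ZeRO-3 shards parameters, gradients, and optimizer states across ranks, and the CPU offload trades step time for VRAM, which is often the difference between OOM and fitting on 40 GB cards.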
We have tried to train the 7B model on 8x A100 40GB with the default settings, and almost all GPU memory is eaten up. Even with batch size 1, the model still consumes about 30GB on each card. So I think the minimum requirement for training Vicuna is 8 cards; 4 cards simply won't do the job.
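For a rough sanity check on those numbers (my own back-of-envelope, not measured): with bf16 weights and gradients plus fp32 Adam states (master weights, momentum, variance), a 7B model needs roughly 7B x (2 + 2 + 12) bytes ≈ 112 GB of persistent state. Fully sharded with FSDP across 8 GPUs, that is about 14 GB per card, and activations plus temporary buffers at batch size 1 with 2048-token sequences can plausibly account for the rest of the ~30 GB observed.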
Agreed. BTW, how about the iteration speed on 8x A100 40GB? @yzxyzh
Our speed is about 90 s/it.
Closing, as the issue has been resolved.
@yzxyzh You mentioned you are using 8x A100 40GB, but the README.md says you can use the following command to train Vicuna-7B with 4x A100 (40GB). Is that just a typo in the README?