
out of gpu memory using 4xA100 40G

Open puppet101 opened this issue 1 year ago • 4 comments

Hi, I used the training script from the README without changing the data or parameters, but I still run out of GPU memory. Have you tested it on 4 x A100 40GB? How much GPU memory does your setup use?

puppet101 avatar Apr 13 '23 11:04 puppet101

Yeah, same situation here. Even after downsizing

    --per_device_train_batch_size 1  # original 2

it still OOMs.

Maybe some hero can solve this with DeepSpeed?
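(For anyone who wants to try: a minimal DeepSpeed ZeRO stage-3 config with CPU offload might look like the sketch below. This is an illustrative starting point, not a tested FastChat config; the batch-size fields are left as `"auto"` so the HF Trainer fills them in.)

```json
{
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Offloading optimizer states and parameters to CPU trades step time for GPU memory, which is exactly the trade-off you want when 40GB cards are the bottleneck.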

CiaoHe avatar Apr 15 '23 05:04 CiaoHe

We have tried to train the 7B model on 8 x A100 40G with the default settings, and almost all GPU memory is consumed. Even with a batch size of 1, the model still uses about 30G on each card. So I think the minimum requirement for training Vicuna is 8 cards; 4 cards simply won't do.
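That per-card number is roughly consistent with a back-of-envelope estimate (a sketch only, assuming fp16 weights and gradients, fp32 AdamW optimizer states, and full sharding across GPUs as FSDP/ZeRO-3 would do; activation memory comes on top of this):

```python
# Rough memory estimate for full fine-tuning of a 7B-parameter model
# with mixed-precision AdamW, state fully sharded across N GPUs.
PARAMS = 7e9
BYTES_WEIGHTS = 2   # fp16 weights
BYTES_GRADS = 2     # fp16 gradients
BYTES_OPTIM = 12    # fp32 master weights + Adam momentum + variance

def per_gpu_gb(num_gpus: int) -> float:
    total = PARAMS * (BYTES_WEIGHTS + BYTES_GRADS + BYTES_OPTIM)
    return total / num_gpus / 1024**3

for n in (4, 8):
    # ~26 GB/GPU on 4 cards, ~13 GB/GPU on 8 cards
    print(f"{n} GPUs: ~{per_gpu_gb(n):.1f} GB/GPU before activations")
```

On 4 cards the sharded state alone is ~26 GB/GPU, leaving little headroom on a 40GB card once activations and fragmentation are added; on 8 cards the ~13 GB/GPU baseline plus activations lines up with the ~30G observed.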

yzxyzh avatar Apr 15 '23 06:04 yzxyzh

> We have tried to train the 7B model on 8 x A100 40G with the default settings, and almost all GPU memory is consumed. Even with a batch size of 1, the model still uses about 30G on each card. So I think the minimum requirement for training Vicuna is 8 cards; 4 cards simply won't do.

Agree. BTW, what is the iteration speed on 8 x A100 40G? @yzxyzh

CiaoHe avatar Apr 15 '23 06:04 CiaoHe

> > We have tried to train the 7B model on 8 x A100 40G with the default settings, and almost all GPU memory is consumed. Even with a batch size of 1, the model still uses about 30G on each card. So I think the minimum requirement for training Vicuna is 8 cards; 4 cards simply won't do.
>
> Agree. BTW, what is the iteration speed on 8 x A100 40G? @yzxyzh

Our speed is about 90 s/it.

yzxyzh avatar Apr 15 '23 11:04 yzxyzh

Closing, as the issue has been resolved.

zhisbug avatar May 08 '23 07:05 zhisbug

@yzxyzh You mentioned you are using 8 x A100 40G, but the README.md says you can use the following command to train Vicuna-7B with 4 x A100 (40GB).

Is this just a typo?

ryusaeba avatar May 16 '23 17:05 ryusaeba