LISA
Training on fewer GPUs
Hi!
I am trying to train with 4 GPUs (24 GB each) instead of the suggested 8, but the code always fails at this line: https://github.com/dvlab-research/LISA/blob/main/train_ds.py#L305
with a CUDA OOM error. How can I reconfigure the settings to enable slower training with fewer GPUs?
Thanks!
Can you lower the batch_size and then increase the grad_accumulation_steps so that their product stays the same?
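To make the arithmetic behind this suggestion concrete, here is a minimal sketch. The helper names and the numbers (batch_size 2, grad_accumulation_steps 10) are illustrative assumptions, not values confirmed in this thread. Note that a smaller batch_size mainly reduces activation memory; the memory taken by model weights and optimizer state does not shrink.

```python
def rebalance(batch_size: int, grad_accumulation_steps: int, new_batch_size: int) -> int:
    """Return the grad_accumulation_steps that keeps
    batch_size * grad_accumulation_steps unchanged for new_batch_size."""
    product = batch_size * grad_accumulation_steps
    if new_batch_size <= 0 or product % new_batch_size != 0:
        raise ValueError("new_batch_size must be a positive divisor of the product")
    return product // new_batch_size


def effective_batch_size(batch_size: int, grad_accumulation_steps: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer step across all GPUs."""
    return batch_size * grad_accumulation_steps * num_gpus


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not taken from the repository).
    new_steps = rebalance(batch_size=2, grad_accumulation_steps=10, new_batch_size=1)
    print("grad_accumulation_steps:", new_steps)
    print("8 GPUs:", effective_batch_size(2, 10, 8))
    print("4 GPUs:", effective_batch_size(1, new_steps, 4))
```

Keeping the per-GPU product fixed while going from 8 to 4 GPUs still halves the global batch per optimizer step; doubling grad_accumulation_steps again would restore it, at the cost of slower steps.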
I have tried, but the code seems to crash at the same place every time:
https://github.com/dvlab-research/LISA/blob/main/train_ds.py#L305
I had the same problem. After changing the SAM huge model to the big model, the OOM was fixed.
I added offloading to cpu and it helped
I have the same problem. Has anyone solved it?
Could you explain how you offloaded to CPU to solve it? I tried the following config, but I still get OOM:
"offload_optimizer": {
    "device": "cpu",
    "pin_memory": True
},
"offload_param": {
    "device": "cpu",
    "pin_memory": True
},
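One thing worth checking with a fragment like this: in DeepSpeed, these offload keys belong inside the "zero_optimization" section, and "offload_param" is only available with ZeRO stage 3, while "offload_optimizer" also works at lower stages. Below is a hedged sketch of how such a section could be assembled as a Python dict. The helper name build_offload_config is hypothetical, and how train_ds.py actually structures its config, or which stage it uses, is an assumption here and should be verified against the script.

```python
import json


def build_offload_config(offload_params: bool = True) -> dict:
    """Build a ZeRO section with CPU offloading (illustrative helper, not from LISA).

    offload_param requires ZeRO stage 3, so the stage is raised to 3
    when parameter offloading is requested; otherwise stage 2 is used.
    """
    zero = {
        "stage": 3 if offload_params else 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    }
    if offload_params:
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {"zero_optimization": zero}


if __name__ == "__main__":
    # Merge the result into the rest of the DeepSpeed config used for training.
    print(json.dumps(build_offload_config(), indent=2))
```

Offloading trades GPU memory for host RAM and slower steps, so it may not help if the OOM comes from something other than parameters or optimizer state.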