gansformer

CUDA_ERROR_OUT_OF_MEMORY

Open petergerten opened this issue 3 years ago • 4 comments

I always get out-of-memory errors, even when using all the defaults and training at low resolution. Setup: 8 x V100 (16 GB).

petergerten · Jun 21 '21 13:06

When I try to train on 1 GPU, I get stuck here:


  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py", line 577, in expand_dims
    if axis > a.ndim or axis < -a.ndim - 1:
TypeError: '>' not supported between instances of 'list' and 'int'
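The failing line matches `np.expand_dims` in NumPy releases before 1.18, which accept only a single integer `axis`; passing a list reaches the `axis > a.ndim` comparison and raises exactly this `TypeError`. Assuming that is the cause (the thread does not confirm it), either upgrading NumPy to 1.18 or later or inserting the axes one at a time should avoid it. The sketch below shows the second option; `expand_dims_compat` is a hypothetical helper, not part of gansformer.

```python
import numpy as np

def expand_dims_compat(a, axes):
    """Insert size-1 axes like np.expand_dims(a, tuple_of_axes) on NumPy >= 1.18,
    but using only single-integer calls so it also works on older NumPy."""
    a = np.asarray(a)
    if np.ndim(axes) == 0:
        axes = [axes]
    axes = [int(ax) for ax in axes]
    out_ndim = a.ndim + len(axes)
    # Axis positions refer to the *output* array, so normalize negatives against out_ndim.
    norm = sorted(ax + out_ndim if ax < 0 else ax for ax in axes)
    if len(set(norm)) != len(norm) or any(not 0 <= ax < out_ndim for ax in norm):
        raise ValueError("invalid or repeated axis: %r" % (axes,))
    # Inserting in ascending order keeps every later position valid.
    for ax in norm:
        a = np.expand_dims(a, ax)
    return a

print(expand_dims_compat(np.zeros((3, 4)), [0, 2]).shape)  # (1, 3, 1, 4)
```

Which call site in the code passes the list is not identified in the thread, so where such a helper would go is left open.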

petergerten · Jun 21 '21 13:06

Hi, thank you for your interest in the work! I have a couple of deadlines over the next few days, so I will definitely try to get back to you by the end of the week!

dorarad · Jun 23 '21 12:06

What is the minimum number of GPUs required for training?

chalure · Jul 03 '21 01:07

Hi, my most sincere apologies for not getting back to this earlier! The model can be trained even on a single GPU.

On which line of the code did you get the error? Did you by any chance make changes to the implementation? The error may point to a small bug, so further information would be helpful.

A couple more points:

  • To train on 8 GPUs, pass --gpus 0,1,2,3,4,5,6,7 (make sure not to pass e.g. --gpus 8).
  • Consider using --batch-gpu with a lower value (e.g. 1) so that model training fits into GPU memory.
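The two points above can be sketched as follows: `--gpus` takes a list of device indices rather than a count, and lowering `--batch-gpu` shrinks the per-GPU memory footprint. The helper names below are hypothetical, and the accumulation arithmetic assumes, as in the StyleGAN2 code this project builds on, that `--batch-gpu` is the per-GPU minibatch and that the total batch size is reached by accumulating gradients over several rounds. Check the repository's own argument handling before relying on it.

```python
def gpus_arg(num_gpus: int) -> str:
    """Build the comma-separated device list for --gpus (indices, not a count)."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return ",".join(str(i) for i in range(num_gpus))

def accumulation_rounds(batch_size: int, batch_gpu: int, num_gpus: int) -> int:
    """Gradient-accumulation rounds needed to reach batch_size (assumed semantics)."""
    per_round = batch_gpu * num_gpus
    if batch_size % per_round != 0:
        raise ValueError("batch_size must be divisible by batch_gpu * num_gpus")
    return batch_size // per_round

print(gpus_arg(8))                    # 0,1,2,3,4,5,6,7
print(accumulation_rounds(32, 1, 8))  # 4
```

Under that assumption, a smaller `--batch-gpu` trades memory for more accumulation rounds per step rather than changing the effective batch size.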

I hope one of these might resolve the issue!

dorarad · Feb 03 '22 00:02