gansformer
CUDA_ERROR_OUT_OF_MEMORY
I always get out-of-memory errors, even when using all the defaults and training at low resolution, on 8 x V100 16GB.
When I try to train on 1 GPU, I get stuck here:
File "/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py", line 577, in expand_dims
if axis > a.ndim or axis < -a.ndim - 1:
TypeError: '>' not supported between instances of 'list' and 'int'
Hi, thank you for your interest in the work! I have a couple of deadlines over the next few days, so I will definitely try to get back to you by the end of the week!
What is the minimum number of GPUs required for training?
Hi, most sincere apologies for not getting back to this earlier! The model can be trained on even a single GPU.
On which line of the code did you get the error? Did you by any chance make changes to the implementation? The error seems to indicate a small bug, so further information would be helpful.
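One possible cause, offered as a guess rather than a diagnosis: the check shown in the traceback (`axis > a.ndim`) matches older NumPy releases (before 1.18, as far as I know), where `np.expand_dims` accepted only a single integer `axis`, so passing a list raises exactly this TypeError. Upgrading NumPy may be enough. Otherwise, a minimal version-independent sketch is to insert the axes one at a time. Note that `expand_dims_compat` is a hypothetical helper written for illustration and is not part of the gansformer code.

```python
import numpy as np

def expand_dims_compat(a, axis):
    """Insert length-1 axes using only integer-axis calls to np.expand_dims.

    Hypothetical helper for illustration; not taken from the gansformer code.
    """
    a = np.asarray(a)
    # Accept a single int as well as a list/tuple of ints.
    axes = [axis] if isinstance(axis, (int, np.integer)) else list(axis)
    # Axis positions refer to the result, whose rank is ndim + number of new axes.
    out_ndim = a.ndim + len(axes)
    normalized = sorted(ax + out_ndim if ax < 0 else ax for ax in axes)
    # Inserting in ascending order keeps each new axis at its final position.
    for ax in normalized:
        a = np.expand_dims(a, int(ax))  # always called with a plain int
    return a

x = np.zeros((3, 4))
print(expand_dims_compat(x, [0, 2]).shape)  # (1, 3, 1, 4)
```

Because each call passes a plain integer, this should behave the same on NumPy versions with and without sequence support for `axis`.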
A couple more points:
- To train on 8 GPUs, pass --gpus 0,1,2,3,4,5,6,7 (the option takes a list of GPU indices starting at 0, so make sure not to pass e.g. --gpus 8).
- Consider using --batch-gpu with a lower value, e.g. 1, so that model training fits into GPU memory.
I hope one of these resolves the issue!
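To illustrate the first point above: GPU indices are zero-based, so a machine with 8 GPUs exposes ids 0 through 7, and 8 is out of range. The sketch below shows one way such a comma-separated list could be checked before launching training. `parse_gpu_ids` and its `num_available` parameter are hypothetical names introduced here for illustration; this is not how the gansformer scripts parse the option.

```python
def parse_gpu_ids(spec, num_available):
    """Parse a comma-separated GPU list such as "0,1,2" and validate it.

    Hypothetical helper for illustration; not taken from the gansformer code.
    """
    ids = [int(tok) for tok in spec.split(",") if tok.strip()]
    if not ids:
        raise ValueError("no GPU ids given")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate GPU ids in %r" % spec)
    # Valid ids run from 0 to num_available - 1 inclusive.
    bad = [i for i in ids if not 0 <= i < num_available]
    if bad:
        raise ValueError(
            "invalid GPU ids %s; valid ids are 0..%d" % (bad, num_available - 1)
        )
    return ids

print(parse_gpu_ids("0,1,2,3,4,5,6,7", 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Calling `parse_gpu_ids("8", 8)` raises a ValueError, which mirrors why passing --gpus 8 on an 8-GPU machine does not select all eight devices.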