SA-SSD
CUDA OOM Error
I am trying to train the SA-SSD model, but I run into a CUDA out-of-memory error. I tried batch size 1, but the OOM error remains. I am using a TITAN V with 12.7G memory, and my PyTorch version is 1.2.0.
You need to install torch 1.1, btw. I created a Dockerfile that I use to train the model, even on a 2060 6 GB with batch size 1. I also found that memory usage grows on a V100, to about 15 GB with default settings. @skyhehe123, have you seen this before? I don't know why memory usage changes across GPUs.
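To compare memory usage across GPUs as described above, it may help to log the numbers PyTorch itself reports. The following is only a minimal sketch: `report_gpu_memory` is a hypothetical helper, not part of SA-SSD, and it only sees memory managed by PyTorch's caching allocator, so allocations made outside it would not be counted.

```python
def report_gpu_memory(device_index=0):
    """Return GPU name and memory figures in GiB, or None if CUDA is unavailable.

    Hypothetical helper for illustration; not part of the SA-SSD code base.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return None
    if not torch.cuda.is_available():
        return None

    gib = 1024 ** 3
    props = torch.cuda.get_device_properties(device_index)
    return {
        "name": props.name,
        # Total memory of the device as reported by the driver.
        "total_gib": props.total_memory / gib,
        # Memory currently occupied by tensors on this device.
        "allocated_gib": torch.cuda.memory_allocated(device_index) / gib,
        # Peak tensor memory since the start of the program.
        "peak_gib": torch.cuda.max_memory_allocated(device_index) / gib,
    }


if __name__ == "__main__":
    print(report_gpu_memory())
```

Calling this after a few training iterations on each card (for example the 2060 and the V100) would give comparable peak figures.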
I installed PyTorch 1.1.0, but memory is still not enough.
Did you build the Docker image and try training inside it?
I used the Docker image as a reference, since my environment is slightly different from it. I installed spconv using the commands from the Docker image, along with PyTorch 1.1.0 and torchvision 0.3.0, but the OOM error remains.
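When comparing a local environment against the Docker image, a quick version dump can make differences easier to spot. This is a rough sketch under the assumption that each package exposes a `__version__` attribute; `collect_versions` is a hypothetical helper name, not something provided by SA-SSD or spconv.

```python
import importlib


def collect_versions(packages=("torch", "torchvision", "spconv")):
    """Map each package name to its version string, or None if it cannot be imported.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Covers a missing package as well as a broken native extension.
            versions[name] = None
            continue
        # Fall back to "unknown" when the package has no __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(name, version)
```

Running it both inside the container and on the host gives two lists that can be compared line by line.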
I think I found the problem. It has nothing to do with this repo; the cause is spconv, which has bugs on the TITAN V GPU. The TITAN XP works fine.