SoftGroup
process killed by computer
Hello, when I run the command
./tools/dist_train.sh configs/softgroup_scannet.yaml 1
my process is killed by the system after several epochs of training.
I looked into it and found that it was caused by OOM (out of memory).
I am using a single RTX 3090 GPU with batch size 4 and num_workers=4. I don't think this setting should run out of memory, since training gets through several epochs before the process is killed.
Do you know why this happens and how to deal with it?
Looking forward to your reply. Many thanks!
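A process that is terminated with a plain "Killed" message is usually the Linux kernel OOM killer reacting to exhausted host RAM, not GPU memory; running out of GPU memory normally shows up as a CUDA out-of-memory error in the traceback instead. A minimal sketch for checking whether host memory grows from epoch to epoch, assuming a Unix system (the stdlib resource module is not available on Windows). The helper names are hypothetical and not part of SoftGroup, and the measurement covers only the main process, not DataLoader worker processes.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return the peak resident set size of the current process in MiB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    Only the calling process is counted, not its DataLoader workers.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def log_peak_rss(epoch: int) -> str:
    # Hypothetical helper: call at the end of each epoch in your training loop.
    msg = f"epoch {epoch}: peak RSS {peak_rss_mb():.1f} MiB"
    print(msg)
    return msg


log_peak_rss(1)
```

If the logged peak keeps climbing every epoch, the leak is on the host side; a flat value followed by a kill points elsewhere, for example at the worker processes.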
I also ran into this problem. Running the test on a single RTX 3090 (24 GB) fails with "CUDA error: an illegal memory access was encountered".
I am not sure. Could you try running with the --skip-validation flag? You can also resume training with --resume.
Thank you for the advice, but I am not training; I only want to run testing on my point cloud data.
@wsk12345 Which dataset are you using?
@thangvubk Thank you for your prompt reply. I am using a custom dataset. A single scene has about 5e6 points.
I ran into the "illegal memory access" error as well. In my case it was caused by the radius being too large for the dataset I was training on, and a radius that is too large may cause the same error during testing.
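Assuming the radius in question is a fixed neighbourhood (ball query) radius used when grouping points, a rough estimate shows why enlarging it is costly: if scan points lie on surfaces with a roughly uniform areal density, each point has about density × π × r² neighbours, so the total number of neighbour pairs grows with the square of the radius. This is an illustrative sketch, not SoftGroup code; the function name and the density of 10,000 points per m² are hypothetical, and 5,000,000 points is taken from the scene size mentioned earlier in the thread.

```python
import math


def expected_neighbor_pairs(num_points: int, points_per_m2: float, radius_m: float) -> float:
    """Rough count of (point, neighbour) pairs returned by a fixed-radius ball query.

    Assumes points are sampled uniformly on surfaces at `points_per_m2`, so each
    point has about points_per_m2 * pi * r^2 neighbours (edge effects ignored).
    """
    neighbors_per_point = points_per_m2 * math.pi * radius_m ** 2
    return num_points * neighbors_per_point


# Doubling the radius roughly quadruples the number of pairs to store.
small = expected_neighbor_pairs(5_000_000, 10_000.0, 0.04)
large = expected_neighbor_pairs(5_000_000, 10_000.0, 0.08)
print(f"{small:.3g} -> {large:.3g} pairs")
```

Real scans are not uniform, so dense regions can exceed this estimate considerably; the point is only that memory for neighbour lists scales superlinearly with the radius.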
If you have memory errors with custom datasets, I suggest checking the input spatial_shape. Spconv2 may not support inputs with very large spatial shapes (e.g., 3000x3000x1000).
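To see whether a custom scene falls into that range, the voxel grid size can be estimated from the point extents. A minimal sketch, assuming coordinates in metres are quantized by multiplying by a scale factor (voxels per metre) after shifting the minimum corner to zero. The helper names and the max_dim threshold are hypothetical, not SoftGroup or spconv APIs, and max_dim is a value you pick yourself, not a documented spconv limit.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def voxel_spatial_shape(points: Sequence[Point], scale: float) -> Tuple[int, ...]:
    """Voxel-grid shape needed to hold `points` after quantization.

    `scale` is voxels per metre (e.g. 50 means 2 cm voxels); coordinates are
    shifted so the minimum corner maps to voxel index 0.
    """
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    return tuple(math.floor((hi - lo) * scale) + 1 for lo, hi in zip(mins, maxs))


def check_spatial_shape(shape: Tuple[int, ...], max_dim: int) -> bool:
    # max_dim is a placeholder threshold, not a documented spconv limit.
    return all(d <= max_dim for d in shape)


# A 60 m x 60 m x 20 m scene quantized at 50 voxels per metre.
shape = voxel_spatial_shape([(0.0, 0.0, 0.0), (60.0, 60.0, 20.0)], 50)
print(shape, check_spatial_shape(shape, 1024))
```

With these inputs the grid is 3001 x 3001 x 1001, the same order of magnitude as the example in the comment above.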
I am getting an OOM error while testing on the S3DIS dataset with a single RTX 6000.