MeshCNN
Training fails on multiple GPUs
I am trying to train segmentation on the shrec16 dataset on four 1080 Ti GPUs by setting --gpu_ids=0,1,2,3, but it failed with the following error:
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [194,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Training works successfully on a single GPU.
Hi @sunhuaiqiang ,
I have not actually tried running this code on multiple GPUs. I will look into fixing it.
Does anyone know how to solve this problem? I am hitting it too, and the error seems to come from the meshconv module.
Does anyone know how to solve this problem?
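Not a fix, but possibly useful for narrowing it down: the assertion in the log says that some index passed to an index-select was not smaller than the size of the dimension being indexed. Below is a minimal pure-Python sketch of that bounds check. The names `find_out_of_range` and `checked_select` are hypothetical helpers, not part of MeshCNN or PyTorch, and the sketch assumes the indices can be iterated as integers (for a tensor, something like `idx.flatten().tolist()` would be needed first). Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` should also make the Python traceback point at the call that actually triggered the assertion.

```python
def find_out_of_range(indices, src_size):
    """Return (position, value) pairs for indices outside [0, src_size)."""
    return [
        (pos, int(idx))
        for pos, idx in enumerate(indices)
        if not 0 <= int(idx) < src_size
    ]


def checked_select(src_rows, indices):
    """Pure-Python stand-in for an index-select along dimension 0.

    Hypothetical helper: raises a readable IndexError instead of a
    device-side assertion when any index is out of range.
    """
    bad = find_out_of_range(indices, len(src_rows))
    if bad:
        raise IndexError(
            f"{len(bad)} index(es) out of range for size {len(src_rows)}; "
            f"first offenders (position, value): {bad[:5]}"
        )
    return [src_rows[int(i)] for i in indices]
```

Calling something like `find_out_of_range` on the gather indices just before the failing call, once on a single GPU and once on multiple GPUs, would show whether the indices or the size of the indexed dimension differ between the two setups.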