MeshCNN

Training fails on multiple GPUs

Open sunhuaiqiang opened this issue 5 years ago • 3 comments

I am trying to train segmentation on the shrec16 dataset on four 1080Ti GPUs by setting --gpu_ids=0,1,2,3, but it fails with the following error:

/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [194,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.

The training process works successfully on a single GPU.

sunhuaiqiang, Jan 26 '20 14:01

Hi @sunhuaiqiang ,

I have not actually tried running this code on multiple GPUs. I will try to look into fixing it.

ranahanocka, Feb 13 '20 13:02

Does anyone know how to solve this problem? I have run into it as well, and the error seems to come from the meshconv module.

fishfishson, Apr 23 '20 19:04

Does anyone know how to solve this problem?

HanHan55, Jan 06 '21 02:01
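
For anyone debugging this: the assertion in the error above (srcIndex < srcSelectDimSize) fires when an index passed to an index-select operation is not smaller than the size of the dimension being indexed. The thread does not establish why that happens only with several GPUs. A minimal sketch of a range check that could be run on the index values before the failing call is shown below. The helper name find_out_of_range and the sample values are hypothetical and not part of MeshCNN, and treating negative values as invalid is an assumption of this sketch.

```python
def find_out_of_range(indices, src_size):
    """Return (position, value) pairs for every index that falls outside
    the valid range 0 <= value < src_size.

    Hypothetical helper for illustration only; it is not MeshCNN code.
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not 0 <= idx < src_size
    ]


# Made-up example: a source with 5 entries along the selected dimension.
bad = find_out_of_range([0, 4, 5, -1], src_size=5)
print(bad)  # [(2, 5), (3, -1)]
```

If the list comes back non-empty for one replica's batch, comparing those index values with the size of the tensor that replica actually received may narrow the problem down. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous, which usually makes the Python stack trace point at the call that triggered the assertion.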