
About Training Parallelism

Open macromogic opened this issue 3 years ago • 2 comments

Hi. I wanted to train the model on my own dataset, but I found that CUDA memory runs out when processing the occupancy_256 prediction. I tried nn.DataParallel to run the model on multiple GPUs, but it raises the following error:

AttributeError: 'MinkowskiConvolution' object has no attribute 'dimension'

I searched for this error and found that it is an unresolved issue in MinkowskiEngine (link here). I wonder how you trained the model on your machine. Could you kindly suggest other possible solutions to make it work? Thank you!

macromogic avatar May 31 '22 14:05 macromogic

The model was trained on an RTX 2080 Ti with 11 GB of memory.

A few things you can check:

  • Increase the number of iterations for which the lower resolutions are trained (LEVEL_ITERATIONS_64, LEVEL_ITERATIONS_128) before you train the entire model. If your lower-resolution predictions are not good enough, many voxels may be created at the final resolution, which requires a lot of memory.
  • Increase the masking thresholds (SPARSE_THRESHOLD_128, SPARSE_THRESHOLD_256). These control the level of "confidence" an occupied voxel needs in order to be considered for the next resolution.
  • Generally, the 2D features (80 channels) did not contribute much to the final performance, but they do require a fair amount of memory. You can remove that part from the 3D model.
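The thresholding suggestion above can be sketched roughly as follows. This is a plain-Python illustration, not the repository's actual implementation (which operates on MinkowskiEngine sparse tensors); the names `mask_voxels`, `occupancy_probs`, and `sparse_threshold` are hypothetical:

```python
# Sketch of confidence-based masking between resolution levels.
# Assumption: each voxel has a predicted occupancy probability, and only
# voxels above the threshold are carried over to the next (denser) level.

def mask_voxels(occupancy_probs, sparse_threshold):
    """Keep only voxel indices whose occupancy confidence reaches the threshold."""
    return [i for i, p in enumerate(occupancy_probs) if p >= sparse_threshold]

probs = [0.1, 0.95, 0.4, 0.8, 0.05, 0.6]

# A low threshold passes many voxels to the next level (more memory) ...
print(mask_voxels(probs, 0.3))  # -> [1, 2, 3, 5]
# ... while a higher threshold keeps only confident voxels (less memory).
print(mask_voxels(probs, 0.7))  # -> [1, 3]
```

Raising the threshold trades recall at the next resolution for a smaller sparse tensor, which is exactly the memory lever described above.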

xheon avatar May 31 '22 14:05 xheon

Thanks for the reply! However, I still cannot proceed with the level-256 training. I am using the latest version of BlenderProc, so I suspect the format of my generated data differs from yours (which may also affect the level-64 and level-128 performance). I am still figuring out why.

I inspected the 3D-FRONT dataset and read the code of your forked BlenderProc. There is something I am still wondering about:

  1. In the SegMapRenderer class, you seem to map each instance to another integer ID (while my data is of float64 type). Does this affect the model performance?
  2. I noticed the dataset contains both raw_model and normalized_model files. How is the normalization performed? Does it have anything to do with the geometry data generation?
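Regarding question 1, the kind of remapping being asked about can be sketched like this. This is a hedged, plain-Python illustration; the function name `remap_instance_ids` and the assumption that IDs only need to be consistent within a single frame are mine, not taken from the repository:

```python
# Sketch: map arbitrary (possibly float64) instance IDs to compact integer IDs.
# Assumption: the model only needs IDs that are consistent within one frame,
# so the concrete values do not matter as long as the mapping is one-to-one.

def remap_instance_ids(segmap):
    """Map each unique raw ID to a small integer (0, 1, 2, ...) in order of first appearance."""
    mapping = {}
    remapped = []
    for raw_id in segmap:
        if raw_id not in mapping:
            mapping[raw_id] = len(mapping)
        remapped.append(mapping[raw_id])
    return remapped, mapping

raw = [12.0, 7.5, 12.0, 3.0, 7.5]
ids, mapping = remap_instance_ids(raw)
print(ids)  # -> [0, 1, 0, 2, 1]
```

Under this assumption, float64 IDs would not change the semantics as long as the mapping stays one-to-one, though integer IDs are safer to compare and store.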

macromogic avatar Jun 23 '22 16:06 macromogic