SpatioTemporalSegmentation

Training time

wbhu opened this issue 6 years ago • 4 comments

Hi,

Thanks for sharing the training code. Does the code work with multiple GPUs? If not, how long does it take to train a model that reaches the reported SOTA performance?

Thanks very much

wbhu avatar Oct 13 '19 14:10 wbhu

Sorry for the late reply.

Sorry, I haven't measured the entire training time, as the server kicks me out after the maximum wall time allowed by SLURM. However, each iteration takes about 7.5 seconds on a Titan RTX with batch size 9, a 2 cm voxel size, and Mink16UNet34C (a 42-layer-deep network).
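For anyone who wants to reproduce the per-iteration figure on their own hardware, here is a minimal sketch of CUDA-synchronized iteration timing (generic PyTorch, not this repo's trainer; `model`, `criterion`, `optimizer`, and `train_loader` are placeholders for the repo's own objects):

```python
# Hypothetical timing sketch: measure mean wall-clock time per training
# iteration, synchronizing CUDA so queued GPU work is included.
import time
import torch

def mean_iteration_time(model, criterion, optimizer, train_loader, n_iters=50):
    model.train()
    times = []
    data_iter = iter(train_loader)
    for _ in range(n_iters):
        inputs, targets = next(data_iter)   # adapt to this repo's batch format
        torch.cuda.synchronize()
        start = time.time()
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()            # wait for GPU kernels to finish
        times.append(time.time() - start)
    return sum(times) / len(times)          # mean seconds per iteration
```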

We are currently working on making this even faster in the next version of the MinkowskiEngine, which will be released soon.

chrischoy avatar Nov 03 '19 01:11 chrischoy

I posted an entire training log on https://github.com/chrischoy/SpatioTemporalSegmentation/issues/8

In sum, training started 09/11 14:59:47 and ended 09/16 15:34:33 for 60k iterations, with ScanNet v2 validation every 1k iterations; each validation takes about 7 minutes, for a total of about 420 minutes.
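As a rough back-of-envelope check combining the numbers in this thread (the 7.5 s/iteration figure is from the earlier Titan RTX comment, so the logged run's per-iteration time may differ):

```python
# Back-of-envelope training-time estimate from the figures quoted in this thread.
iters = 60_000
sec_per_iter = 7.5                    # Titan RTX, batch size 9, 2 cm voxels (earlier comment)
val_minutes = 420                     # ~7 min per validation, every 1k iterations

train_hours = iters * sec_per_iter / 3600            # 125.0 hours of training
total_days = (train_hours + val_minutes / 60) / 24   # ~5.5 days including validation
print(round(train_hours, 1), round(total_days, 1))
```

The posted log spans roughly five days (09/11 to 09/16), so the estimate is in the right ballpark, with the logged run coming in somewhat faster.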

chrischoy avatar Nov 04 '19 10:11 chrischoy

Thanks very much for the kind reply! BTW, is the training log for the 5 cm voxel size available? It seems to take too long to train a model at 2 cm voxels.

wbhu avatar Nov 05 '19 13:11 wbhu

@chrischoy Can this code use multiple GPUs? I added this to main.py:

```python
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = torch.nn.DataParallel(model)
model = model.to(torch.device("cuda:0"))
```

but the following error occurs: RuntimeError: Caught RuntimeError in replica 0 on device 0.
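For context, here is a minimal sketch of the usage torch.nn.DataParallel is designed for (generic PyTorch with a toy dense model, not this repo's network): it splits plain tensor inputs along dim 0 and scatters them to each GPU, and it is not confirmed here whether MinkowskiEngine sparse inputs can be scattered the same way.

```python
# Minimal DataParallel sketch with a toy dense-tensor model, assuming at
# least one CUDA device is available. Not the repo's model.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)            # replicate the module per GPU
model = model.to(torch.device("cuda:0"))

x = torch.randn(32, 16, device="cuda:0")      # batch dim 0 is split across GPUs
y = model(x)                                  # forward runs on all replicas
print(y.shape)                                # torch.Size([32, 4])
```

The full traceback printed below the "Caught RuntimeError in replica 0" line should point at the call that actually failed inside the replica.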

Thank you!

fengziyue avatar Mar 09 '20 18:03 fengziyue