
What would be the best way to do multi-GPU training for SNCA?

Open pallashadow opened this issue 5 years ago • 3 comments

Should I use nn.parallel.DistributedDataParallel for both "model" and "lemniscate", and manually sync each replica's lemniscate memory once per epoch? Roughly what I have in mind is sketched below.
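Just a sketch, not tested; I am assuming the memory bank is stored in a `lemniscate.memory` buffer of shape (N, D), the actual attribute name in the repo may differ:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model: torch.nn.Module, local_rank: int) -> DDP:
    # Wrap only the backbone in DDP; every rank keeps its own full
    # copy of the lemniscate memory bank and updates it locally.
    return DDP(model.to(local_rank), device_ids=[local_rank])

def sync_memory(lemniscate: torch.nn.Module) -> None:
    # Average the memory banks across ranks once per epoch.
    # Assumes `lemniscate.memory` is the (N, D) feature-bank buffer.
    dist.all_reduce(lemniscate.memory, op=dist.ReduceOp.SUM)
    lemniscate.memory /= dist.get_world_size()
```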

Or would a better solution be to use a single GPU for the "lemniscate" memory and its computation, while the other GPUs run the "model" part in data parallel? Something along the lines of the sketch below.
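Again just a sketch; the device ids and the `lemniscate(features, indices)` call signature are assumptions on my side:

```python
import torch
import torch.nn as nn

def setup_split(model: nn.Module, lemniscate: nn.Module):
    # Replicate the backbone over GPUs 0-2 with DataParallel and keep
    # the lemniscate memory on a dedicated GPU (cuda:3 here).
    model = nn.DataParallel(model.cuda(0), device_ids=[0, 1, 2])
    lemniscate = lemniscate.to('cuda:3')
    return model, lemniscate

def memory_forward(model, lemniscate, images, indices):
    features = model(images)  # DataParallel gathers outputs on cuda:0
    # Move features to the memory GPU for the lemniscate forward.
    return lemniscate(features.to('cuda:3'), indices.to('cuda:3'))
```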

I am working on a comics image retrieval task and find this project very useful. Thank you for your help.

pallashadow avatar Feb 28 '19 10:02 pallashadow

Hi, I am also thinking about the memory issue. How did you deal with it?

RookieHong avatar May 22 '21 15:05 RookieHong

When the memory bank became large, I moved the memory forward and backward passes to the CPU. It was a little slower but worked well.

model_forward_gpu -> memory_forward_cpu -> loss_cpu -> memory_backward_cpu -> model_backward_gpu
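In code it looked roughly like this (a sketch of one training step; the `lemniscate(features, indices)` and `criterion(outputs, indices)` call signatures are assumptions, the repo's may differ slightly):

```python
import torch

def train_step(model, lemniscate, criterion, images, indices, optimizer):
    # Model forward on GPU; memory forward/backward and loss on CPU.
    # `lemniscate` and `criterion` are assumed to live on the CPU.
    optimizer.zero_grad()

    features = model(images.cuda(non_blocking=True))    # GPU forward

    features_cpu = features.cpu()         # differentiable device copy
    outputs = lemniscate(features_cpu, indices)   # CPU memory forward
    loss = criterion(outputs, indices)            # CPU loss

    # Backward runs through the CPU memory, then autograd carries the
    # gradient back across the .cpu() copy into the GPU model.
    loss.backward()
    optimizer.step()
    return loss.item()
```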

pallashadow avatar May 24 '21 15:05 pallashadow

Thanks for sharing!

RookieHong avatar May 25 '21 04:05 RookieHong