pytorch-recurrent-ae-siggraph17
About memory use
Hello, I created a small dataset to try the model, but PyTorch reports that GPU memory has run out.
My graphics card is an NVIDIA 1080 Ti.
Hi @xin-xinhanggao, thank you for your query.
I have also used the NVIDIA 1080 Ti for training the model, so I am not sure why the error is appearing for you. Could you please give more details and paste the exact error message?
PS: Did you change the batch size in train.py, line 109?

```python
dataset = DataLoader(data_loader, batch_size=1, num_workers=0, shuffle=True)
```

If yes, change it back to 1. Since each sequence already contains 7 images, a larger batch size can cause an out-of-memory error.
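As a rough sketch of why this matters: memory scales with the number of frames held in flight, and every dataset item is already a 7-frame sequence (the `frames_in_flight` helper below is illustrative, not part of the repository):

```python
# Each dataset item is a sequence of 7 frames, so activations for
# batch_size * 7 images (plus the recurrent state kept for backprop)
# must fit on the GPU at once.
def frames_in_flight(batch_size, frames_per_sequence=7):
    """Illustrative count of images resident in memory per step."""
    return batch_size * frames_per_sequence

# batch_size=1 keeps 7 frames in memory; batch_size=16 keeps 112.
print(frames_in_flight(1), frames_in_flight(16))
```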
Thank you for your timely reply.
As you said, I changed the batch size to 16 to train the model more efficiently.
That must be why it failed.
But is there any problem with using a batch size of 1? It seems that it may affect the convergence of the model.
Hi, yes, it definitely affects convergence, but we cannot help it due to resource constraints. You could try a better GPU if you have one, or use PyTorch's nn.DataParallel module with multiple 1080 Tis.
In the paper, they used a batch size of 4 sequences, so 16 is too large in any case. (They trained on an NVIDIA DGX-1!)
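For reference, wrapping a model in nn.DataParallel might look like the sketch below. The tiny `nn.Sequential` model is a stand-in assumption, not the recurrent autoencoder from this repository; the wrapper splits each batch across all visible GPUs and falls back to single-device execution otherwise:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the recurrent autoencoder defined in train.py.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())

# DataParallel splits the batch dimension across all visible GPUs,
# so a batch of 4 sequences would be divided among the cards.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Dummy batch: 4 items, 3 channels, 64x64 (sizes are illustrative).
x = torch.randn(4, 3, 64, 64, device=device)
out = model(x)
```

Note that with DataParallel the effective per-GPU batch is `batch_size / num_gpus`, so you would still want to keep the total batch size small.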
Thanks for your reply :)