
About memory use

Open xin-xinhanggao opened this issue 5 years ago • 4 comments

Hello, I created a small dataset to try the model.

But PyTorch reports an error that GPU memory has run out.

My graphics card is an NVIDIA 1080 Ti.

xin-xinhanggao avatar Mar 11 '19 08:03 xin-xinhanggao

Hi @xin-xinhanggao, thank you for your query.

I have also used an NVIDIA 1080 Ti to train the model, so I am not sure why the error appears for you. Could you please give more details and paste the exact error message?

P.S. Did you change the batch size in train.py, line 109?

dataset = DataLoader(data_loader, batch_size=1, num_workers=0, shuffle=True)

If so, change it back to 1. Each sequence already contains 7 images, so a larger batch size can exhaust GPU memory.
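As a side note, if you want the gradient statistics of a larger batch without the memory cost, a standard workaround is gradient accumulation: keep batch_size=1 and only step the optimizer every few iterations. A minimal sketch, where model, criterion, optimizer, and loader are hypothetical stand-ins rather than the actual objects in train.py:

import torch
import torch.nn as nn

# Hypothetical stand-ins for the model, loss, and data in train.py.
model = nn.Linear(8, 8)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = [(torch.randn(1, 8), torch.randn(1, 8)) for _ in range(8)]

accum_steps = 4  # effective batch of 4 sequences, matching the paper
optimizer.zero_grad()
for i, (inputs, target) in enumerate(loader):
    loss = criterion(model(inputs), target) / accum_steps  # average over the virtual batch
    loss.backward()  # gradients accumulate across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()  # one optimizer update per accum_steps sequences
        optimizer.zero_grad()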

AakashKT avatar Mar 11 '19 08:03 AakashKT

Thank you for your timely reply.

As you said, I changed the batch size to 16 to train the model more efficiently.

That must be why it failed.

But is there any problem with using a batch size of 1? It seems that could hurt the convergence of the model.

xin-xinhanggao avatar Mar 11 '19 08:03 xin-xinhanggao

Hi, yes, it definitely affects convergence, but we cannot help it due to resource constraints. You could try a better GPU if you have one, or use PyTorch's nn.DataParallel module with multiple 1080 Tis (see the sketch below).

In the paper, they used a batch size of 4 sequences, so 16 is far too large in any case. (They trained on an NVIDIA DGX-1!)
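For reference, a minimal sketch of the nn.DataParallel route; the model here is a hypothetical stand-in, not the actual network class in this repo:

import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # hypothetical stand-in for the recurrent autoencoder

if torch.cuda.device_count() > 1:
    # DataParallel splits each batch along dim 0 across the visible GPUs,
    # runs the replicas in parallel, and gathers the outputs on GPU 0.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

# With four cards, batch_size=4 in the DataLoader puts one sequence per GPU, e.g.
# dataset = DataLoader(data_loader, batch_size=4, num_workers=0, shuffle=True)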

AakashKT avatar Mar 11 '19 09:03 AakashKT

Thanks for your reply :)

xin-xinhanggao avatar Mar 11 '19 09:03 xin-xinhanggao