StyleGAN2_Colab_Train.ipynb exits abruptly

Open alsino opened this issue 4 years ago • 8 comments

Hi there,

first of all, thanks for the great work!

I have an issue when running the StyleGAN2_Colab_Train.ipynb notebook on Colab. After downloading the StyleGAN2 repo and converting the images to TFRecords, I run the training script (run_training.py), but the script exits while compiling the TensorFlow plugin "upfirdn_2d.cu" (see image).

Any idea why this happens and how to fix it?

[Screenshot: output showing the script exiting while compiling upfirdn_2d.cu]
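For reference, a minimal sketch (assuming the standard NVlabs StyleGAN2 repo layout with dnnlib importable and TF 1.x, as in the notebook) that triggers compilation of just the upfirdn_2d plugin, so the failure can be reproduced outside run_training.py:

```python
# Minimal sketch: force the nvcc build of upfirdn_2d.cu on its own,
# assuming the StyleGAN2 repo is the current directory (dnnlib on the path).
import tensorflow as tf

import dnnlib.tflib as tflib
from dnnlib.tflib.ops import upfirdn_2d

tflib.init_tf()

# A tiny NCHW tensor is enough to trigger compilation of the custom CUDA op.
x = tf.zeros([1, 3, 8, 8])
y = upfirdn_2d.upsample_2d(x, k=[1, 3, 3, 1])  # default impl='cuda'
print(tflib.run(y).shape)  # expect (1, 3, 16, 16) if the plugin built correctly
```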

alsino avatar May 16 '20 12:05 alsino

Did this happen multiple times? This is usually the code given by the stop button to end the script early.

chrsmlls333 avatar May 16 '20 22:05 chrsmlls333

Yes, I've tried at least 10 times to run the training script. It just exits without any further error code or notification. 🤷🏽‍♂️

Could it have to do with the notebook running out of memory?
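One way to test the memory theory is to check how much RAM the Colab VM actually has free right before launching training; a quick sketch using plain psutil (not part of the notebook itself):

```python
# Check free RAM on the Colab VM before starting run_training.py.
import psutil

vm = psutil.virtual_memory()
print(f"RAM: {vm.available / 1e9:.1f} GB free of {vm.total / 1e9:.1f} GB total")
```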

alsino avatar May 17 '20 06:05 alsino

I'm facing the same problem. I thought it was because I was trying to train on 32x32 images, but apparently not.

Also, when I deleted all the .pkl files to train from scratch, it gives me an index-out-of-range error related to the .pkl files.
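That index-out-of-range error sounds like a "resume from latest snapshot" step indexing into an empty list once every .pkl is gone. An illustrative sketch of the pattern and a guard (paths and names here are hypothetical, not the notebook's actual code):

```python
# Illustrative sketch of why deleting all snapshots can cause index-out-of-range:
# sorted(...)[-1] fails on an empty list. Paths/names are examples only.
import glob

snapshots = sorted(glob.glob("results/**/network-snapshot-*.pkl", recursive=True))

if snapshots:
    resume_pkl = snapshots[-1]   # latest snapshot
    print("Resuming from", resume_pkl)
else:
    resume_pkl = None            # no snapshots left: train from scratch
    print("No snapshots found, training from scratch")
```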

andersonfaaria avatar May 20 '20 20:05 andersonfaaria

Could anyone solve this? I have this problem too. I've noticed that it consumes all the RAM. There must be a way to make it use less RAM; does anyone have an idea?

molo32 avatar Jun 22 '20 22:06 molo32

I solved it by not using the FFHQ network and training from scratch. This way you don't need to load everything into memory, so you stay under Colab's 16 GB RAM limit and its restrictions. I also made some changes to the code to get some features from the skyflynil fork while still keeping the improvements made by Derrick: https://github.com/andersonfaaria/stylegan2

andersonfaaria avatar Jun 23 '20 01:06 andersonfaaria

I tried your repository but it gives me the same error. Can you tell me which Colab notebook is the fixed one?

molo32 avatar Jun 23 '20 19:06 molo32

You get the error because you're trying to load the pretrained network, which blows past the maximum amount of RAM Colab supports. Don't use a pretrained network and don't use FFHQ at all. Train at smaller resolutions using other datasets and you'll be fine.
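If it helps, a rough sketch of downscaling a folder of images to a small power-of-two resolution before running dataset_tool.py, so the dataset and the network stay small; the paths and target size are examples only:

```python
# Rough sketch: resize images to a small power-of-two resolution before building
# TFRecords, to keep training within Colab's RAM. Paths/size are examples only.
import os
from PIL import Image

SRC, DST, SIZE = "raw_images", "resized_images", 256  # must be a power of two

os.makedirs(DST, exist_ok=True)
for name in os.listdir(SRC):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    img = Image.open(os.path.join(SRC, name)).convert("RGB")
    img = img.resize((SIZE, SIZE), Image.LANCZOS)  # crude square resize
    img.save(os.path.join(DST, os.path.splitext(name)[0] + ".png"))
```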

andersonfaaria avatar Jun 23 '20 20:06 andersonfaaria

@molo32 I’ve never seen this... at this particular step it's compiling the custom CUDA scripts... maybe make sure you're using CUDA 10.x? (I can't imagine you would be using anything else.) Maybe also check to make sure you have a P100? I don't think you'd run into this with a K80, but it's possible.
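A quick way to check both of those from a Colab cell (plain nvcc/nvidia-smi calls, nothing repo-specific):

```python
# Print the CUDA toolkit version and the assigned GPU, to confirm CUDA 10.x and P100/K80.
import subprocess

print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)
```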

dvschultz avatar Jun 29 '20 22:06 dvschultz