StyleGAN2_Colab_Train.ipynb exits abruptly
Hi there,
first of all, thanks for the great work!
I have an issue when running the StyleGAN2_Colab_Train.ipynb notebook on Colab. After downloading the StyleGAN2 repo and converting the images to TFRecords, I run the training script (run_training.py), but the script exits while compiling the TensorFlow plugin "upfirdn_2d.cu" (see image).
Any idea why this happens and how to fix it?
Did this happen multiple times? This is usually the code given by the stop button to end the script early.
Yes, I've tried at least 10 times to run the training script. It just exits without any further error code or notification. 🤷🏽‍♂️
Could it have to do with the notebook running out of memory?
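If you suspect the runtime is running out of RAM, a quick sanity check before launching run_training.py can tell you how much headroom the Colab VM actually has. This is a minimal sketch that reads the standard Linux `/proc/meminfo` fields; nothing here is StyleGAN2-specific:

```python
# Check Colab RAM headroom by reading /proc/meminfo (Linux only).
# MemTotal and MemAvailable are standard kernel field names.

def mem_gib(field):
    """Return a /proc/meminfo field (e.g. 'MemAvailable') in GiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1]) / (1024 ** 2)  # kB -> GiB
    raise KeyError(field)

total = mem_gib("MemTotal")
avail = mem_gib("MemAvailable")
print(f"RAM: {avail:.1f} GiB free of {total:.1f} GiB")
```

Running this in a cell right before training (and again while it runs, in a second notebook) would show whether available memory collapses just before the process dies.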
I'm facing the same problem. I thought it was because I was trying to train on 32x32 files, but apparently not.
Also, when I tried to delete all the .pkl files to train from scratch, it gives me an index-out-of-range error on the pkl.
Could anyone solve this? I also have this problem. I've noticed that it consumes all the RAM; there must be a way to make it use less. Does anyone have an idea?
I solved it by not using the FFHQ database and training from scratch. This way you don't need to load everything into memory, so you don't hit the 16 GB limit and run into Colab's restrictions. I also made some changes to the code to bring in some features from the skyflynil fork while keeping the improvements made by Derrick: https://github.com/andersonfaaria/stylegan2
I tried your repository but it gives me the same error. Can you tell me which Colab is the fixed one?
You get the error because you're trying to load the pretrained network, which blows past the maximum amount of RAM Colab supports. Don't use the pretrained network, and don't use FFHQ at all. Train at smaller dimensions using other datasets and you'll be fine.
@molo32 I've never seen this... at this particular step it's compiling the custom CUDA scripts... maybe make sure you're using CUDA 10.x? (I can't imagine you would be using anything else.) Maybe also check to make sure you have a P100? I don't think you'd run into this with a K80, but it's possible.
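To check which GPU the Colab session was given, something like this sketch should work from a notebook cell. It only uses the standard `nvidia-smi` query flags, and prints a note instead of crashing when no GPU runtime is attached (e.g. if you forgot to switch the runtime type):

```python
# Report the attached GPU's name via nvidia-smi, or a note if none is visible.
import shutil
import subprocess

def gpu_info():
    """Return the GPU name(s) from nvidia-smi, or an explanatory string."""
    if shutil.which("nvidia-smi") is None:
        return "no nvidia-smi on PATH (GPU runtime not enabled?)"
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return out.stdout.strip() or "nvidia-smi ran but reported no GPU"

print(gpu_info())
```

If this prints a K80 rather than a P100/T4, that (plus the CUDA version from `!nvcc --version`) would be the first thing to rule out.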