
Training stops after 4 ticks

b4nn3d opened this issue 5 years ago • 7 comments

Hello there, I got your fork running on Colab (semi fine). Like I said in the title, the training stops after 4 ticks:

tick 0   kimg 0.1    lod 0.00  minibatch 32  time 58s      sec/tick 57.7   sec/kimg 450.76  maintenance 0.0   gpumem 5.1
tick 1   kimg 6.1    lod 0.00  minibatch 32  time 12m 16s  sec/tick 648.1  sec/kimg 107.73  maintenance 30.5  gpumem 5.1
tick 2   kimg 12.2   lod 0.00  minibatch 32  time 23m 19s  sec/tick 644.3  sec/kimg 107.10  maintenance 18.0  gpumem 5.1
tick 3   kimg 18.2   lod 0.00  minibatch 32  time 34m 16s  sec/tick 652.4  sec/kimg 108.45  maintenance 5.1   gpumem 5.1
^C

The ^C looks like a keyboard interrupt, but I didn't give such a command.

b4nn3d avatar Dec 24 '19 21:12 b4nn3d

Did you set 'metric' to none? There could be issues if you are running FID metric evaluation; I don't need the metric, so I didn't do any testing or code changes for it. BTW, I am able to train through Google Colab for > 10 ticks.

skyflynil avatar Dec 24 '19 21:12 skyflynil

I launched the training with this.

!python run_training.py --result-dir=results --data-dir=datasets --dataset=blow --config=config-f --total-kimg=12000 --mirror-augment=true --metric=none --min-h=3 --min-w=3 --res-log2=7
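For reference, a quick sanity check of what resolution these flags imply, assuming this fork computes the output size as min_h * 2**res_log2 by min_w * 2**res_log2 (which matches the 384x384 figure here and the 640x384 example mentioned later in the thread):

```python
# Sketch: resolution implied by --min-h / --min-w / --res-log2.
# Assumption: output size = (min_h * 2**res_log2, min_w * 2**res_log2).
def implied_resolution(min_h, min_w, res_log2):
    scale = 2 ** res_log2
    return min_h * scale, min_w * scale

print(implied_resolution(3, 3, 7))  # (384, 384) -> this run
print(implied_resolution(3, 5, 7))  # (384, 640) -> presumably the 640x384 example
```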

b4nn3d avatar Dec 24 '19 21:12 b4nn3d

Could be a memory issue. You may try this to boost your instance's memory: https://github.com/googlecolab/colabtools/issues/253
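For reference, the trick discussed in that issue is to deliberately exhaust the default instance's RAM so that Colab crashes the session and offers to switch you to a high-RAM runtime. A minimal sketch, assuming the upgrade prompt still appears (note this kills the current session, so save your work first):

```python
# RAM-exhaustion trick from googlecolab/colabtools#253: run in a throwaway
# cell; it keeps allocating memory until the session crashes, after which
# Colab (at the time of this thread) offered a high-RAM runtime.
blobs = []
while True:
    blobs.append(bytearray(100 * 1024 * 1024))  # keep 100 MB chunks alive
```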

skyflynil avatar Dec 24 '19 22:12 skyflynil

I got OOM when I was trying with a 512x512 dataset; this one was 384x384. In your example you train on a 640x384 dataset, so I don't see how this could be a problem ;)

BTW, I'm trying with 18,764 images. How big is your dataset?

b4nn3d avatar Dec 24 '19 23:12 b4nn3d

I actually did use that high-memory instance (25 GB of RAM) to train. I have tried 512x512 and 640x384 and both ran fine (around 25k files).

skyflynil avatar Dec 24 '19 23:12 skyflynil

OK, it was a memory issue. It trained for 220 ticks with your method.

b4nn3d avatar Dec 25 '19 09:12 b4nn3d

Hi there, @b4nn3d, did this work out for you? My setup:

  • I'm on a high memory instance.
  • 2k images
  • 256^2 dimensions

Launching with: !python run_training.py --num-gpus=1 --data-dir=./dataset --config=config-f --dataset=myset --mirror-augment=true --metric=none --total-kimg=2000 --min-h=4 --min-w=4 --res-log2=6

So far I've never seen more than tick 0:

tick 0   kimg 0.1    lod 0.00  minibatch 32  time 41s      sec/tick 41.0   sec/kimg 320.50  maintenance 0.0   gpumem 6.1

Suggestions appreciated, cheers.

jwb95 avatar Feb 23 '20 20:02 jwb95