
64x64 hardwired crop limitation?

Open gwern opened this issue 8 years ago • 26 comments

So another user and I were trying out dcgan.torch to see how well it would work on image sets more complicated than faces (kudos on writing an implementation much easier to get up and running than the original dcgan-theano, BTW; we really weren't looking forward to figuring out how to get HDF5 image input working, although some details could use work, like: why is nThreads=1 by default?), and I became concerned that 64x64 images were just too small to convey all the details and would lead to a poorly-trained NN.

Experimenting with the options, it seems one can get dcgan.torch to work with almost the whole image by setting the full image size very close to the crop size: loadSize=65 fineSize=64. Or one could downscale all the images on disk with a command like ls *.jpg | parallel mogrify -resize 65536@. (I am still trying it out, but dcgan appears to make much faster progress when the images are pre-shrunk so the 64x64 crop covers nearly the whole picture than when trained on 64x64 crops of full-resolution images.)

The full image still winds up being extremely low resolution, though. Reading through main.lua and donkey_folder.lua is a little confusing. It looks as if we're supposed to be able to increase the size of the trained images by increasing fineSize along with the two parameters governing the size of the base layer of the generator & discriminator NNs, so we thought that using higher-resolution images would be as simple as loadSize=256 fineSize=255 ngf=255 ndf=255 - load a decent-resolution image, crop it minimally, and feed it into NNs of the same size.

But that doesn't work. In fact, we can't find a setting of fineSize other than 64 which doesn't immediately crash dcgan.torch regardless of what we set the other options to. Are we misunderstanding the config options' intent, or is there a bug somewhere?

gwern avatar Dec 15 '15 02:12 gwern

So, the hard-wired 64x64 is in the model. Getting around it is actually trivial. Let me write a bit more detailed post, with code references.

soumith avatar Dec 15 '15 17:12 soumith

I look forward to that. I know that simply increasing net depth/size didn't necessarily help much in past work, but we hope that with more visible detail, maybe that will make a difference. (Perhaps our datasets are too heterogeneous or we're using the wrong hyperparameters, but we're having a hard time getting past fuzzy blobs and getting the fantastic results like the faces/rooms/flowers.)

gwern avatar Dec 15 '15 18:12 gwern

Ok, so the data loader is pretty generic. It has two control variables: opt.loadSize and opt.fineSize, here.
Right now, loadSize has to be greater than fineSize (because of a bug in the cropping logic), so it's okay to run loadSize=65 fineSize=64 th main.lua. If the dataset is well-aligned (like the faces dataset, for example), keeping loadSize = fineSize + 1 will get you better generations, as DCGANs have a hard time understanding and aligning generations.
I think it is because the discriminator is a standard convnet that builds spatial invariances layer by layer. One could build a DCGAN that not only plays the true/fake game on the whole image, but also on local regions. This will make object alignment an explicit objective in the game, and I think it will get you much better generations.
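A rough sketch of that idea (just an illustration, not code from this repo; every layer choice here is a guess): let the discriminator's last convolution emit a grid of real/fake scores instead of a single scalar, so each cell judges one local region.

-- Hypothetical "local regions" discriminator head for 64x64 inputs (sketch only).
-- Instead of collapsing to a single probability, the final layer outputs an
-- 8x8 map of per-region real/fake scores; the BCE target is then an 8x8 map
-- of ones (real) or zeros (fake), so misaligned regions are penalised directly.
local netDLocal = nn.Sequential()
netDLocal:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
netDLocal:add(nn.LeakyReLU(0.2, true))
-- state size: (ndf) x 32 x 32
netDLocal:add(SpatialConvolution(ndf, ndf * 2, 4, 4, 2, 2, 1, 1))
netDLocal:add(SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*2) x 16 x 16
netDLocal:add(SpatialConvolution(ndf * 2, 1, 4, 4, 2, 2, 1, 1))
netDLocal:add(nn.Sigmoid())
-- state size: 1 x 8 x 8, one real/fake score per local region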

Now, coming to the next part. To do generations of size 128, all you have to do is make the following changes: run loadSize=129 fineSize=128 th main.lua and change the generator definition to this:

local netG = nn.Sequential()
-- input is Z, going into a convolution
netG:add(SpatialFullConvolution(nz, ngf * 16, 4, 4))
netG:add(SpatialBatchNormalization(ngf * 16)):add(nn.ReLU(true))
-- state size: (ngf*16) x 4 x 4
netG:add(SpatialFullConvolution(ngf * 16, ngf * 8, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 8)):add(nn.ReLU(true))
-- state size: (ngf*8) x 8 x 8
netG:add(SpatialFullConvolution(ngf * 8, ngf * 4, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 4)):add(nn.ReLU(true))
-- state size: (ngf*4) x 16 x 16
netG:add(SpatialFullConvolution(ngf * 4, ngf * 2, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 2)):add(nn.ReLU(true))
-- state size: (ngf * 2) x 32 x 32
netG:add(SpatialFullConvolution(ngf * 2, ngf, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf)):add(nn.ReLU(true))
-- state size: (ngf) x 64 x 64
netG:add(SpatialFullConvolution(ngf, nc, 4, 4, 2, 2, 1, 1))
netG:add(nn.Tanh())
-- state size: (nc) x 128 x 128

And change the discriminator similarly:

local netD = nn.Sequential()

-- input is (nc) x 128 x 128
netD:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
netD:add(nn.LeakyReLU(0.2, true))
-- state size: (ndf) x 64 x 64
netD:add(SpatialConvolution(ndf, ndf * 2, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*2) x 32 x 32
netD:add(SpatialConvolution(ndf * 2, ndf * 4, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 4)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*4) x 16 x 16
netD:add(SpatialConvolution(ndf * 4, ndf * 8, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 8)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*8) x 8 x 8
netD:add(SpatialConvolution(ndf * 8, ndf * 16, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 16)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*16) x 4 x 4
netD:add(SpatialConvolution(ndf * 16, 1, 4, 4))
netD:add(nn.Sigmoid())
-- state size: 1 x 1 x 1
netD:add(nn.View(1):setNumInputDims(3))
-- state size: 1

You could write a function to essentially generate both the networks automatically for a given generation size, but to keep the code more readable, I defined them manually.
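If you want it automated anyway, here is a quick sketch of such a function (hypothetical, not in the repo; it assumes the same local aliases and the nz / ngf / nc variables from main.lua):

-- Hypothetical helper that builds a generator for any output size of the form
-- 4 * 2^k (64, 128, 256, ...). Not part of dcgan.torch; just the pattern above
-- rolled into a loop.
local function buildGenerator(size, nz, ngf, nc)
   -- count the doubling steps needed to get from 4x4 to size x size
   local nUp, s = 0, size
   while s > 4 do
      assert(s % 2 == 0, 'size must be 4 * 2^k, e.g. 64, 128, 256')
      s = s / 2
      nUp = nUp + 1
   end
   local mult = 2 ^ (nUp - 1)   -- width of the base layer, e.g. 16 for size = 128
   local netG = nn.Sequential()
   -- input is Z, going into a convolution; state size: (ngf*mult) x 4 x 4
   netG:add(SpatialFullConvolution(nz, ngf * mult, 4, 4))
   netG:add(SpatialBatchNormalization(ngf * mult)):add(nn.ReLU(true))
   for i = 1, nUp - 1 do
      -- each step halves the feature maps and doubles the spatial size
      netG:add(SpatialFullConvolution(ngf * mult, ngf * mult / 2, 4, 4, 2, 2, 1, 1))
      netG:add(SpatialBatchNormalization(ngf * mult / 2)):add(nn.ReLU(true))
      mult = mult / 2
   end
   -- final layer: (ngf) x size/2 x size/2 -> (nc) x size x size
   netG:add(SpatialFullConvolution(ngf, nc, 4, 4, 2, 2, 1, 1))
   netG:add(nn.Tanh())
   return netG
end
-- buildGenerator(128, nz, ngf, nc) reproduces the 128x128 definition above;
-- the discriminator could be generated the same way with the layers reversed.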

As you see, and contrary to what you assumed, ngf and ndf are not related to the generation size; they control the number of feature maps in the generator / discriminator.

Hope this helps.

soumith avatar Dec 15 '15 19:12 soumith

And change the generator definition to this:

OK, I see. So to expand it we just need to add another base layer where the argument is the max-size and then we adjust each 'higher' layer to tweak the numbers appropriately. So if we wanted to try out not just 128x128 but 256x256, we would just add another line and tweak accordingly?

(FWIW, I seem to be getting better fuzz from 128x128, but it's too soon for me to be sure that it'll get me nice images in the end when it finishes training. Maybe tomorrow morning I'll know.)

As you see, and contrary to what you assumed, ngf and ndf are not related to the generation size; they control the number of feature maps in the generator / discriminator.

Oh. I was a little confused because the Eyescream page mentions that if you handicap the discriminator's net size by giving it a fraction of the parameters the generator gets, you get more stable training (presumably because the discriminator has the easier job and too often wins in my runs), and I found that making sure ngf was >2x ndf did indeed make training much more stable and helped balance the error rates.

BTW, I couldn't help but wonder: discriminator vs generator reminds me a lot of actor-critic in reinforcement learning, and seems to have many of the same problems.

Has anyone ever tried to make DCGANs more stable by borrowing some of the techniques from there, like freezing networks and having experience-replay buffers? For example, if D's errors drop to ~0.10, where it's about to collapse and abort training, D's weights could be frozen & no more learning done until G starts producing convincing enough fakes that D's error rises back up to something more reasonable like 0.5/1/2; and similarly, if G starts winning almost 100% of the time and is about to reach 0 errors and destroy learning, it could be frozen until D learns enough to push its error rates back up to 1. Or a buffer of past images which fooled D could be saved to train on occasionally (to prevent a total collapse when D becomes perfect & wins).
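The buffer half of that idea might look something like this sketch (purely hypothetical; the names, the buffer size, and where you would call it from are all made up, not taken from dcgan.torch):

-- Hypothetical experience-replay buffer of past fakes (illustration only).
-- Generated images that fooled D get stashed here and can be mixed back into
-- later "fake" minibatches, so D can't collapse by forgetting old failure modes.
require 'torch'

local replay = {buf = {}, maxSize = 512}

-- store a batch of fakes, e.g. the ones D just classified as real
function replay:push(fakeBatch)
   for i = 1, fakeBatch:size(1) do
      local img = fakeBatch[i]:float():clone()
      if #self.buf < self.maxSize then
         table.insert(self.buf, img)
      else
         self.buf[math.random(self.maxSize)] = img   -- overwrite a random old entry
      end
   end
end

-- draw n stored fakes to append to the current fake batch shown to D
function replay:sample(n)
   if #self.buf == 0 then return nil end
   local sz = self.buf[1]:size()
   local out = torch.FloatTensor(n, sz[1], sz[2], sz[3])
   for i = 1, n do
      out[i]:copy(self.buf[math.random(#self.buf)])
   end
   return out
end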

gwern avatar Dec 15 '15 22:12 gwern

It does indeed seem very similar to actor-critic.

There's some trial on doing the network freezing / iterative scheduled optimization. This blog post details some: http://torch.ch/blog/2015/11/13/gan.html#balancing-the-gan-game

I've tried it in eyescream with no luck. It did not help things overall: https://github.com/facebook/eyescream/blob/master/lsun/train.lua#L122-L129

But since eyescream, lots of progress happened. DCGANs might be a good candidate to try this stuff.

soumith avatar Dec 15 '15 23:12 soumith

Tried the resize code posted above for 128x128 and finding that the Discriminator flatlines to 0.0000 around size 10. Anything I might be missing? Changed the discriminator & generator code as well as the command line parameters as specified. Testing 128x128 as a size greater than 64x64 toward eventually trying 320x200. Getting a lot of cool outputs even with the 64x64, thanks for the work on this!

chrisnovello avatar Jul 30 '16 10:07 chrisnovello

You can try changing the learning rates to favor the discriminator, or changing the filter counts, or increasing minibatch size.
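For example, favoring the discriminator with a smaller learning rate would look roughly like this (a hypothetical tweak; if I'm reading main.lua right, both optimizer states are currently built from the single opt.lr):

-- Hypothetical: slow D down relative to G so it doesn't win the game outright.
-- In stock main.lua both states use opt.lr; splitting them would look like this.
optimStateG = {learningRate = 0.0002,  beta1 = opt.beta1}
optimStateD = {learningRate = 0.00005, beta1 = opt.beta1}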

Alternatively, you could try switching to the new improved-gan in TensorFlow, which has additional tricks intended to stabilize training. (It's not that hard to rewrite the Imagenet data-processing script to use whatever set of images you may have; you just need to edit all the hardwired paths and delete some asserts.) In my experience so far, improved-gan works faster and better, but only if you can fit minibatches of at least 32 into your GPU (and probably, ideally, 64 or 128); the catch is that the codebase seems to be wired to assume minibatch sizes that are powers of 2, as anything else crashes, and minibatches of 2/4/8/16 diverge almost instantly.

gwern avatar Jul 30 '16 15:07 gwern

I ran into a similar issue with flatlining to zero. Setting ndf to something around ngf/2 or ngf/4 led to stable learning. (That is for 128^2)
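(With the env-var option style used above, that would be something like ngf=64 ndf=16 loadSize=129 fineSize=128 th main.lua.)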

LukasMosser avatar Jan 07 '17 18:01 LukasMosser

I also ran into the flatlining issue when trying 128x128, so I set ndf to ngf/4. The resulting images have a lovely crisp resolution, but after 1000 epochs they are very repetitive, with nowhere near as much variation as when using 64x64 and keeping ndf and ngf at 64. See the attached. Trying again with ndf at ngf/2. Will report back.

[attached image: generation2]

rjpeart avatar Mar 01 '17 14:03 rjpeart

It was a quick test in the end. With ndf set to ngf/2, training flatlines in the 2nd epoch. Any clues as to how I might keep up the variation in the images when using 128x128?

[attached screenshot: screen shot 2017-03-01 at 10 47 51 pm]

rjpeart avatar Mar 01 '17 14:03 rjpeart

@rjpeart Have you tried using any other "tricks" like label smoothing or injecting white noise into the input of the discriminator? That also helped stabilise training for me and is a recommended "fix" for problematic networks. Also, how much variation is in your dataset?
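For reference, one-sided label smoothing is roughly a one-line change (a sketch, assuming the real_label/fake_label variables defined near the top of main.lua):

-- Hypothetical one-sided label smoothing: tell D that real images are only
-- "0.9 real" so it never becomes perfectly confident and flatlines.
local real_label = 0.9   -- instead of 1
local fake_label = 0     -- unchanged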

LukasMosser avatar Mar 02 '17 19:03 LukasMosser

@LukasMosser Thanks for your response. I was not aware of those tricks so haven't tried them (still on that steep learning curve), but I will do, thanks for the leads! I'm using ~950 samples in this dataset, which, although not huge, has given me great results at 64x64px, so I was surprised at the level of repetition at the higher resolution. I guess it's because of a diminished discriminator?

rjpeart avatar Mar 03 '17 02:03 rjpeart

@LukasMosser adding white noise stabilised the learning perfectly. Thanks so much for your advice. For anyone else struggling with this, here's how I defined the discriminator (it's the code provided by @soumith above, but with white noise added at the 5th line down)

local netD = nn.Sequential()
-- input is (nc) x 128 x 128
netD:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
netD:add(nn.LeakyReLU(0.2, true))
--add white noise
netD:add(WhiteNoise(0,0.1))
-- state size: (ndf) x 64 x 64
netD:add(SpatialConvolution(ndf, ndf * 2, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*2) x 32 x 32
netD:add(SpatialConvolution(ndf * 2, ndf * 4, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 4)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*4) x 16 x 16
netD:add(SpatialConvolution(ndf * 4, ndf * 8, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 8)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*8) x 8 x 8
netD:add(SpatialConvolution(ndf * 8, ndf * 16, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 16)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*16) x 4 x 4
netD:add(SpatialConvolution(ndf * 16, 1, 4, 4))
netD:add(nn.Sigmoid())
-- state size: 1 x 1 x 1
netD:add(nn.View(1):setNumInputDims(3))
-- state size: 1

netD:apply(weights_init)

rjpeart avatar Mar 03 '17 15:03 rjpeart

@rjpeart glad I could help! Also interesting that you add white noise after the first LeakyReLU; I added it before the first convolutional layer and it worked as well, although I believe one can add it at any layer (or all of them) except the last.

Here more tricks: https://github.com/soumith/ganhacks

And here an article why adding noise works (and how): http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/

LukasMosser avatar Mar 03 '17 20:03 LukasMosser

@LukasMosser @rjpeart hey guys! I'm having a hard time adding the WhiteNoise to the discriminator. When I replace the original discriminator code with @rjpeart's modified code, I get this error when training:

/home/'myusername'/torch/install/bin/luajit: main2.lua:89: attempt to call global 'WhiteNoise' (a nil value)
stack traceback:
   main2.lua:89: in main chunk
   [C]: in function 'dofile'
   ...tion/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
   [C]: at 0x00405d50

Any suggestions? (also on that steep learning curve)

kubmin avatar May 12 '17 09:05 kubmin

@kubmin you probably didn't import the dpnn torch package. https://github.com/Element-Research/dpnn/blob/master/WhiteNoise.lua
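That is, something like this near the top of your main2.lua (assuming dpnn registers the module as nn.WhiteNoise, following the local-alias style the file already uses):

require 'dpnn'                     -- provides the WhiteNoise module
local WhiteNoise = nn.WhiteNoise   -- local alias so WhiteNoise(0, 0.1) resolves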

Hope that helps!

LukasMosser avatar May 13 '17 11:05 LukasMosser

@LukasMosser thank you! totally didn't import the required package.

cheers!

kubmin avatar May 15 '17 12:05 kubmin

@rjpeart @soumith Did any of you succeed in making larger sizes, perhaps up to 512x512? What would the extra lines in the generator/discriminator definition look like? Thank you!

plugimi avatar Jun 07 '17 05:06 plugimi

@plugimi I got up to 128, but had the repetition problems as stated above. I think you could probably work out what the lines in the generator / discriminator look like by following that pattern. However, I seem to recall reading a thread that mentioned a 512 res would be too computationally intensive to complete. Can't find the thread right now though :/ Maybe @LukasMosser knows?

rjpeart avatar Jun 08 '17 21:06 rjpeart

@plugimi @rjpeart I'm sure you can work out the pattern. The computational side is another matter: even if you manage to fit your models on a GPU, you may have to use label smoothing, noise annealing, or label flipping to stabilise training.
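Following the pattern above, the first layers of a 512x512 generator would presumably look like this (untested; note how wide the base layer gets, which is where the memory cost explodes):

-- Hypothetical first layers of a 512x512 generator, just extending the pattern
-- (untested): 512 = 4 * 2^7, so the base layer is ngf*64 feature maps wide.
netG:add(SpatialFullConvolution(nz, ngf * 64, 4, 4))
netG:add(SpatialBatchNormalization(ngf * 64)):add(nn.ReLU(true))
-- state size: (ngf*64) x 4 x 4
netG:add(SpatialFullConvolution(ngf * 64, ngf * 32, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 32)):add(nn.ReLU(true))
-- state size: (ngf*32) x 8 x 8
-- ...then keep halving the feature maps and doubling the spatial size as in the
-- 128x128 definition, ending with SpatialFullConvolution(ngf, nc, ...) at 512x512.
-- The discriminator mirrors this with SpatialConvolutions back down to 4x4.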

LukasMosser avatar Jun 08 '17 21:06 LukasMosser

Could someone please provide the code for the generator / discriminator nets (in main.lua) with dimensions of 256x256? I can't figure it out from the examples - I'm super new to torch :(

robbiebarrat avatar Jun 19 '17 21:06 robbiebarrat

Soumith, if you see this, or anyone else who knows the solution, could you verify whether the following is correct for 256x256? I believe I followed the pattern correctly, although I am not certain. I was able to get training to work with these changes, but it took significantly longer, and even after training for much longer the results were still just fuzzy static.

Ultimately, my goal is to create much larger AI generated images.

Thanks in advance if you're able to help with this.

For 256x256 I changed the following training config: loadSize=257 fineSize=256 th main.lua

Changes I made to the generator:

  -- input is Z, going into a convolution

  -- changes by John for 256x256
  netG:add(SpatialFullConvolution(nz, ngf * 32, 4, 4))
  netG:add(SpatialBatchNormalization(ngf * 32)):add(nn.ReLU(true))
  -- state size: (ngf*32) x 4 x 4
  netG:add(SpatialFullConvolution(ngf * 32, ngf * 16, 4, 4, 2, 2, 1, 1))
  netG:add(SpatialBatchNormalization(ngf * 16)):add(nn.ReLU(true))
  -- state size: (ngf*16) x 8 x 8
  -- / end changes by John for 256x256

  netG:add(SpatialFullConvolution(ngf * 16, ngf * 8, 4, 4, 2, 2, 1, 1))
  netG:add(SpatialBatchNormalization(ngf * 8)):add(nn.ReLU(true))
  -- state size: (ngf*8) x 16 x 16
  netG:add(SpatialFullConvolution(ngf * 8, ngf * 4, 4, 4, 2, 2, 1, 1))
  netG:add(SpatialBatchNormalization(ngf * 4)):add(nn.ReLU(true))
  -- state size: (ngf*4) x 32 x 32
  netG:add(SpatialFullConvolution(ngf * 4, ngf * 2, 4, 4, 2, 2, 1, 1))
  netG:add(SpatialBatchNormalization(ngf * 2)):add(nn.ReLU(true))
  -- state size: (ngf*2) x 64 x 64
  netG:add(SpatialFullConvolution(ngf * 2, ngf, 4, 4, 2, 2, 1, 1))
  netG:add(SpatialBatchNormalization(ngf)):add(nn.ReLU(true))
  -- state size: (ngf) x 128 x 128
  netG:add(SpatialFullConvolution(ngf, nc, 4, 4, 2, 2, 1, 1))
  netG:add(nn.Tanh())
  -- state size: (nc) x 256 x 256

And changes I made to the discriminator:

  -- input is (nc) x 256 x 256
netD:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
netD:add(nn.LeakyReLU(0.2, true))
-- state size: (ndf) x 128 x 128
netD:add(SpatialConvolution(ndf, ndf * 2, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*2) x 64 x 64
netD:add(SpatialConvolution(ndf * 2, ndf * 4, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 4)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*4) x 32 x 32
netD:add(SpatialConvolution(ndf * 4, ndf * 8, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 8)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*8) x 16 x 16
netD:add(SpatialConvolution(ndf * 8, ndf * 16, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 16)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*16) x 8 x 8

-- changes by John for 256x256
netD:add(SpatialConvolution(ndf * 16, ndf * 32, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 32)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*32) x 4 x 4
netD:add(SpatialConvolution(ndf * 32, 1, 4, 4))
netD:add(nn.Sigmoid())
-- state size: 1 x 1 x 1
-- / end changes by John for 256x256

netD:add(nn.View(1):setNumInputDims(3))
-- state size: 1

JohnHammell avatar May 22 '19 23:05 JohnHammell

if you want large GANs, look at https://github.com/ajbrock/BigGAN-PyTorch

DCGAN is a bit outdated ;-)

soumith avatar May 28 '19 23:05 soumith

Soumith, thank you so much for the reply and info! Much appreciated!

JohnHammell avatar May 29 '19 21:05 JohnHammell

Hi Soumith, I like the way DCGAN plays with images. I'm trying JohnHammell's code and it seems to work. Do you have an idea of the changes to make to the discriminator and generator to create much larger images, 512x512? Thank you for your time.

H-O-N-O avatar May 31 '21 15:05 H-O-N-O

Hi JohnHammell, how did it go generating 256x256? DCGAN or BigGAN? I want to generate pictures at 256x256 and 32x32. There are a lot of tutorials for 32, but fewer for 256. Have you tried your altered architecture, and does it work well?

Mrxiba avatar Jul 12 '23 19:07 Mrxiba