
High-res version

Open · kidach1 opened this issue 5 years ago · 6 comments

Thank you for sharing. Did you try a high-res version (like 256x256 or 512x512)? If not, what difficulties would you expect?

kidach1 · Jan 11 '20

Hi,

Yes, we did try a version at 256x256 resolution, and it worked decently well. You can simply add more upsampling layers towards the end of all the generator modules. We haven't tried 512x512, but it might be tricky to generate images directly at that resolution (in FineGAN, the resolution stays the same throughout the pipeline), so you might need some variant of StackGAN or ProgressiveGAN to reach it.
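For instance, a 128 -> 256 step could look like the sketch below, in the style of the StackGAN-like upsampling blocks in model.py (the helper name and channel sizes here are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Illustrative upsampling block in the StackGAN/FineGAN style:
# nearest-neighbor upsample -> 3x3 conv -> BatchNorm -> GLU (halves channels).
def up_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode='nearest'),
        nn.Conv2d(in_ch, out_ch * 2, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch * 2),
        nn.GLU(dim=1),
    )

# Appending one such block at the end of a generator stage's decoder
# takes 128x128 feature maps to 256x256:
extra_up = up_block(64, 32)      # channel sizes are illustrative
x = torch.randn(1, 64, 128, 128)
print(extra_up(x).shape)         # torch.Size([1, 32, 256, 256])
```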

utkarshojha · Jan 14 '20

@utkarshojha thank you for the reply!

> Yes, we did try a version at 256x256 resolution, and it worked decently well. You can simply add more upsampling layers towards the end of all the generator modules.

But if the generator's output size is doubled, the discriminator inputs and real_imgs sizes should also be doubled (otherwise it causes a size mismatch error), right? I tried that, but I couldn't get satisfactory results, as shown in the images below.
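(To be concrete, by "doubled" I mean something like mirroring the generator's extra upsampling with one more strided-conv block in each discriminator; the helper below is just my illustration, not code from the repo.)

```python
import torch
import torch.nn as nn

# One extra stride-2 conv block so a 256x256 input is reduced to the
# feature-map size the original 128x128 discriminator expected.
def down_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

x = torch.randn(1, 3, 256, 256)
print(down_block(3, 64)(x).shape)  # torch.Size([1, 64, 128, 128])
```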

fake_imgs[0] (background stage): [image: count_000076000_fake_samples0]

fake_imgs[1] (parent stage): [image: count_000076000_fake_samples1]

fake_imgs[2] (child stage): [image: count_000076000_fake_samples2]

It seems that the bounding-box processing doesn't work well and the disentanglement of the background fails. Could you share your code for 256x256, if possible?

kidach1 · Jan 14 '20

One problem in your implementation of the 256x256 version could be at the background stage. We use a PatchGAN there, which requires defining the values of some hyperparameters (lines 381-383 of trainer.py). These parameters are needed to accurately extract the patches lying outside the bounding box.

For the 256x256 version, the updated values of those parameters would be:

self.patch_stride = float(8)
self.n_out = 24
self.recp_field = 70

And yes, the real images and the discriminator inputs (and consequently the discriminator itself) would be different; there isn't anything else we do differently for the 256 case, apart from adding more layers to process the higher-resolution inputs.
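To make the role of these three values concrete, here is a minimal sketch of the patch-selection idea with the 256x256 values plugged in (a plausible reconstruction of what trainer.py does, not its exact code; the function name is mine):

```python
import numpy as np

patch_stride = 8.0   # distance between receptive-field origins, in input pixels
n_out = 24           # the PatchGAN output is an n_out x n_out grid
recp_field = 70      # receptive field of one output unit, in input pixels

def background_patch_mask(x1, y1, x2, y2):
    """Boolean n_out x n_out mask: True where the receptive field of a
    PatchGAN output unit lies entirely outside the bbox (x1, y1, x2, y2)."""
    mask = np.zeros((n_out, n_out), dtype=bool)
    for i in range(n_out):
        for j in range(n_out):
            top, left = i * patch_stride, j * patch_stride
            bottom, right = top + recp_field, left + recp_field
            # No overlap with the box along either axis => background patch.
            if bottom <= y1 or top >= y2 or right <= x1 or left >= x2:
                mask[i, j] = True
    return mask
```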

My version of 256x256 isn't cleaned up, so I don't think it would be helpful to you. Try the correction I mentioned and let me know if it works; if not, I can look into it further.
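As a quick sanity check, note that the three values are mutually consistent: a network with overall stride 8 and a 70-pixel receptive field maps a 256x256 input to a 24x24 grid. The Sequential below is only a stand-in with that same arithmetic, not FineGAN's actual background discriminator:

```python
import torch
import torch.nn as nn

# Stand-in trunk: three k4/s2 convs then a k7/s1 conv give overall stride 8
# and a 70-pixel receptive field, hence a 24x24 output for 256x256 inputs.
d_bg = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, stride=2), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 7, stride=1),
)
out = d_bg(torch.randn(1, 3, 256, 256))
assert out.shape[-2:] == (24, 24), out.shape  # matches self.n_out
```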

utkarshojha · Jan 15 '20

@utkarshojha I tried following your suggestion, but things don't seem to change.

fake_imgs[0] (background stage): [image: count_000012000_fake_samples0]

fake_imgs[1] (parent stage): [image: count_000012000_fake_samples1]

fake_imgs[2] (child stage): [image: count_000012000_fake_samples2]

And these are my changes: https://github.com/kidach1/finegan/commit/e5c8abddcb80340f5703b4675da0df1892b4fb71

Could you check this?

kidach1 · Jan 15 '20

Hi kidach1, sorry for the late response. I went through the changes you made, and they look fine. The only difference is the CROP_IMG_SIZE parameter: you've set it to 252, while in my version it is 254 (apologies for not mentioning this before). I'm not sure what difference this would make, but you should try it. I've been very busy these past few weeks, so I might be slow to respond, but do let me know the result.
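(One plausible reason the value matters, going by the patch arithmetic above rather than anything verified in the code: with stride 8, 24 outputs, and a 70-pixel receptive field, the last receptive field ends exactly at pixel 254.)

```python
# Guess at why CROP_IMG_SIZE = 254: the last PatchGAN receptive field starts
# at (n_out - 1) * patch_stride and spans recp_field pixels.
patch_stride, n_out, recp_field = 8, 24, 70
print((n_out - 1) * patch_stride + recp_field)  # 254
```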

Thanks

utkarshojha · Jan 22 '20

@utkarshojha Thank you for the reply despite your busy schedule! Unfortunately, changing CROP_IMG_SIZE doesn't seem to help (training has only run for 25 epochs so far, but the generated images at each stage look almost the same as in my comments above).

Are the channel sizes of G and D (this and this line) the same as yours?

kidach1 · Jan 22 '20