MUNIT
Issue with migrating to PyTorch 0.4
We discovered a couple of issues (slower training speed and degraded output image quality) when migrating our code from PyTorch 0.3 to 0.4. We are working on fixing them. For now, we recommend using munit_pytorch0.3.
The speed issue is now fixed in commit f972e42.
After applying the changes you made, I get the following error when resuming training:
Traceback (most recent call last):
File "train.py", line 64, in <module>
iterations = trainer.resume(checkpoint_directory, hyperparameters=config) if opts.resume else 0
File "/devel/MUNIT-master/MUNIT-master/trainer.py", line 186, in resume
self.gen_a.load_state_dict(state_dict['a'])
File "/opt/anaconda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for AdaINGen:
Unexpected key(s) in state_dict: "enc_content.model.0.norm.running_mean", "enc_content.model.0.norm.running_var", "enc_content.model.1.norm.running_mean", "enc_content.model.1.norm.running_var", "enc_content.model.2.norm.running_mean", "enc_content.model.2.norm.running_var", "enc_content.model.3.model.0.model.0.norm.running_mean", "enc_content.model.3.model.0.model.0.norm.running_var", "enc_content.model.3.model.0.model.1.norm.running_mean", "enc_content.model.3.model.0.model.1.norm.running_var", "enc_content.model.3.model.1.model.0.norm.running_mean", "enc_content.model.3.model.1.model.0.norm.running_var", "enc_content.model.3.model.1.model.1.norm.running_mean", "enc_content.model.3.model.1.model.1.norm.running_var", "enc_content.model.3.model.2.model.0.norm.running_mean", "enc_content.model.3.model.2.model.0.norm.running_var", "enc_content.model.3.model.2.model.1.norm.running_mean", "enc_content.model.3.model.2.model.1.norm.running_var", "enc_content.model.3.model.3.model.0.norm.running_mean", "enc_content.model.3.model.3.model.0.norm.running_var", "enc_content.model.3.model.3.model.1.norm.running_mean", "enc_content.model.3.model.3.model.1.norm.running_var".
How can I use an already trained model with this modification? Training from scratch works fine. I guess this issue comes from the changed layer normalization?
Do you have any idea why output quality is degraded?
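A possible workaround (a sketch, not an official fix from the maintainers) is to filter the tracked statistics out of the old checkpoint before loading it, so its keys match a model built with track_running_stats=False. The strip_running_stats helper and the checkpoint filename below are illustrative; the ['a'] / ['b'] keys follow trainer.resume() in the traceback above:

```python
import torch

def strip_running_stats(state_dict):
    # Drop the running statistics saved when track_running_stats=True,
    # which a model built with track_running_stats=False does not expect.
    drop_suffixes = ('running_mean', 'running_var', 'num_batches_tracked')
    return {k: v for k, v in state_dict.items()
            if not k.endswith(drop_suffixes)}

checkpoint = torch.load('gen_00200000.pt')  # example checkpoint file
trainer.gen_a.load_state_dict(strip_running_stats(checkpoint['a']))
trainer.gen_b.load_state_dict(strip_running_stats(checkpoint['b']))
```

In PyTorch 0.4 you should also be able to pass strict=False to load_state_dict to ignore the unexpected keys, but stripping them explicitly keeps the intent visible.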
@Cuky88 The degraded performance after migrating to PyTorch 0.4 is likely caused by an instance normalization parameter. We accidentally set track_running_stats=True in networks.py, which means the tracked running means and variances are used at test time. However, this is NOT what we used when we developed the code. In the new commit we have set this argument to False, which I think will resolve the issue. I am verifying the hypothesis; once it is verified, I will add more details.
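For reference, the change is only in how the normalization layers are constructed. A minimal sketch of the difference (MUNIT builds its norm layers through its own selection logic in networks.py, so the standalone layers below are just illustrative):

```python
import torch.nn as nn

# Accidentally committed: running statistics are accumulated during training
# and reused at test time, changing the generator's behavior at inference.
norm_tracked = nn.InstanceNorm2d(64, track_running_stats=True)

# What the code was developed with (and what the fix restores): each input
# is normalized with its own per-sample statistics at both train and test time.
norm_fixed = nn.InstanceNorm2d(64, track_running_stats=False)
```

This is also why old checkpoints contain the running_mean/running_var keys that the resume error above complains about.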
@mingyuliutw I trained this model for 200,000 iterations several days ago and it took almost 4 days. Your work looks very good and I really want to reproduce the results.
- How many images in the training set would be a good choice?
- How long should I expect training for 1M iterations to take with the new code?
My GPU is a Tesla V100-SXM2 16 GB.
@Cuky88 I used 2,500 images as the training set and the results did not look very good. I am now trying a new dataset with 50,000 images and hope to get better results. How about your results? It looks like your number of iterations is small.