
Smart way of training on rectangular images

Open NilsBochow opened this issue 2 years ago • 9 comments

I was wondering if anyone has a smart solution for training on rectangular images? My unprocessed images are 180x1440px; however, training on 1440x1440px by padding the images with empty space is computationally infeasible.

NilsBochow avatar Jan 24 '23 10:01 NilsBochow

LaMa can work out of the box on non-square images, provided the padding and the number of downscaling steps are set properly.

Are all images in your dataset exactly 180x1440?

windj007 avatar Jan 25 '23 10:01 windj007

Ah ok. Do I have to change any parameters in order for it to work? Yes, all my images are exactly 180x1440px.

Thank you!

NilsBochow avatar Jan 25 '23 11:01 NilsBochow

Do I have to change any parameters in order for it to work?

As a first step, just try - and if it does not crash, then you do not need to change anything.

However, the default training parameters are probably not optimal for your case: some experimentation is needed.

windj007 avatar Jan 25 '23 14:01 windj007

Thank you very much! Of course, training crashes with the following:

Traceback (most recent call last):
  File "bin/train.py", line 69, in main
    trainer.fit(training_model)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 607, in run_train
    self.run_sanity_check(self.lightning_module)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in run_sanity_check
    _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 726, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 166, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 315, in validation_step
    return self.model(*args, **kwargs)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/overrides/base.py", line 63, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "/p/tmp/user/LAMA/lama/saicinpainting/training/trainers/base.py", line 161, in validation_step
    return self._do_step(batch, batch_idx, mode=mode, extra_val_key=extra_val_key)
  File "/p/tmp/user/LAMA/lama/saicinpainting/training/trainers/base.py", line 232, in _do_step
    batch = self(batch)
  File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/p/tmp/user/LAMA/lama/saicinpainting/training/trainers/default.py", line 72, in forward
    batch['inpainted'] = mask * batch['predicted_image'] + (1 - mask) * batch['image']
RuntimeError: The size of tensor a (180) must match the size of tensor b (184) at non-singleton dimension 2

Apparently there is an error in the ffc_resnet generator. Any idea what exactly causes the problem, and why batch['predicted_image'] = self.generator(masked_img) returns a tensor with size 184 in that dimension?

And which training parameters are, in your opinion, the most promising to tune for this use case of rectangular images?

NilsBochow avatar Jan 27 '23 06:01 NilsBochow

This is because 180 is not divisible by 8. The default LaMa generator uses 3 downsampling blocks, so the input shape must be divisible by 2^3 = 8. The solution is either to reduce the number of downsampling blocks to 2 (then divisibility by 4 is enough) or to pad your data to 184.

windj007 avatar Jan 27 '23 08:01 windj007
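
A minimal sketch of the padding route, assuming NumPy arrays in (H, W, C) layout; the helper name and the reflection padding mode are illustrative choices, not part of the repository:

```python
import numpy as np

def pad_to_multiple(img: np.ndarray, n_downsamples: int = 3) -> np.ndarray:
    """Pad H and W up to the next multiple of 2**n_downsamples (180 -> 184 for 3 downsamples)."""
    factor = 2 ** n_downsamples
    h, w = img.shape[:2]
    pad_h = (factor - h % factor) % factor
    pad_w = (factor - w % factor) % factor
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")

padded = pad_to_multiple(np.zeros((180, 1440, 3), dtype=np.float32))
print(padded.shape)  # (184, 1440, 3)
```

The mask would need identical padding so that its shape matches batch['predicted_image'], which is exactly what the size-mismatch error above complains about.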

training parameters are the most promising

I do not know the nature of your data, so it is hard to suggest something specific. I'm not sure the perceptual loss and the discriminator will do their job well out of the box, due to the unusual resolution:

  • The PL backbone was trained on larger patches, and I do not know if it will work well on smaller images (180px)
  • The discriminator needs a relatively small vertical receptive field (no more than 180px), but it might need a huge horizontal receptive field (or it might not; it depends on the nature of the data and the size of the masks)

windj007 avatar Jan 27 '23 08:01 windj007
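
To make the receptive-field point concrete, a small sketch that computes the receptive field of a convolution stack; assuming a pix2pix-style NLayerDiscriminator layout with n_layers=3 (three stride-2 4x4 convolutions followed by two stride-1 4x4 convolutions), it comes out to the classic 70x70 pixels, already a sizeable fraction of the 180px vertical extent here:

```python
def receptive_field(layers):
    """Receptive field of a conv stack; layers = [(kernel, stride), ...], input side first."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Assumed pix2pix-style layout for n_layers=3: three stride-2 4x4 convs,
# then two stride-1 4x4 convs (check the repository config for the exact stack).
print(receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]))  # 70
```

Adding stride-2 layers grows the field quickly (n_layers=4 gives 142 by the same formula), so enlarging the horizontal field simply by stacking more layers would also push the vertical field past 180px.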

Thank you for your help, it is really useful!

Reducing the number of downsamples to 2 worked. I had to play around a little with my mask generation, since I implemented my own masks. However, I was wondering whether there is a computational difference between LaMa padding the pictures internally to 1440x1440 and me doing it beforehand. At the moment it pads the images and masks from 1440x180 to 1440x1440, and then I run out of memory like I did before (2 GPUs with 31.75 GiB total capacity each).

Am I missing something here? Is padding to 1440x1440 even necessary? For context: I am training on climate data, so I would think that I need the whole horizontal extent (1440px) during training to ensure the most physically plausible results.

NilsBochow avatar Jan 27 '23 10:01 NilsBochow

The model itself does not need that padding. I'd suggest having a look at the augmentations - you probably need to remove the scaling to a fixed square resolution from the augs. The output quality depends a lot on the augmentations, so you need to choose them carefully for your case and remove any augs that introduce distortions you are unlikely to encounter at inference time.

windj007 avatar Jan 27 '23 12:01 windj007
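
LaMa's training augmentations are built with albumentations, so removing the square rescaling means editing that pipeline; a rough sketch of a shape-preserving pipeline for 180x1440 data follows (the specific transforms kept here are assumptions, not the repository defaults):

```python
import albumentations as A

# Hypothetical replacement for the default augmentation pipeline:
# no resize or crop to a fixed square, only transforms that keep the
# native 180x1440 geometry of the climate fields.
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    # Only needed if the 3-downsample generator is kept (184 is the next multiple of 8 above 180):
    A.PadIfNeeded(min_height=184, min_width=1440, border_mode=0, p=1.0),
])

# augmented = transform(image=img)["image"]  # img is an HWC array
```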

Thank you! I managed to train and evaluate the model on my own data now, without padding, although the batch size has to be around 10 and I had to reduce ngf to 32; otherwise I run out of memory. The results are not too bad with the default settings, but there is probably a lot of room for improvement.

You mentioned that the discriminator might not be optimal with the out-of-the-box settings. Is it trivial to change the settings of the NLayerDiscriminator? How would I increase the receptive field horizontally?

NilsBochow avatar Feb 13 '23 09:02 NilsBochow
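
One possible way to widen the horizontal receptive field without also growing the vertical one is to use rectangular kernels and strides in the discriminator convolutions; the sketch below is a hypothetical PatchGAN-style stack for illustration, not the repository's NLayerDiscriminator:

```python
import torch
import torch.nn as nn

class WideFieldDiscriminator(nn.Module):
    """Hypothetical PatchGAN-style discriminator whose convolutions are wider
    (and stride faster) horizontally than vertically, so the horizontal
    receptive field grows faster than the vertical one."""

    def __init__(self, in_channels: int = 3, ndf: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, ndf, kernel_size=(4, 8), stride=(2, 4), padding=(1, 2)),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, kernel_size=(4, 8), stride=(2, 4), padding=(1, 2)),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, 1, kernel_size=(4, 8), stride=1, padding=(1, 2)),
        )

    def forward(self, x):
        return self.net(x)

# Per-patch logits for a 180x1440 input
print(WideFieldDiscriminator()(torch.zeros(1, 3, 180, 1440)).shape)  # torch.Size([1, 1, 44, 87])
```

Using the receptive_field helper sketched earlier, this stack sees roughly 22px vertically and 148px horizontally per output logit, which matches the "small vertical, large horizontal" requirement described above.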