Smart way of training on rectangular images
I was wondering if anyone has a smart solution for training on rectangular images. My images are unprocessed 180x1440px; training at 1440x1440px by filling the image with empty space is not computationally feasible.
LaMa can work out of the box on non-square images, provided the paddings and the number of downscales are set properly.
Are all images in your dataset exactly 180x1440?
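To see what "set properly" means here, the constraint is that the height and width must be divisible by 2 to the power of the number of downscaling blocks. A minimal sketch of that check (the parameter name n_downsampling is an assumption, not necessarily the exact name used in the LaMa configs):

```python
# Minimal sketch: each downscaling block halves the spatial resolution,
# so H and W must both be divisible by 2 ** n_downsampling.
def check_shape(height, width, n_downsampling=3):
    factor = 2 ** n_downsampling
    ok = height % factor == 0 and width % factor == 0
    print(f"{height}x{width} with {n_downsampling} downscales "
          f"(needs multiples of {factor}): {'OK' if ok else 'needs padding'}")

check_shape(180, 1440)                    # 180 is not a multiple of 8 -> needs padding
check_shape(180, 1440, n_downsampling=2)  # multiples of 4 -> OK
```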
Ah ok. Do I have to change any parameters in order for it to work? Yes, all my images are exactly 180x1440px.
Thank you!
Do I have to change any parameters in order for it to work?
As a first step, just try - and if it does not crash, then you do not need to change anything.
However, the default training parameters are probably not optimal for your case; some experimentation will be needed.
Thank you very much! Of course, training crashes with the following:
Traceback (most recent call last):
File "bin/train.py", line 69, in main
trainer.fit(training_model)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
self.accelerator.start_training(self)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
self.training_type_plugin.start_training(trainer)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
self._results = trainer.run_train()
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 607, in run_train
self.run_sanity_check(self.lightning_module)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in run_sanity_check
_, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 726, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 166, in evaluation_step
output = self.trainer.accelerator.validation_step(args)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in validation_step
return self.training_type_plugin.validation_step(*args)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 315, in validation_step
return self.model(*args, **kwargs)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/pytorch_lightning/overrides/base.py", line 63, in forward
output = self.module.validation_step(*inputs, **kwargs)
File "/p/tmp/user/LAMA/lama/saicinpainting/training/trainers/base.py", line 161, in validation_step
return self._do_step(batch, batch_idx, mode=mode, extra_val_key=extra_val_key)
File "/p/tmp/user/LAMA/lama/saicinpainting/training/trainers/base.py", line 232, in _do_step
batch = self(batch)
File "/p/tmp/user/lama_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/p/tmp/user/LAMA/lama/saicinpainting/training/trainers/default.py", line 72, in forward
batch['inpainted'] = mask * batch['predicted_image'] + (1 - mask) * batch['image']
RuntimeError: The size of tensor a (180) must match the size of tensor b (184) at non-singleton dimension 2
Apparently there is an error in the ffc_resnet generator. Any idea what exactly causes the problem and why batch['predicted_image'] = self.generator(masked_img) returns a tensor with dimension 184?
And which training parameters are the most promising to tune for this use case of rectangular images, in your opinion?
This is because 180 is not divisible by 8. The default LaMa generator settings use 3 downsampling blocks, so the input shape must be divisible by 2^3 = 8. The solution is to either reduce the number of downsampling blocks to 2 or pad your data to 184.
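If you go the padding route, a minimal sketch of padding a batch up to the next multiple of 8 (the function name and the reflection padding mode are just an illustration; you would also need to crop the prediction back to 180 rows before computing losses or metrics):

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(img, multiple=8, mode="reflect"):
    """Pad a (B, C, H, W) tensor on the bottom/right so H and W become multiples of `multiple`."""
    _, _, h, w = img.shape
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    return F.pad(img, (0, pad_w, 0, pad_h), mode=mode)

x = torch.randn(1, 3, 180, 1440)
print(pad_to_multiple(x).shape)  # torch.Size([1, 3, 184, 1440])
```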
training parameters are the most promising
I do not know the nature of your data, so it is hard to suggest something. I'm not sure that the perceptual loss and the discriminator will do their job well out of the box due to the unusual resolution:
- The PL backbone was trained on larger patches, and I do not know if it will work well on smaller images (180).
- The discriminator needs a relatively small vertical receptive field (not more than 180) but might need a huge horizontal receptive field (or might not; it depends on the nature of the data and the size of the masks).
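To get a feel for these numbers, a rough receptive-field calculation for a stack of strided convolutions (the layer list below is illustrative of a standard 70x70 PatchGAN, not necessarily the exact LaMa discriminator config):

```python
# Receptive field of a stack of conv layers, given (kernel_size, stride) pairs.
def receptive_field(layers):
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Illustrative PatchGAN-style stack: three stride-2 convs + two stride-1 convs.
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(layers))  # 70 -> comfortably below 180 vertically
```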
Thank you for your help, it is really useful!
Reducing the number of downsamples to 2 worked. I had to play around a bit with my mask generation, since I implemented my own masks. However, I was wondering whether there is a computational difference between LaMa padding the pictures internally to 1440x1440 and me doing it beforehand. At the moment it pads the images and masks from 180x1440 to 1440x1440, and then I run out of memory like I did before (2 GPUs, 31.75 GiB total capacity each).
Am I missing something here? Is padding to 1440x1440 even necessary? For context: I am training on climate data, so I would think I need the whole horizontal extent (1440px) during training to ensure the most physically plausible results.
The model itself does not need that padding. I'd suggest having a look at the augmentations: you probably need to remove the scaling to a fixed square resolution from the augs. Output quality depends a lot on the augmentations, so you need to choose them carefully for your case and remove any augs that introduce distortions you are unlikely to encounter at inference time.
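For reference, padding from 180 rows to 1440 rows multiplies the pixel count (and roughly the activation memory and compute) by 8x, which is consistent with the OOM. A hedged sketch of what "remove the square resize" could look like, assuming an albumentations-style pipeline (this is not the repo's actual aug config, just an illustration):

```python
import albumentations as A

# Illustration only: keep the native 180x1440 resolution and drop any
# transform that rescales to a fixed square, keeping only augs whose
# distortions would also occur at inference time.
train_augs = A.Compose([
    # A.RandomResizedCrop(256, 256),  # <- the kind of square-resolution aug to remove
    # A.HorizontalFlip(p=0.5),        # keep only if mirrored fields are physically plausible
    A.ToFloat(max_value=255.0),
])
```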
Thank you! I managed to train and evaluate the model on my own data now, without padding, although the batch size has to be around 10 and I had to reduce ngf to 32, otherwise I run out of memory. The results are not too bad with the default settings, but there is probably a lot of room for improvement.
You mentioned the discriminator might not be optimal with the out-of-the-box settings. Is it trivial to change the settings of the NLayerDiscriminator? How would I increase its receptive field horizontally?
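A minimal sketch of one way this could be done (a hypothetical module, not the repo's NLayerDiscriminator): use rectangular kernels, and strides that keep the vertical resolution while continuing to shrink the horizontal one, so the receptive field grows much faster along the width than along the height. Normalization layers are omitted for brevity.

```python
import torch
from torch import nn

class WidePatchDiscriminator(nn.Module):
    """Hypothetical PatchGAN-style discriminator with a wide horizontal receptive field."""

    def __init__(self, in_ch=3, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            # (4, 8) kernels: wider horizontal footprint per layer
            nn.Conv2d(in_ch, ndf, kernel_size=(4, 8), stride=2, padding=(1, 3)),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, kernel_size=(4, 8), stride=2, padding=(1, 3)),
            nn.LeakyReLU(0.2, inplace=True),
            # stride (1, 2): keep vertical resolution, keep downsampling horizontally
            nn.Conv2d(ndf * 2, ndf * 4, kernel_size=(3, 8), stride=(1, 2), padding=(1, 3)),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, 1, kernel_size=(3, 8), stride=1, padding=(1, 3)),
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 3, 180, 1440)
print(WidePatchDiscriminator()(x).shape)  # patch logits, e.g. (1, 1, 45, 179)
```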