Image-Super-Resolution-via-Iterative-Refinement

What params should I use for the 128 to 1024 task, and for the 512 to 1024 task? How should I choose the Unet architecture?

Open roey1rg opened this issue 3 years ago • 3 comments

Specifically, I'm asking about the following section in the config file:

"unet": {
    "in_channel": 6,
    "out_channel": 3,
    "inner_channel": 64,
    "channel_multiplier": [?????????],
    "attn_res": [?????????],
    "res_blocks": ?????????,
    "dropout": ?????????
},

roey1rg avatar Jul 29 '22 05:07 roey1rg

I assume you've looked at the hyperparameters used in the paper (https://arxiv.org/pdf/2104.07636.pdf), so for reference, here is what they used: [screenshot of the hyperparameter table from the paper]

Regarding dropout, they mention:

We use a dropout rate of 0.2 for 16×16 → 128×128 super-resolution models, but otherwise, we do not use dropout.

Although your sizes are slightly different, I would imagine you could "interpolate/extrapolate" from these to choose hyperparameters. The general trend appears to be that more parameters are needed for very-small-to-medium-sized images, and fewer are required for medium-to-large images.

We use 625M parameters for our 64×64 → {256×256, 512×512} models, 550M parameters for the 16×16 → 128×128 models, and 150M parameters for 256×256 → 1024×1024 model.

If I had to estimate, you could probably use the following:

  • 512x512 -> 1024x1024
    • Channel Dim = 32
    • Depth multipliers = {1, 2, 4, 8, 16, 32, 32}
    • ResNet blocks = 2 # Maybe 3
    • Dropout = 0
  • 128x128 -> 1024x1024
    • Channel Dim = 16 # I haven't seen anything lower than this
    • Depth multipliers = {1, 2, 4, 8, 16, 16, 32, 32, 32} # Might need to play around with this
    • ResNet blocks = 2
    • Dropout = 0
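To make that concrete, here is a rough sketch of how the 512x512 -> 1024x1024 estimate might map onto the "unet" section you posted (written as a Python dict so I can annotate it). The attn_res value is purely a guess on my part, and I believe inner_channel is the parameter that corresponds to "Channel Dim" here, so treat this as a starting point rather than tested settings:

unet = {
    "in_channel": 6,                                 # upsampled LR image concatenated with the noisy HR image (3 + 3 channels)
    "out_channel": 3,
    "inner_channel": 32,                             # base channel width, i.e. the "Channel Dim" above
    "channel_multiplier": [1, 2, 4, 8, 16, 32, 32],  # the depth multipliers above
    "attn_res": [16],                                # assumption: keep self-attention at the 16x16 feature resolution
    "res_blocks": 2,                                 # maybe 3
    "dropout": 0,
}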

Of course, as with any ML task, hyperparameter tuning is highly task-specific, so you will undoubtedly need to play around with the values.

xenova avatar Jul 31 '22 18:07 xenova

Thank you for the detailed answer! I tried to run the 512-to-1024 setup you proposed and ran into CUDA out-of-memory problems, so I changed the depth multipliers from [1, 2, 4, 8, 16, 32, 32] to [1, 2, 4, 8, 8, 16, 32] and managed to start training. Do you think that is reasonable?

I'm using 4 GPUs with 24 GB of memory each, and I set the batch size to 4 (one image per GPU).
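To make the change concrete, the "unet" section I'm now training with looks roughly like this (everything besides channel_multiplier is carried over from the sketch above; attn_res is just my guess):

unet = {
    "in_channel": 6,
    "out_channel": 3,
    "inner_channel": 32,
    "channel_multiplier": [1, 2, 4, 8, 8, 16, 32],  # reduced from [1, 2, 4, 8, 16, 32, 32] to fit into 24 GB per GPU
    "attn_res": [16],                               # guess, same as in the sketch above
    "res_blocks": 2,
    "dropout": 0,
}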

roey1rg avatar Aug 02 '22 08:08 roey1rg

Sure, that sounds reasonable 👍 Let me know how it goes. 👌

xenova avatar Aug 02 '22 13:08 xenova

Feel free to reopen the issue if there are any further questions.

Janspiry avatar Aug 25 '22 06:08 Janspiry

  • Channel Dim = 32
  • Depth multipliers = {1, 2, 4, 8, 16, 32, 32}

Thanks for the explanation! But I still have a question.

Which parameter represents Channel Dim in this code?

huchi00057 avatar Feb 08 '23 07:02 huchi00057