KAIR
Slow convergence
Hi
Thanks for the great work again.
I am wondering whether there might be something wrong with the learning rate scheduler. I tested "main_train_rrdb_psnr.py" and found that the learning rate had already been scheduled down to "1.563e-06" at an early stage of training. I checked update_learning_rate and the options, though, and they seem OK implementation-wise.
But I observed much slower convergence compared to the implementation from xinntao. Did you observe the same phenomenon?
Change https://github.com/cszn/KAIR/blob/3eb3cc7776fa8c57e8ed7c71bfa8039beb4c6677/options/train_msrresnet_psnr.json#L65
The training speed of KAIR is slower than that of BasicSR by xinntao, for at least three possible reasons:
1. https://github.com/xinntao/BasicSR/blob/master/docs/DatasetPreparation.md
2. https://github.com/xinntao/BasicSR/blob/14bafa5e03468775544f8711d7da7a61dbb3d664/basicsr/train.py#L13
3. https://github.com/xinntao/BasicSR/blob/14bafa5e03468775544f8711d7da7a61dbb3d664/basicsr/train.py#L35
I checked G_scheduler_milestones, and the setting is fine. The data is indeed not aligned for training, though I don't think that is the major problem here. And I did not use distributed training.
Here is the log. I expected the learning rate to be near 1e-4, but it is 1.5e-6 throughout the training phase.
20-09-14 18:25:35.666 : task: rrdb
model: plain
gpu_ids: [0]
scale: 4
n_channels: 3
sigma: 0
sigma_test: 0
merge_bn: False
merge_bn_startpoint: 400000
path:[
root: superresolution
pretrained_netG: None
task: superresolution/rrdb
log: superresolution/rrdb
options: superresolution/rrdb/options
models: superresolution/rrdb/models
images: superresolution/rrdb/images
]
datasets:[
train:[
name: train_dataset
dataset_type: sr
dataroot_H: ./opensource_code/KAIR/trainsets/trainH
dataroot_L: None
H_size: 96
dataloader_shuffle: True
dataloader_num_workers: 8
dataloader_batch_size: 16
phase: train
scale: 4
n_channels: 3
]
test:[
name: test_dataset
dataset_type: sr
dataroot_H: ./opensource_code/KAIR/testsets/set5
dataroot_L: None
phase: test
scale: 4
n_channels: 3
]
]
netG:[
net_type: rrdb
in_nc: 3
out_nc: 3
nc: 64
nb: 23
gc: 32
ng: 2
reduction: 16
act_mode: R
upsample_mode: upconv
downsample_mode: strideconv
init_type: orthogonal
init_bn_type: uniform
init_gain: 0.2
scale: 4
]
train:[
G_lossfn_type: l1
G_lossfn_weight: 1.0
G_optimizer_type: adam
G_optimizer_lr: 0.0001
G_optimizer_clipgrad: None
G_scheduler_type: MultiStepLR
G_scheduler_milestones: [200000, 400000, 600000, 800000, 1000000, 2000000]
G_scheduler_gamma: 0.5
G_regularizer_orthstep: None
G_regularizer_clipstep: None
checkpoint_test: 500
checkpoint_save: 1000
checkpoint_print: 100
]
opt_path: ./opensource_code/KAIR/options/train_rrdb_psnr.json
is_train: True
20-09-14 18:25:35.666 : Random seed: 7237
20-09-14 18:25:35.731 : Number of train images: 800, iters: 50
20-09-14 18:25:40.496 :
Networks name: RRDB
Params number: 16697987
Net structure:
RRDB(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
----------
skip model and weight info here
----------
20-09-14 18:26:55.455 : <epoch: 1, iter: 100, lr:1.563e-06> G_loss: 4.142e-01
20-09-14 18:28:11.919 : <epoch: 3, iter: 200, lr:1.563e-06> G_loss: 3.753e-01
20-09-14 18:29:25.745 : <epoch: 5, iter: 300, lr:1.563e-06> G_loss: 1.823e-01
20-09-14 18:30:39.381 : <epoch: 7, iter: 400, lr:1.563e-06> G_loss: 1.380e-01
20-09-14 18:31:51.148 : <epoch: 9, iter: 500, lr:1.563e-06> G_loss: 1.618e-01
20-09-14 18:31:51.498 : ---1--> baby.bmp | 17.50dB
20-09-14 18:31:51.575 : ---2--> bird.bmp | 15.07dB
20-09-14 18:31:51.772 : ---3--> butterfly.bmp | 12.41dB
20-09-14 18:31:51.880 : ---4--> head.bmp | 20.18dB
20-09-14 18:31:51.991 : ---5--> woman.bmp | 16.58dB
20-09-14 18:31:52.021 : <epoch: 9, iter: 500, Average PSNR : 16.35dB
20-09-14 18:33:03.716 : <epoch: 11, iter: 600, lr:1.563e-06> G_loss: 1.111e-01
20-09-14 18:34:16.212 : <epoch: 13, iter: 700, lr:1.563e-06> G_loss: 1.332e-01
20-09-14 18:35:28.055 : <epoch: 15, iter: 800, lr:1.563e-06> G_loss: 1.334e-01
20-09-14 18:36:40.067 : <epoch: 17, iter: 900, lr:1.563e-06> G_loss: 1.078e-01
20-09-14 18:37:51.909 : <epoch: 19, iter: 1,000, lr:1.563e-06> G_loss: 1.308e-01
20-09-14 18:37:51.911 : Saving the model.
20-09-14 18:37:52.582 : ---1--> baby.bmp | 17.79dB
20-09-14 18:37:52.670 : ---2--> bird.bmp | 15.16dB
20-09-14 18:37:52.738 : ---3--> butterfly.bmp | 12.61dB
20-09-14 18:37:52.870 : ---4--> head.bmp | 20.54dB
20-09-14 18:37:52.938 : ---5--> woman.bmp | 16.97dB
20-09-14 18:37:52.969 : <epoch: 19, iter: 1,000, Average PSNR : 16.62dB
20-09-14 18:39:04.704 : <epoch: 21, iter: 1,100, lr:1.563e-06> G_loss: 1.053e-01
20-09-14 18:40:18.259 : <epoch: 23, iter: 1,200, lr:1.563e-06> G_loss: 1.111e-01
20-09-14 18:41:30.275 : <epoch: 25, iter: 1,300, lr:1.563e-06> G_loss: 1.112e-01
20-09-14 18:42:43.876 : <epoch: 27, iter: 1,400, lr:1.563e-06> G_loss: 1.048e-01
20-09-14 18:43:56.993 : <epoch: 29, iter: 1,500, lr:1.563e-06> G_loss: 1.095e-01
20-09-14 18:43:57.203 : ---1--> baby.bmp | 18.24dB
20-09-14 18:43:57.280 : ---2--> bird.bmp | 15.36dB
20-09-14 18:43:57.349 : ---3--> butterfly.bmp | 13.02dB
20-09-14 18:43:57.416 : ---4--> head.bmp | 20.86dB
20-09-14 18:43:57.493 : ---5--> woman.bmp | 17.45dB
20-09-14 18:43:57.518 : <epoch: 29, iter: 1,500, Average PSNR : 16.99dB
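As a sanity check on the numbers in this log, here is a minimal sketch of the closed-form MultiStepLR rule (base rate times gamma to the power of the number of milestones already reached), using the values from the options dump. The helper name `multistep_lr` is made up for illustration and is not part of KAIR. Under this rule the rate at iteration 500 should still be 1e-4, and the logged 1.563e-06 corresponds to the rate after all six milestones, i.e. 1e-4 × 0.5^6 = 1.5625e-6. That is consistent with the scheduler having been advanced past every milestone, but the sketch does not show why that happened.

```python
import math
from bisect import bisect_right

# Values copied from the options dump in the log
BASE_LR = 1e-4
GAMMA = 0.5
MILESTONES = [200000, 400000, 600000, 800000, 1000000, 2000000]


def multistep_lr(base_lr, gamma, milestones, step):
    """Hypothetical helper: closed-form MultiStepLR value at a given step.

    The rate is multiplied by gamma once for every milestone <= step.
    """
    decays = bisect_right(sorted(milestones), step)
    return base_lr * gamma ** decays


# Expected rate at the iterations shown in the log (no milestone reached yet)
lr_at_500 = multistep_lr(BASE_LR, GAMMA, MILESTONES, 500)

# Rate once every milestone has been passed
lr_after_all = multistep_lr(BASE_LR, GAMMA, MILESTONES, MILESTONES[-1])

# How many halvings turn 1e-4 into the logged 1.563e-06?
logged_lr = 1.563e-06
decays_seen = round(math.log(logged_lr / BASE_LR) / math.log(GAMMA))

# Iterations per epoch: 800 training images at batch size 16
iters_per_epoch = 800 // 16

print(f"expected lr at iter 500:   {lr_at_500:.3e}")
print(f"lr after all milestones:   {lr_after_all:.3e}")
print(f"halvings implied by log:   {decays_seen} of {len(MILESTONES)}")
print(f"iterations per epoch:      {iters_per_epoch}")
```

With 50 iterations per epoch, the first milestone at 200000 iterations would only be reached after 4000 epochs, so none of the decays should have applied in the first 1,500 iterations shown above.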