RuntimeError: Error(s) in loading state_dict for EDVR
Hi everyone, I trained EDVR for the deblurring task; this is the training log:
19-11-04 13:38:42.765 - INFO:
  name: 001_EDVRwoTSA_scratch_lr4e-4_600k_REDS_LrCAR4S
  use_tb_logger: True
  model: video_base
  distortion: sr
  scale: 4
  gpu_ids: [0]
  datasets:[
    train:[
      name: REDS
      mode: REDS
      interval_list: [1]
      random_reverse: False
      border_mode: False
      dataroot_GT: /media/ml/datadrive2/Nasrin/EDVR-master/dataset/REDS/train_sharp_wval.lmdb
      dataroot_LQ: /media/ml/datadrive2/Nasrin/EDVR-master/dataset/REDS/train_blur_wval.lmdb
      cache_keys: None
      N_frames: 5
      use_shuffle: True
      n_workers: 3
      batch_size: 16
      GT_size: 256
      LQ_size: 256
      use_flip: True
      use_rot: True
      color: RGB
      phase: train
      scale: 4
      data_type: lmdb
    ]
  ]
  network_G:[
    which_model_G: EDVR
    nf: 64
    nframes: 5
    groups: 8
    front_RBs: 5
    back_RBs: 10
    predeblur: True
    HR_in: True
    w_TSA: True
    scale: 4
  ]
  path:[
    pretrain_model_G: None
    strict_load: True
    resume_state: None
    root: /media/ml/datadrive2/Nasrin/EDVR-master
    experiments_root: /media/ml/datadrive2/Nasrin/EDVR-master/experiments/001_EDVRwoTSA_scratch_lr4e-4_600k_REDS_LrCAR4S
    models: /media/ml/datadrive2/Nasrin/EDVR-master/experiments/001_EDVRwoTSA_scratch_lr4e-4_600k_REDS_LrCAR4S/models
    training_state: /media/ml/datadrive2/Nasrin/EDVR-master/experiments/001_EDVRwoTSA_scratch_lr4e-4_600k_REDS_LrCAR4S/training_state
    log: /media/ml/datadrive2/Nasrin/EDVR-master/experiments/001_EDVRwoTSA_scratch_lr4e-4_600k_REDS_LrCAR4S
    val_images: /media/ml/datadrive2/Nasrin/EDVR-master/experiments/001_EDVRwoTSA_scratch_lr4e-4_600k_REDS_LrCAR4S/val_images
  ]
  train:[
    lr_G: 0.0004
    lr_scheme: CosineAnnealingLR_Restart
    beta1: 0.9
    beta2: 0.99
    niter: 600000
    warmup_iter: -1
    T_period: [150000, 150000, 150000, 150000]
    restarts: [150000, 300000, 450000]
    restart_weights: [1, 1, 1]
    eta_min: 1e-07
    pixel_criterion: cb
    pixel_weight: 1.0
    val_freq: 2000.0
    manual_seed: 0
  ]
  logger:[
    print_freq: 10
    save_checkpoint_freq: 2000.0
  ]
  is_train: True
  dist: False
When I try to test the trained model on REDS4, this error occurs:
(nasrin) ml@ml-HP-Z820-Workstation:/media/ml/datadrive2/Nasrin/EDVR-master/codes$ python test_Vid4_REDS4_with_GT.py
19-11-25 13:13:16.573 - INFO: Data: blur - /media/ml/datadrive2/Nasrin/EDVR-master/dataset/REDS4/blur
19-11-25 13:13:16.573 - INFO: Padding mode: replicate
19-11-25 13:13:16.573 - INFO: Model path: /media/ml/datadrive2/Nasrin/EDVR-master/experiments/001_EDVRwoTSA_scratch_lr4e-4_600k_REDS_LrCAR4S/models/latest_G.pth
19-11-25 13:13:16.573 - INFO: Save images: True
19-11-25 13:13:16.573 - INFO: Flip test: False
Traceback (most recent call last):
File "test_Vid4_REDS4_with_GT.py", line 208, in
I could not figure out what is wrong. Could you please help me solve this problem? @xinntao
From the log, you trained the model with back_RBs: 10; however, in the test configuration you probably set back_RBs to 40.
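One way to check which architecture a checkpoint was trained with is to read it back from the saved weights. The sketch below assumes the parameter names used by this repo's `EDVR_arch.py` (`feature_extraction.<i>.conv1.weight` for the front residual blocks, `recon_trunk.<i>.conv1.weight` for the back ones); the helper names `count_blocks` and `infer_edvr_dims` are hypothetical. It works on a plain `{name: shape}` mapping, so it can be tried without PyTorch.

```python
import re
from typing import Dict, Mapping, Tuple

Shape = Tuple[int, ...]


def count_blocks(shapes: Mapping[str, Shape], prefix: str) -> int:
    """Count distinct block indices i in names of the form '<prefix>.<i>.<rest>'."""
    pattern = re.compile(r"^" + re.escape(prefix) + r"\.(\d+)\.")
    indices = set()
    for name in shapes:
        match = pattern.match(name)
        if match:
            indices.add(int(match.group(1)))
    return len(indices)


def infer_edvr_dims(shapes: Mapping[str, Shape]) -> Dict[str, int]:
    """Guess nf, front_RBs and back_RBs of an EDVR checkpoint (assumed key names)."""
    # Strip a DataParallel 'module.' prefix if the weights were saved with one.
    shapes = {
        (k[len("module."):] if k.startswith("module.") else k): v
        for k, v in shapes.items()
    }
    # The first back residual block conv maps nf -> nf channels, so dim 0 is nf.
    nf = shapes["recon_trunk.0.conv1.weight"][0]
    return {
        "nf": nf,
        "front_RBs": count_blocks(shapes, "feature_extraction"),
        "back_RBs": count_blocks(shapes, "recon_trunk"),
    }
```

Assuming `latest_G.pth` holds a plain state_dict, the input could be built with `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}`; for the training log above it should report nf 64 and back_RBs 10.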
Dear @xinntao
You are right, thank you very much; your suggestion solved the problem. I have chosen EDVR as my course project, and I want to understand the method and reproduce the results in your paper (or get as close as possible). Now I want to train a full EDVR for SR step by step, following your suggestion in Issue #91 (to avoid an unstable offset mean): C64B10woTSA -> C128B10woTSA -> C128B40woTSA -> C128B40wTSA. However, when I use the model trained in step one as the pretrained model for step two, I get this error again because nf is different:
RuntimeError: Error(s) in loading state_dict for EDVR: size mismatch for conv_first.weight: copying a param with shape torch.Size([64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 3, 3, 3])....
In the logs of both steps I used strict_load: True. I cannot figure out which parts I should change to make the dimensions match. I apologize if these are basic questions; I am new to Python and find your code a little hard to follow. Thank you very much for your time and help.
Best regards, Nasrin
On Fri, Nov 29, 2019 at 4:59 PM Xintao [email protected] wrote:
From the log, You train the model with configuration back_RBs: 10, however, in the test configuration, you probably set back RBs to 40.
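The size-mismatch message means a tensor saved with nf=64 (for example `conv_first.weight`, shape [64, 3, 3, 3]) is being copied into a layer built with nf=128 (shape [128, 3, 3, 3]). `load_state_dict` reports three kinds of differences: size mismatches, missing keys and unexpected keys. A minimal sketch of that classification on plain `{name: shape}` dicts (the helper `diff_state_shapes` is hypothetical) shows which layers each step of the C64B10woTSA -> ... -> C128B40wTSA schedule changes:

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def diff_state_shapes(ckpt: Mapping[str, Shape], model: Mapping[str, Shape]) -> Dict[str, List]:
    """Group the differences between checkpoint and model parameter shapes."""
    shared = ckpt.keys() & model.keys()
    return {
        # Same name, different shape: load_state_dict raises 'size mismatch'.
        "size_mismatch": [
            (name, tuple(ckpt[name]), tuple(model[name]))
            for name in sorted(shared)
            if tuple(ckpt[name]) != tuple(model[name])
        ],
        # Present in the model but absent from the checkpoint.
        "missing": sorted(model.keys() - ckpt.keys()),
        # Present in the checkpoint but absent from the model.
        "unexpected": sorted(ckpt.keys() - model.keys()),
    }
```

With PyTorch, both inputs could be built as `{k: tuple(v.shape) for k, v in sd.items()}`, once from the checkpoint and once from `model.state_dict()`. Assuming the layer names in `EDVR_arch.py`, changing nf alters most convolutions, while going from B10 to B40 at the same nf should only add `recon_trunk.10` to `recon_trunk.39` as missing keys.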
I think you should set 'strict_load' to false in the yml file; that should resolve this issue.
Dear @TouqeerAhmad @xinntao, thank you very much for your suggestion. I set strict_load to false in both configurations (nf=64 and nf=128) to get around the mismatch, but it did not solve the problem. Do you have any other ideas?
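A likely reason `strict_load: false` did not help: in PyTorch, `Module.load_state_dict(state_dict, strict=False)` only tolerates missing and unexpected keys; a tensor whose shape differs from the model's still raises the size-mismatch `RuntimeError` (assuming `strict_load` is simply forwarded to `load_state_dict`). A common workaround is to drop incompatible tensors before loading. The sketch below uses a hypothetical helper `filter_compatible` that keeps only entries whose name exists in the model with the same shape; values may be `torch.Tensor`s (via `.shape`) or plain shape tuples.

```python
from typing import Any, Dict, Mapping


def _shape(value: Any) -> tuple:
    # torch.Tensor exposes .shape (a torch.Size); plain tuples are used as-is.
    return tuple(getattr(value, "shape", value))


def filter_compatible(ckpt: Mapping[str, Any], model_sd: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep checkpoint entries that exist in the model with an identical shape."""
    return {
        name: value
        for name, value in ckpt.items()
        if name in model_sd and _shape(value) == _shape(model_sd[name])
    }
```

With PyTorch it could be used as `net.load_state_dict(filter_compatible(torch.load(path, map_location="cpu"), net.state_dict()), strict=False)`. Note that when only nf changes (C64 -> C128), most convolutions get new shapes, so little survives the filter and that step is close to training from scratch; for B10 -> B40 or woTSA -> wTSA the shared layers keep their shapes and are loaded.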
Traceback (most recent call last):
File "train.py", line 310, in
This is probably because the test scripts define the EDVR model with parameters different from those you used in the yaml file.
For example, in test_Vid4_REDS4_with_GT.py it is defined as:
model = EDVR_arch.EDVR(128, N_in, 8, 5, back_RBs, predeblur=predeblur, HR_in=HR_in)
# positional arguments: nf=128, nframes=N_in, groups=8, front_RBs=5, back_RBs=back_RBs
You should change these parameters to match your yaml file.
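To keep the test-time model in sync with training, the constructor arguments can be taken from the training YAML instead of being hard-coded. A minimal sketch, assuming `EDVR_arch.EDVR` accepts `nf, nframes, groups, front_RBs, back_RBs, center, predeblur, HR_in, w_TSA` as keyword arguments (check this against your copy of `EDVR_arch.py`); the helper `edvr_kwargs_from_network_g` is hypothetical. For the first run in this thread the log gives nf: 64 and back_RBs: 10, whereas the test line above passes 128 as nf.

```python
from typing import Any, Dict, Mapping

# Assumed EDVR constructor arguments (names as in the network_G section of the log).
EDVR_ARGS = ("nf", "nframes", "groups", "front_RBs", "back_RBs",
             "center", "predeblur", "HR_in", "w_TSA")


def edvr_kwargs_from_network_g(network_g: Mapping[str, Any]) -> Dict[str, Any]:
    """Select EDVR constructor arguments from a training 'network_G' section.

    Keys that are not constructor arguments (e.g. 'which_model_G', 'scale') are
    dropped; arguments absent from the YAML (e.g. 'center') keep their defaults.
    """
    return {key: network_g[key] for key in EDVR_ARGS if key in network_g}
```

With PyYAML this might be used as `opt = yaml.safe_load(open(train_yml))` followed by `model = EDVR_arch.EDVR(**edvr_kwargs_from_network_g(opt["network_G"]))`, so nf, back_RBs, predeblur, HR_in and w_TSA all come from the same file used for training.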