
Use your weight as the checkpoint

Open mac744mail opened this issue 3 years ago • 10 comments

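I noticed that you are Chinese, so if you don't mind, I'll write in Chinese. Following your earlier guidance, I ran your code on my own dataset. My dataset is for document background extraction: the content of the document is masked out, and the background image is then recovered by inpainting.

To check whether your model is viable on my dataset, I did some preliminary training. I took 1000 images from my dataset for a trial run, obtained my own weight files, and ran the test. I also ran the test on my dataset directly with your weight file. I found that one of your pretrained weights gives very good results when tested directly on my dataset: the checkpoints/objrmv/latest_net_G.pth file that appears after running download.sh. I would like to use it as a checkpoint and resume training from it. For training, latest_net_G.pth alone does not seem to be enough; the corresponding latest_net_D.pth, latest_net_D_aux.pth, and iter.txt are also needed.

Could you provide them? I would be very grateful!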
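As a side note, the request comes down to a checkpoint directory that holds latest_net_G.pth but not latest_net_D.pth, latest_net_D_aux.pth, or iter.txt. A minimal sketch of a pre-flight check is shown here; the helper name missing_resume_files is hypothetical and is not part of the crfill repository, and the file list is taken only from the names mentioned in this question.

```python
from pathlib import Path
from typing import List


def missing_resume_files(ckpt_dir: str, epoch: str = "latest") -> List[str]:
    """Return the files from the question's list that are absent in ckpt_dir.

    Hypothetical helper: the expected names (generator, discriminator,
    auxiliary discriminator, iteration counter) come from this thread,
    not from the repository's own resume logic.
    """
    expected = [f"{epoch}_net_{net}.pth" for net in ("G", "D", "D_aux")]
    expected.append("iter.txt")
    root = Path(ckpt_dir)
    return [name for name in expected if not (root / name).is_file()]


# Lists whatever is missing from the downloaded objrmv checkpoint folder.
print(missing_resume_files("checkpoints/objrmv"))
```

An empty list would mean every file named in the question is present; anything else names what still has to be obtained before resuming.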

mac744mail avatar Oct 06 '21 21:10 mac744mail

I've updated the code to allow fine-tuning based on my pretrained models.

Use the following commands:

./download/download_pretrain.sh
./finetune.sh

you may change the options in finetune.sh to use different hyperparameters or your own dataset

zengxianyu avatar Oct 08 '21 18:10 zengxianyu

many thanks!

mac744mail avatar Oct 08 '21 19:10 mac744mail

Hi, I was busy until today. Today I cloned the latest repository, ran ./download/download_pretrain.sh, and got pretrained_net_G.pth, pretrained_net_D.pth, and pretrained_net_D_aux.pth.

First I used "pretrained_net_G.pth" for testing: I just ran test.sh with the "pretrained" epoch. However, I got the following error again:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    model = models.create_model(opt)
  File "/content/drive/My Drive/crfill-new/models/__init__.py", line 41, in create_model
    instance = model(opt)
  File "/content/drive/My Drive/crfill-new/models/inpaint_model.py", line 32, in __init__
    self.netG, self.netD = self.initialize_networks(opt)
  File "/content/drive/My Drive/crfill-new/models/inpaint_model.py", line 110, in initialize_networks
    netG = util.load_network(netG, 'G', opt.which_epoch, opt)
  File "/content/drive/My Drive/crfill-new/util/util.py", line 237, in load_network
    net.load_state_dict(new_dict, strict=False)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BaseConvGenerator:
    size mismatch for conv14.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 48, 3, 3]).
    size mismatch for conv16.weight: copying a param with shape torch.Size([48, 48, 3, 3]) from checkpoint, the shape in current model is torch.Size([24, 24, 3, 3]).
    size mismatch for conv16.bias: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([24]).
    size mismatch for conv17.weight: copying a param with shape torch.Size([3, 24, 3, 3]) from checkpoint, the shape in current model is torch.Size([3, 12, 3, 3]).

I worked around this by adding "net = torch.nn.DataParallel(net).cuda()" in util.py, just above the line "net.load_state_dict(new_dict, strict=False)". Then it ran. However, the inpainting results were very bad, totally different from those of checkpoints/objrmv/latest_net_G.pth.
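One possible reading of this symptom, offered as a sketch rather than a confirmed diagnosis: with strict=False, load_state_dict skips missing and unexpected keys but still raises when a shared key has a different shape, and wrapping a network in DataParallel prefixes its parameter names with "module.". If the checkpoint keys carry no such prefix (an assumption, since util.load_network is not shown in this thread), no key matches after wrapping, the error disappears, and nothing is loaded. The example below uses plain shape tuples copied from the traceback instead of real tensors, and compare_shapes is a hypothetical helper. In practice the two dictionaries could be built with {k: tuple(v.shape) for k, v in state_dict.items()}.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_shapes(checkpoint: Dict[str, Shape],
                   model: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Sort parameter names into matched, mismatched, missing and unexpected."""
    report: Dict[str, List[str]] = {
        "matched": [], "mismatched": [], "missing": [], "unexpected": []
    }
    for name, shape in model.items():
        if name not in checkpoint:
            report["missing"].append(name)      # skipped when strict=False
        elif checkpoint[name] != shape:
            report["mismatched"].append(name)   # raises even when strict=False
        else:
            report["matched"].append(name)
    report["unexpected"] = [n for n in checkpoint if n not in model]
    return report


# Shapes copied from the traceback in this comment.
checkpoint_shapes = {
    "conv14.weight": (96, 96, 3, 3),
    "conv16.weight": (48, 48, 3, 3),
    "conv16.bias": (48,),
    "conv17.weight": (3, 24, 3, 3),
}
model_shapes = {
    "conv14.weight": (96, 48, 3, 3),
    "conv16.weight": (24, 24, 3, 3),
    "conv16.bias": (24,),
    "conv17.weight": (3, 12, 3, 3),
}

plain = compare_shapes(checkpoint_shapes, model_shapes)

# DataParallel keeps the wrapped network under the attribute "module",
# so every parameter name of the wrapper starts with "module.".
wrapped_shapes = {"module." + k: v for k, v in model_shapes.items()}
wrapped = compare_shapes(checkpoint_shapes, wrapped_shapes)

print("mismatched without wrapper:", plain["mismatched"])
print("matched with wrapper:", wrapped["matched"])
print("missing with wrapper:", wrapped["missing"])
```

Under that assumption the wrapped model reports no matched parameters at all, which would leave the generator at its initial weights and could account for the poor inpainting results.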

When I used checkpoints/objrmv/latest_net_G.pth to test on my images, the results were very good and the error above did not occur.

How come? Is "objrmv_finetune/pretrained_net_G.pth" exactly the same as "objrmv/latest_net_G.pth"?

The checkpoint I need is "objrmv/latest_net_G.pth". Could you help?

mac744mail avatar Oct 22 '21 15:10 mac744mail

The newly uploaded snapshots are for fine-tuning only and are not compatible with the test script.

zengxianyu avatar Oct 22 '21 15:10 zengxianyu

Can I just directly try to continue training from "objrmv/latest_net_G.pth"? I remember that when I trained on my own dataset and the training was interrupted, I could continue the xxx training from "xxx/latest_net_G.pth" together with the corresponding "xxx/latest_net_D_aux.pth" and "xxx/latest_net_D.pth".

Your "objrmv/latest_net_G.pth" worked very well on my dataset. Can I use my own dataset and "continue" training from "objrmv/latest_net_G.pth"? I want to try both fine-tuning and this. Could you please provide "objrmv/latest_net_D.pth" and "objrmv/latest_net_D_aux.pth"? Many thanks!

mac744mail avatar Oct 22 '21 15:10 mac744mail

The newly uploaded ones are the same as objrmv/latest*.pth but with the extra components required for training (the auxiliary encoder-decoder described in the paper). It's just that the format is not compatible with the test script.

zengxianyu avatar Oct 22 '21 15:10 zengxianyu

Oh, I see what you mean. It's for fine-tuning the pretrained models, as in transfer learning, right? My original intention was not only to fine-tune the pretrained model but also to directly continue training from "objrmv/latest_net_G.pth", treating it as an "interrupted epoch", and then see what happens.

mac744mail avatar Oct 22 '21 15:10 mac744mail

You can't continue training from "objrmv/latest_net_G.pth" because the auxiliary encoder and decoder are missing. My newly uploaded ones are the complete version of the "interrupted epoch".

zengxianyu avatar Oct 22 '21 16:10 zengxianyu

Well, actually, fine-tuning the pretrained model would be the most appropriate way because the datasets are different. Directly resuming training from a checkpoint trained on a different dataset is not strictly feasible in many circumstances. Anyway, I appreciate your help! I will try finetune.sh. Thanks!

mac744mail avatar Oct 22 '21 16:10 mac744mail

I want to know where pix2pix_model is.

haoren55555 avatar Nov 06 '21 02:11 haoren55555