Background-Matting

Question: Why was the multi-frame feature not used? Typo?

zoezhou1999 opened this issue · 11 comments

Hi, in networks.py, multi_feat is not used at all? Thank you so much! Should I change it to

oth_feat = torch.cat([self.comb_back(torch.cat([img_feat, back_feat], dim=1)),
                      self.comb_seg(torch.cat([img_feat, seg_feat], dim=1)),
                      self.comb_multi(torch.cat([img_feat, multi_feat], dim=1))], dim=1)

def forward(self, image, back, seg, multi):
    img_feat1 = self.model_enc1(image)
    img_feat = self.model_enc2(img_feat1)

    back_feat = self.model_enc_back(back)
    seg_feat = self.model_enc_seg(seg)
    multi_feat = self.model_enc_multi(multi)

    # Note: back_feat is passed to comb_multi here instead of multi_feat,
    # so multi_feat is never used.
    oth_feat = torch.cat([
        self.comb_back(torch.cat([img_feat, back_feat], dim=1)),
        self.comb_seg(torch.cat([img_feat, seg_feat], dim=1)),
        self.comb_multi(torch.cat([img_feat, back_feat], dim=1)),
    ], dim=1)

    out_dec = self.model_res_dec(torch.cat([img_feat, oth_feat], dim=1))

    out_dec_al = self.model_res_dec_al(out_dec)
    al_out = self.model_al_out(out_dec_al)

    out_dec_fg = self.model_res_dec_fg(out_dec)
    out_dec_fg1 = self.model_dec_fg1(out_dec_fg)
    fg_out = self.model_fg_out(torch.cat([out_dec_fg1, img_feat1], dim=1))
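For reference, a minimal self-contained sketch of the proposed wiring, with hypothetical channel sizes and placeholder 1x1 convolutions standing in for the real self.comb_* modules in networks.py:

import torch
import torch.nn as nn

# Hypothetical sizes; the real encoders in networks.py may use different channel counts.
B, C, H, W = 2, 256, 32, 32
img_feat   = torch.randn(B, C, H, W)
back_feat  = torch.randn(B, C, H, W)
seg_feat   = torch.randn(B, C, H, W)
multi_feat = torch.randn(B, C, H, W)

# Placeholder combiners standing in for self.comb_back / self.comb_seg / self.comb_multi.
comb_back  = nn.Conv2d(2 * C, C, kernel_size=1)
comb_seg   = nn.Conv2d(2 * C, C, kernel_size=1)
comb_multi = nn.Conv2d(2 * C, C, kernel_size=1)

# Proposed change: the third branch combines img_feat with multi_feat, not back_feat again.
oth_feat = torch.cat([
    comb_back(torch.cat([img_feat, back_feat], dim=1)),
    comb_seg(torch.cat([img_feat, seg_feat], dim=1)),
    comb_multi(torch.cat([img_feat, multi_feat], dim=1)),
], dim=1)

print(oth_feat.shape)  # torch.Size([2, 768, 32, 32]) with these placeholder sizes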

zoezhou1999 · May 08 '20 08:05

In the paper, multi-frame features are used.

zoezhou1999 · May 08 '20 08:05

It appears that there is a bug in the code which resulted in multi-frame features not being used at all. This was also the model used to produce the results in the paper. Thus the multi-frame feature was not used, and we will update the paper to reflect that fact (although, for all comparisons in the paper, we mentioned that we disable motion cues).

senguptaumd · May 08 '20 10:05

Hi, can I ask, does that mean you did not do experiments with motion cues?

zoezhou1999 · May 08 '20 10:05

Hi, "we disable motion cues" means that you used the code in the GitHub to train the model and make the comparison, right? Thank you~

zoezhou1999 · May 08 '20 10:05

Figure 5 in the paper is not valid, but the rest is fine. I recently redid the experiments after correctly using the motion cues and did not find any improvement. "We disable motion cues" in the paper means we used M=(I,I,I,I) as input; as it turns out, it was never used. Yes, we used the code in the GitHub repo for the comparison reported in the paper.
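As a rough sketch (assuming the motion cue is a Bx4xWxH stack of single-channel frames, per the shape mentioned later in this thread), M=(I,I,I,I) just repeats the same frame four times:

import torch

# Hypothetical single frame I (batch of single-channel images).
B, H, W = 1, 256, 256
I = torch.randn(B, 1, H, W)

# "Disable motion cues": feed the same frame four times, M = (I, I, I, I).
M = torch.cat([I, I, I, I], dim=1)
print(M.shape)  # torch.Size([1, 4, 256, 256])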

senguptaumd · May 08 '20 11:05

Thank you for the reply! And great work!

zoezhou1999 · May 08 '20 11:05

@senguptaumd Hi, regarding your released model, for example /Background-Matting/Models/real-hand-held/netG_epoch_12.pth: did you train it with or without "motion cues"?

mozpp · May 13 '20 02:05

The trained model does not use motion cues. However, due to the bug in the networks.py file, you will still need to input something as the motion cue (Bx4xWxH). You can use any random input; it won't matter, as the network does not actually utilize it. I will update the code to remove the need for inputting motion cues.
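For example, a minimal sketch of such a placeholder input (the Bx4xWxH shape from above; the values are irrelevant since the network never reads them):

import torch

# Placeholder "motion cue" tensor of shape Bx4xWxH; never actually used by the released model.
B, W, H = 1, 256, 256
dummy_multi = torch.zeros(B, 4, W, H)

# Hypothetical usage, matching forward(self, image, back, seg, multi) in networks.py:
# netG(image, back, seg, dummy_multi)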

senguptaumd · May 13 '20 03:05

Figure 5 in the paper is not valid, but the rest is fine. I recently redid the experiments after correctly using the motion cues and did not find any improvement. "We disable motion cues" in the paper means we used M=(I,I,I,I) as input; as it turns out, it was never used. Yes, we used the code in the GitHub repo for the comparison reported in the paper.

Thanks for your reply. I still have a doubt about what settings were used to produce the comparison in Figure 5, since the motion cues were not used.

hejm37 · May 13 '20 05:05

Due to the bug, it uses background features twice instead of the motion features:

oth_feat = torch.cat([self.comb_back(torch.cat([img_feat, back_feat], dim=1)),
                      self.comb_seg(torch.cat([img_feat, seg_feat], dim=1)),
                      self.comb_multi(torch.cat([img_feat, back_feat], dim=1))], dim=1)

In Fig. 5, since we disabled motion features, we actually used the background feature only once:

oth_feat = torch.cat([self.comb_back(torch.cat([img_feat, back_feat], dim=1)),
                      self.comb_seg(torch.cat([img_feat, seg_feat], dim=1))], dim=1)

The difference in results is probably just from the randomness in training, and it was 1 out of 100 examples where we noticed some differences. In a video sense, it was almost similar, to be honest.

senguptaumd · May 13 '20 05:05

The trained model does not use motion cues. However, due to the bug in the networks.py file, you will still need to input something as the motion cue (Bx4xWxH). You can use any random input; it won't matter, as the network does not actually utilize it. I will update the code to remove the need for inputting motion cues.

Just out of curiosity: I think that when some parameters (for example, the parameters of model_enc_multi) don't get a gradient, PyTorch will raise an error. For now, I don't have time to train to verify this.
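For what it's worth, a quick sketch suggests that in plain single-process PyTorch an unused submodule does not cause an error at backward time; its parameters simply keep grad=None (DistributedDataParallel is stricter and may complain about unused parameters):

import torch
import torch.nn as nn

# Two branches; only `used` contributes to the loss. `unused` plays the role of model_enc_multi.
used = nn.Linear(4, 4)
unused = nn.Linear(4, 4)

x = torch.randn(2, 4)
loss = used(x).sum()   # `unused` never enters the computation graph
loss.backward()        # no error in plain, single-process PyTorch

print(used.weight.grad is None)    # False
print(unused.weight.grad is None)  # True: the gradient simply stays None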

mozpp · May 13 '20 07:05