Question about encode
Hi, could you please describe what exactly encode does in IFNet.forward() and what its output is?
f0 = self.encode(img0[:, :3])
f1 = self.encode(img1[:, :3])
It outputs a tensor of size (1, 8, width, height), but what exactly does this tensor represent?
My detailed ideas are described in https://arxiv.org/abs/2310.17294. Warping features works better than warping only the images; see "Context-aware Synthesis for Video Frame Interpolation". It is an encoder that the model learns on its own, so I don't know how to explain exactly what it does.
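For readers who want to see the idea in code, here is a minimal sketch of "encode, then warp the features". It is an illustration under stated assumptions, not this repository's implementation: `TinyEncoder`, its layer sizes, and `backward_warp` are hypothetical names and choices. The only details taken from the thread are the 3-channel input and the 8-channel output at input resolution. Note that PyTorch tensors are laid out as (N, C, H, W).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a learned feature encoder (RGB -> 8 channels)."""

    def __init__(self, out_channels=8):
        super().__init__()
        # padding=1 with 3x3 kernels keeps the spatial size unchanged
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(16, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


def backward_warp(x, flow):
    """Sample x at (pixel position + flow). flow is (N, 2, H, W) in pixels, (dx, dy)."""
    n, _, h, w = x.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=x.dtype, device=x.device),
        torch.arange(w, dtype=x.dtype, device=x.device),
        indexing="ij",
    )
    # Absolute sampling positions in pixel coordinates
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1]
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(
        x, grid, mode="bilinear", padding_mode="border", align_corners=True
    )


encoder = TinyEncoder()
img0 = torch.rand(1, 3, 32, 48)
f0 = encoder(img0)                    # (1, 8, 32, 48)
flow = torch.zeros(1, 2, 32, 48)      # placeholder flow; a real model predicts this
warped_f0 = backward_warp(f0, flow)   # features warped the same way as the image
```

The same `backward_warp` can be applied to `img0` and to `f0`, so a later network sees both warped pixels and warped learned features. The channels of `f0` have no fixed meaning; they are whatever training found useful.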
Okay, thank you. One more thing: is there a paper that explains in more detail how the mask is estimated? I mean the mask used to merge the forward- and backward-warped images into the final output.
You may refer to Super SloMo.
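As a rough illustration of how such a mask is typically used, the sketch below blends two warped images with a per-pixel weight. It assumes the network outputs one channel of raw mask logits that are squashed with a sigmoid; `merge_warped` and its argument names are hypothetical and not taken from this repository. Super SloMo's formulation is similar in spirit but also weights the two images by temporal distance.

```python
import torch


def merge_warped(warped0, warped1, mask_logits):
    """Blend two warped images with a soft per-pixel mask.

    warped0, warped1: (N, 3, H, W) images warped to the target time.
    mask_logits:      (N, 1, H, W) raw network output (assumed, not confirmed).
    """
    mask = torch.sigmoid(mask_logits)  # values in (0, 1), broadcast over channels
    # mask near 1 trusts warped0, mask near 0 trusts warped1
    return warped0 * mask + warped1 * (1.0 - mask)


warped_img0 = torch.rand(1, 3, 32, 48)
warped_img1 = torch.rand(1, 3, 32, 48)
mask_logits = torch.zeros(1, 1, 32, 48)  # zero logits give an even 50/50 blend
merged = merge_warped(warped_img0, warped_img1, mask_logits)
```

Because the mask is learned end to end from the reconstruction loss, it tends to favor whichever source frame is not occluded at each pixel.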