
pwc*20

Open DelinQu opened this issue 2 years ago • 1 comments

Hi Bin Fan, thank you very much for sharing this work! I noticed that the output of PWCNet is multiplied by 20, which puzzles me. What is the purpose of this? I compared the output of PWCNet with that of other optical flow networks and found that they are not on the same scale! :D


pwc output (without *20):

==== flow_up torch.Size([1, 2, 448, 640])
Saving optical flow visualisation at imgs/carla/PWC/flo.png
tensor([[[[ 0.0045,  0.0045, -0.0192,  ..., -0.6218, -0.6173, -0.6173],
          [ 0.0045,  0.0045, -0.0192,  ..., -0.6218, -0.6173, -0.6173],
          [ 0.0065,  0.0065, -0.0175,  ..., -0.6301, -0.6262, -0.6262],
          ...,
          [ 0.0340,  0.0340,  0.0138,  ..., -0.1368, -0.1459, -0.1459],
          [ 0.0300,  0.0300,  0.0103,  ..., -0.1405, -0.1501, -0.1501],
          [ 0.0300,  0.0300,  0.0103,  ..., -0.1405, -0.1501, -0.1501]],

         [[ 0.2731,  0.2731,  0.2756,  ...,  0.2486,  0.2470,  0.2470],
          [ 0.2731,  0.2731,  0.2756,  ...,  0.2486,  0.2470,  0.2470],
          [ 0.2631,  0.2631,  0.2653,  ...,  0.2476,  0.2458,  0.2458],
          ...,
          [-0.0376, -0.0376, -0.0432,  ...,  0.0354,  0.0385,  0.0385],
          [-0.0579, -0.0579, -0.0637,  ...,  0.0088,  0.0123,  0.0123],
          [-0.0579, -0.0579, -0.0637,  ...,  0.0088,  0.0123,  0.0123]]]],
       device='cuda:0')

another optical flow net output:

==== flow_up torch.Size([1, 2, 448, 640])
Saving optical flow visualisation at imgs/carla/GMA/flo.png
tensor([[[[-17.2308, -17.2340, -17.2305,  ..., -15.9215, -15.9945, -16.0019],
          [-17.2351, -17.2356, -17.2337,  ..., -15.9111, -15.9853, -15.9996],
          [-17.2368, -17.2394, -17.2384,  ..., -15.8947, -15.9163, -15.8852],
          ...,
          [-33.6664, -33.6749, -33.6350,  ...,   0.3346,   0.3482,   0.3626],
          [-33.7613, -33.7798, -33.7527,  ...,   0.3555,   0.3691,   0.3913],
          [-33.8094, -33.7940, -33.7729,  ...,   0.3856,   0.3938,   0.4079]],

         [[  0.6522,   0.6320,   0.6238,  ...,   2.7399,   2.7685,   2.7696],
          [  0.6430,   0.6277,   0.6236,  ...,   2.7190,   2.7285,   2.7395],
          [  0.6423,   0.6294,   0.6271,  ...,   2.6887,   2.6911,   2.7036],
          ...,
          [ 31.4061,  31.4198,  31.4066,  ...,  11.1851,  11.1757,  11.1575],
          [ 31.5238,  31.5329,  31.5274,  ...,  11.2173,  11.2108,  11.2045],
          [ 31.5766,  31.5558,  31.5624,  ...,  11.2555,  11.2293,  11.2038]]]],
       device='cuda:0')

DelinQu, Dec 02 '22 12:12

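The multiplication by 20 is a setting inherited from PWC-Net. My understanding is that it is probably there to handle larger motions, so that the values the network estimates directly stay relatively small. We also have to keep the factor of 20 in order to make good use of the PWC-Net pretrained model. As for other optical flow estimates not matching PWC-Net's, I have not studied this in depth. My impression is only that the flow estimated by PWC-Net is somewhat worse than RAFT's. However, our main focus here is image restoration, and the quality of the optical flow may not affect the final result that significantly (see the video frame interpolation paper Video Enhancement with Task-Oriented Flow).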

GitCVfb, Dec 05 '22 02:12
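
For anyone comparing the two printouts above, here is a minimal sketch of what the factor does to the numbers. It assumes, as the reply suggests, that the raw PWC-Net output is simply a scaled-down flow and that multiplying by 20 is the only conversion applied; `FLOW_SCALE`, `to_scaled_flow`, and `mean_magnitude` are hypothetical names for illustration, not identifiers from the repository. The two sample vectors are the top-right values of the "without *20" printout.

```python
import math

# Assumption: the constant factor discussed in this thread (not read from the repo).
FLOW_SCALE = 20.0


def to_scaled_flow(raw_flow, scale=FLOW_SCALE):
    """Multiply each raw (u, v) flow vector by the constant scale factor."""
    return [(u * scale, v * scale) for u, v in raw_flow]


def mean_magnitude(flow):
    """Average Euclidean length of the (u, v) flow vectors."""
    return sum(math.hypot(u, v) for u, v in flow) / len(flow)


# (u, v) pairs taken from the top-right corner of the PWC printout without *20.
raw = [(-0.6218, 0.2486), (-0.6173, 0.2470)]
scaled = to_scaled_flow(raw)

print("raw mean magnitude:   ", mean_magnitude(raw))
print("scaled mean magnitude:", mean_magnitude(scaled))
```

After scaling, the horizontal component at that corner is about -12.4, which is at least the same order of magnitude as the values printed for the other network at the same location. This only illustrates the scale difference; it says nothing about which estimate is more accurate.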