
0.5px/1px E/SE misalignment

Open bhack opened this issue 1 year ago • 10 comments

Have you noticed in your experiment runs a 0.5px/1px misalignment bias in the right/bottom-right direction? I have seen this with both the aligned-corners and non-aligned-corners models that you have used (e.g. R50/Swin DeAOTL). As these kinds of errors are very hard to debug, I want to know whether you have experienced something like this on your side.

Thanks.

bhack avatar May 13 '23 11:05 bhack

Hi, thank you for pointing out the issue. Could you please share an example image? Where does the 0.5px/1px misalignment bias happen?

yoxu515 avatar May 19 '23 11:05 yoxu515

I will try to find an example on DAVIS to share. In the meantime, have you already experienced something like this?

bhack avatar May 19 '23 16:05 bhack

@yoxu515 @z-x-yang Just to give more evidence of this effect, I've replicated the same frame (0) from the DAVIS speed-skating sequence 100 times.

Here is the original annotation (frame 0/GT): 00000

Frame 0 after 50 propagations: 00050

Frame 0 after 99 propagations: 00099

Down/Right accumulated drift 0-10: diff10

Down/Right accumulated drift 0-50: diff50

Down/Right accumulated drift 0-99: diff99

bhack avatar Jun 07 '23 19:06 bhack

Any feedback on this?

bhack avatar Jul 02 '23 15:07 bhack

@yoxu515 There is a rounding error in the eval engine and in the related interpolations; it is reproducible whenever the input W or H is not divisible by 16.
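To sketch the stride condition (illustrative numbers, not the repo's actual code): a DAVIS-like width of 854 is not divisible by 16, so evaluation has to resize to 864, and a nearest round trip with the legacy PyTorch coordinate mapping `src = floor(dst * in/out)` already shifts content by up to one pixel:

```python
import numpy as np

# Illustrative sketch, not the repo's code: W = 854 is not divisible by 16,
# so the eval resizes to the next multiple of max_stride, 864.
stride, w = 16, 854
aligned = int(np.ceil(w / stride)) * stride        # 864

# Legacy "nearest" maps output column d to source column floor(d * in/out).
up = np.floor(np.arange(aligned) * (w / aligned)).astype(int)    # 854 -> 864
down = np.floor(np.arange(w) * (aligned / w)).astype(int)        # 864 -> 854

# After the round trip 854 -> 864 -> 854, final column d shows the content
# of original column roundtrip[d].
roundtrip = up[down]
shift = roundtrip - np.arange(w)

# Every column displays content from its own position or one column to the
# left, i.e. the content drifts rightward (and, by symmetry, downward).
print(np.unique(shift).tolist())   # [-1, 0]
```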

bhack avatar Jan 25 '24 11:01 bhack

@z-x-yang Other than some edge-case precision issues around the max_stride alignment, you are also affected by https://github.com/pytorch/pytorch/issues/34808.
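A quick numpy sketch of why that PyTorch issue matters here (hypothetical sizes): with the legacy `mode="nearest"` mapping `src = floor(dst * scale)`, the sampled source pixel centers sit on average about half a pixel up/left of the geometrically correct ones, while the later `nearest-exact` mapping is unbiased:

```python
import numpy as np

in_w, out_w = 854, 864              # hypothetical: upscale to a multiple of 16
scale = in_w / out_w
dst = np.arange(out_w)

# Geometrically correct source center for each output pixel
# (align_corners=False convention).
ideal = (dst + 0.5) * scale - 0.5

legacy = np.floor(dst * scale)          # old mode="nearest" (issue #34808)
exact = np.floor((dst + 0.5) * scale)   # mode="nearest-exact"

print(round(float(np.mean(legacy - ideal)), 2))   # ~ -0.49: half-pixel bias
print(round(float(np.mean(exact - ideal)), 2))    # ~ 0.0: unbiased
```

Sampling half a pixel up/left in the source is exactly what shows up as the reported ~0.5px down/right drift of the content.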

s-deeper avatar Mar 02 '24 11:03 s-deeper

@yoxu515 There is a rounding error in the eval engine and in the related interpolations; it is reproducible whenever the input W or H is not divisible by 16.

Thank you for your ongoing attention to this issue. To be honest, at this point, I am also still trying to understand why this misalignment is occurring. Perhaps, as @s-deeper commented, the nearest interpolation in PyTorch could introduce misalignment, leading to this situation. However, AOT should be able to learn how to eliminate this misalignment during training, unless there is a lack of strict alignment between the training and testing settings.

As far as I remember, the handling of mask interpolation in both the training and testing processes of AOT should be consistent. In any case, I will pay closer attention to this issue. Thank you!

z-x-yang avatar Mar 02 '24 12:03 z-x-yang

@yoxu515 There is a rounding error in the eval engine and in the related interpolations; it is reproducible whenever the input W or H is not divisible by 16.

Furthermore, has this kind of misalignment caused any difficulties for you in the actual use of DeAOT? If not, I don't believe it's a critical issue.

Indeed, I have also noticed in some early versions of DeAOT experiments that when the video frame rate is very high, and the target remains stationary, there is some weird drift in the segmentation mask. However, in the released versions of DeAOT on YouTube-VOS and DAVIS, this issue does not arise (though I am not sure if it persists in videos with even higher frame rates or smaller object movements).

z-x-yang avatar Mar 02 '24 12:03 z-x-yang

@z-x-yang Other than some edge-case precision issues around the max_stride alignment, you are also affected by pytorch/pytorch#34808.

Thank you for pointing out the bug in PyTorch that I had not previously noticed! I will review the relevant code and strive to prevent all unexpected misalignments.

z-x-yang avatar Mar 02 '24 13:03 z-x-yang

You can reproduce it exactly with the current eval code:

  • Use an input whose W or H is not divisible by 16.
  • Use a repeated still image (bit-perfect copies) to reproduce the drift.

It is partially solved by https://github.com/pytorch/pytorch/issues/34808#issuecomment-1007806783

But I think there is a residual edge case/side effect from using np.around: https://github.com/yoxu515/aot-benchmark/blob/ada8a3cbf0ba6dde563a49e78e56dbbcde01d143/dataloaders/video_transforms.py#L640-L655

Can you increase the precision there?

Also, there is another issue in training: https://github.com/pytorch/pytorch/issues/104157

You need to use something like: https://github.com/huggingface/transformers/pull/28504/files#r1455033425
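For what it's worth, my reading (an assumption on my part, not a statement of what the linked transformers patch actually does) is that the fix amounts to passing `antialias=True` to `F.interpolate`, which low-pass filters before downsampling and matches PIL/OpenCV behaviour:

```python
import torch
import torch.nn.functional as F

# Sketch with made-up sizes: compare plain bicubic downsampling with the
# antialiased variant. antialias=True applies a low-pass filter first,
# so the two outputs differ near high-frequency content.
x = torch.rand(1, 1, 864, 864)

plain = F.interpolate(x, size=(854, 854), mode="bicubic",
                      align_corners=False)
smooth = F.interpolate(x, size=(854, 854), mode="bicubic",
                       align_corners=False, antialias=True)

assert plain.shape == smooth.shape == (1, 1, 854, 854)
```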

bhack avatar Mar 02 '24 13:03 bhack