VideoMAE: Why use the original mean and variance of each patch when visualizing the reconstructed video?

Open. PeisenZhao opened this issue 2 years ago • 1 comment

When I set a high mask ratio, some content that should be unpredictable is still roughly reconstructed in the visualization.

Jun 28 '22 11:06 PeisenZhao

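Because the target of the reconstruction task is the normalized pixel values of each patch, not the raw pixels. The model's output therefore has to be rescaled with each patch's original mean and variance before it can be displayed.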
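To illustrate the point, here is a minimal NumPy sketch of per-patch normalization and the inverse step needed for visualization. It is not taken from the VideoMAE code: the function names, the `(num_patches, patch_dim)` layout, and the `eps` value are assumptions made for the example.

```python
import numpy as np

def normalize_patches(patches, eps=1e-6):
    """Normalize each patch to zero mean and unit variance.

    patches: array of shape (num_patches, patch_dim) -- assumed layout.
    Returns the normalized target plus the per-patch statistics.
    """
    mean = patches.mean(axis=-1, keepdims=True)
    var = patches.var(axis=-1, keepdims=True)
    target = (patches - mean) / np.sqrt(var + eps)
    return target, mean, var

def denormalize_patches(pred, mean, var, eps=1e-6):
    """Map a prediction in normalized space back to pixel space."""
    return pred * np.sqrt(var + eps) + mean

# Hypothetical data: 4 patches with 12 values each.
rng = np.random.default_rng(0)
pixels = rng.random((4, 12))

target, mean, var = normalize_patches(pixels)

# A perfect prediction of the normalized target recovers the pixels
# only after the original statistics are applied again.
restored = denormalize_patches(target, mean, var)

# Even an uninformative all-zero prediction turns into the patch's
# average intensity once the original mean is added back.
flat = denormalize_patches(np.zeros_like(target), mean, var)
```

Under these assumptions, the last step may also explain the observation in the question: because the original mean and variance are injected at visualization time, a masked patch can look roughly right (correct average brightness and contrast) even when the model's normalized prediction carries little information.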

Jul 08 '22 09:07 congee524