ViViT-pytorch: ViViT for Regression Tasks

Open • Taimoor-R opened this issue 2 years ago • 1 comment

I have been trying to use TimeSformer and ViViT, and I have managed to convert them into regression models by changing the loss function and setting the output size of the MLP head to 1. However, as I understand it, a video vision transformer takes a video clip as input (broken into frames) and outputs a single value for the whole clip. I would like the model to output a value for each frame of the input clip, so instead of outputting 1 value it would output 32 values (one per frame). Can you guide me in this regard?

Feb 01 '23 23:02 Taimoor-R
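A minimal sketch of the per-frame idea, assuming a factorized-encoder ViViT (a spatial transformer over the patches of each frame, then a temporal transformer over one summary token per frame): instead of pooling the temporal tokens (mean or CLS) into a single vector before the MLP head, apply the head to every temporal token, which yields one value per frame. `PerFrameRegressionViViT` is a hypothetical, self-contained module built on `torch.nn.TransformerEncoder`; it is not the ViViT-pytorch code, and its layer sizes are arbitrary.

```python
import torch
from torch import nn


class PerFrameRegressionViViT(nn.Module):
    """Hypothetical factorized-encoder video transformer with one output per frame.

    A spatial transformer attends over the patches of each frame, a temporal
    transformer attends over one summary token per frame, and the regression
    head is applied to every temporal token instead of a single pooled token.
    """

    def __init__(self, image_size=64, patch_size=16, channels=3, dim=64,
                 depth=2, heads=4, num_frames=32):
        super().__init__()
        assert image_size % patch_size == 0, "image_size must be divisible by patch_size"
        num_patches = (image_size // patch_size) ** 2
        self.patch_size = patch_size
        self.to_patch_embedding = nn.Linear(channels * patch_size * patch_size, dim)
        # learnable summary (CLS) token prepended to each frame's patch sequence
        self.space_token = nn.Parameter(torch.randn(1, 1, 1, dim))
        # learnable positional embedding for every (frame, token) position
        self.pos_embedding = nn.Parameter(torch.randn(1, num_frames, num_patches + 1, dim))
        self.space_transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True), depth)
        self.temporal_transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True), depth)
        # regression head shared across frames: dim -> 1 value
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, video):
        # video: (batch, frames, channels, height, width)
        b, t, c, h, w = video.shape
        p = self.patch_size
        # cut every frame into non-overlapping p x p patches and flatten each patch
        x = video.reshape(b, t, c, h // p, p, w // p, p)
        x = x.permute(0, 1, 3, 5, 2, 4, 6).reshape(b, t, (h // p) * (w // p), c * p * p)
        x = self.to_patch_embedding(x)                          # (b, t, n, dim)
        cls = self.space_token.expand(b, t, -1, -1)             # (b, t, 1, dim)
        x = torch.cat([cls, x], dim=2) + self.pos_embedding[:, :t]
        # spatial attention within each frame
        x = self.space_transformer(x.reshape(b * t, x.shape[2], x.shape[3]))
        frame_tokens = x[:, 0].reshape(b, t, -1)                # (b, t, dim)
        # temporal attention across frames; no temporal CLS token and no pooling
        frame_tokens = self.temporal_transformer(frame_tokens)  # (b, t, dim)
        return self.head(frame_tokens).squeeze(-1)              # (b, t): one value per frame


if __name__ == "__main__":
    model = PerFrameRegressionViViT(num_frames=32)
    clips = torch.randn(2, 32, 3, 64, 64)   # 2 clips of 32 RGB frames
    targets = torch.randn(2, 32)            # one regression target per frame
    preds = model(clips)                    # shape (2, 32)
    loss = nn.functional.mse_loss(preds, targets)
    loss.backward()
```

Training with a per-frame MSE loss then only needs targets of shape (batch, frames). If the encoder instead uses tubelet embedding with a temporal extent greater than one, there are fewer temporal tokens than frames, so the per-token outputs would have to be upsampled in time (or the tubelet depth set to 1) to get one value per frame.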

Hi @Taimoor-R, I am also interested in developing a model that does this, and I am still figuring out how to adjust the model to predict a value for each pixel (i.e., per-pixel regression). Have you found a solution in this direction? Thanks for any hints.

Dec 23 '23 03:12 BitCalSaul
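For the per-pixel case, one simple option is to keep every spatial patch token rather than only the CLS token and let each token predict the p x p block of pixels it was cut from, then stitch the blocks back into a full frame. The sketch below assumes the encoder can expose patch tokens of shape (batch, frames, num_patches, dim) with the CLS token removed, in row-major patch order, and square frames; `patchify` and `PatchToPixelHead` are hypothetical names, not part of ViViT-pytorch. Note that in a factorized-encoder variant only the per-frame summary tokens pass through the temporal transformer, so patch tokens taken after the spatial stage carry no temporal context.

```python
import torch
from torch import nn


def patchify(video, p):
    """Split (b, t, c, H, W) frames into row-major p x p patches: (b, t, n, c*p*p)."""
    b, t, c, h, w = video.shape
    x = video.reshape(b, t, c, h // p, p, w // p, p)
    return x.permute(0, 1, 3, 5, 2, 4, 6).reshape(b, t, (h // p) * (w // p), c * p * p)


class PatchToPixelHead(nn.Module):
    """Hypothetical dense-regression head: each patch token predicts its own p x p pixels."""

    def __init__(self, dim, patch_size, image_size, out_channels=1):
        super().__init__()
        assert image_size % patch_size == 0, "image_size must be divisible by patch_size"
        self.p = patch_size
        self.grid = image_size // patch_size  # patches per side (square frames assumed)
        self.out_channels = out_channels
        self.proj = nn.Linear(dim, out_channels * patch_size * patch_size)

    def forward(self, tokens):
        # tokens: (b, t, n, dim) patch tokens in the same row-major order as patchify,
        # with any CLS token already dropped
        b, t, n, _ = tokens.shape
        g, p, c = self.grid, self.p, self.out_channels
        assert n == g * g, "expected one token per patch"
        x = self.proj(tokens).reshape(b, t, g, g, c, p, p)  # (b, t, gh, gw, c, ph, pw)
        # inverse of the permutation used in patchify
        x = x.permute(0, 1, 4, 2, 5, 3, 6)                  # (b, t, c, gh, ph, gw, pw)
        return x.reshape(b, t, c, g * p, g * p)             # (b, t, c, H, W)


if __name__ == "__main__":
    # sanity check: with an identity projection the head exactly undoes patchify
    p, size = 4, 8
    head = PatchToPixelHead(dim=1 * p * p, patch_size=p, image_size=size)
    with torch.no_grad():
        head.proj.weight.copy_(torch.eye(p * p))
        head.proj.bias.zero_()
    video = torch.randn(2, 3, 1, size, size)  # (b, t, c, H, W)
    assert torch.allclose(head(patchify(video, p)), video)
```

The identity check only confirms that the reshapes put each predicted pixel back where it came from; in a real model `proj` is learned, and a per-pixel MSE loss against targets of shape (batch, frames, out_channels, H, W) would train it.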
