ViViT-pytorch

ViViT for Regression Tasks

Open Taimoor-R opened this issue 2 years ago • 1 comment

I have been trying to use TimeSformer and ViViT, and I have managed to convert the model into a regression model by changing the loss function and setting the output dimension of the MLP head to 1. As I understand it, however, a video vision transformer takes a video clip (broken into frames) as input and outputs a single value for that clip. I would like the model to output a value for each frame of the input clip, so that instead of outputting 1 value it outputs 32 values. Can you guide me in this regard?
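To make the desired output shape concrete, here is a minimal sketch of a per-frame head. It is not this repository's code. It assumes a factorised-encoder style model in which a spatial encoder has already produced one embedding per frame, so the temporal transformer sees a `(batch, frames, dim)` tensor. `PerFrameRegressionHead`, the stand-in `temporal_encoder`, and all sizes are hypothetical names and values chosen for illustration. If the model uses tubelets spanning several frames, there would be one output per tubelet rather than per frame.

```python
import torch
import torch.nn as nn


class PerFrameRegressionHead(nn.Module):
    """Maps per-frame token embeddings (B, T, D) to one scalar per frame (B, T)."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc = nn.Linear(dim, 1)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (batch, frames, dim); any CLS token is assumed to be removed already.
        # The same linear layer is applied to every frame token independently.
        return self.fc(self.norm(frame_tokens)).squeeze(-1)


# Hypothetical sizes: 2 clips, 32 frames each, 192-dimensional embeddings.
batch, frames, dim = 2, 32, 192

# Stand-in for the temporal transformer (assumption, not the repository's module).
temporal_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=dim, nhead=3, dim_feedforward=4 * dim, batch_first=True
    ),
    num_layers=2,
)
head = PerFrameRegressionHead(dim)

# Placeholder for the per-frame embeddings coming out of the spatial encoder.
frame_embeddings = torch.randn(batch, frames, dim)

tokens = temporal_encoder(frame_embeddings)  # (batch, frames, dim)
preds = head(tokens)                         # (batch, frames): one value per frame

# Regression loss against one target per frame (random targets, for shape only).
targets = torch.randn(batch, frames)
loss = nn.functional.mse_loss(preds, targets)
```

The idea is to keep all temporal tokens instead of pooling them (or reading only the CLS token) before the MLP, so the head is applied once per frame and the loss is computed over all 32 values.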

Taimoor-R avatar Feb 01 '23 23:02 Taimoor-R

Hi @Taimoor-R, I am also interested in developing a model that does this, and I am in the process of figuring out how to adjust the model to predict a value for each pixel (or to perform the regression you describe). Have you found a solution in this direction? Thanks for any hints.

BitCalSaul avatar Dec 23 '23 03:12 BitCalSaul