
Evaluation results

Open ed-fish opened this issue 3 years ago • 13 comments

Hi,

Thanks for your work making a PyTorch version of the paper - much appreciated!

How does this implementation compare to the results in the original paper, specifically on the Moments in Time dataset?

Thanks,

Ed

ed-fish avatar May 04 '21 09:05 ed-fish

I am also interested in this topic. If anyone could give me more information about the model parameters, that might help me fix my problem; training with the default parameters always overfits. I would be thankful for that.

Thanks. Marco

marco-hmc avatar May 28 '21 03:05 marco-hmc

I ran the model on a very small dataset (51 classes with 20 video clips per class) and the results are very strange: it always outputs the same prediction. I wonder whether it would do better if I loaded pre-trained weights. I would appreciate any tips.

Thanks, Dylan

Linwei94 avatar May 30 '21 07:05 Linwei94
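A quick way to confirm this kind of collapse is to histogram the model's argmax predictions over a validation pass: if nearly all the mass lands on one class, the network has degenerated to a constant output. A minimal sketch; the `preds` list here is a hypothetical stand-in for your collected argmax outputs:

```python
from collections import Counter

def prediction_histogram(preds, n_classes):
    """Return per-class prediction counts; a collapsed model
    puts (nearly) all mass on a single class."""
    counts = Counter(preds)
    return [counts.get(c, 0) for c in range(n_classes)]

# Hypothetical argmax outputs from a collapsed model: almost always class 3.
preds = [3] * 98 + [7, 12]
hist = prediction_histogram(preds, n_classes=51)
print(max(hist) / len(preds))  # 0.98 -- the model is collapsed onto one class
```

For a healthy 51-class model the modal fraction should be far closer to the class frequencies in your data than to 1.0.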

> I ran the model on a very small dataset (51 classes with 20 video clips per class) and the results are very strange: it always outputs the same prediction. I wonder whether it would do better if I loaded pre-trained weights. I would appreciate any tips.
>
> Thanks, Dylan

I am facing the same situation. I don't have any idea about it yet. Waiting for a reply from the authors.

marco-hmc avatar May 30 '21 07:05 marco-hmc

> I ran the model on a very small dataset (51 classes with 20 video clips per class) and the results are very strange: it always outputs the same prediction. I wonder whether it would do better if I loaded pre-trained weights. I would appreciate any tips. Thanks, Dylan
>
> I am facing the same situation. I don't have any idea about it yet. Waiting for a reply from the authors.

I wonder whether the problem comes from the code or from my too-small dataset.

Linwei94 avatar May 30 '21 07:05 Linwei94

I tried it with nearly 2000 videos and ran for different numbers of epochs, but the accuracy never exceeds 21.09%. The strange thing is that it's the same for most of the runs; the figures don't change.

vaibhavsah avatar Aug 09 '21 09:08 vaibhavsah
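A flat accuracy like this across runs is often just the majority-class baseline: if the model predicts a single class for everything, its accuracy equals that class's share of the test set, which is worth checking against your label distribution. A minimal sketch; the `labels` list is hypothetical:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy a constant-output model achieves: the frequency
    of the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical test labels: ~21% of clips belong to the modal class.
labels = [0] * 21 + [1] * 20 + [2] * 20 + [3] * 20 + [4] * 19
print(majority_baseline(labels))  # 0.21
```

If your reported accuracy matches this baseline, the model has likely collapsed to a constant prediction rather than learned anything.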

> I tried it with nearly 2000 videos and ran for different numbers of epochs, but the accuracy never exceeds 21.09%. The strange thing is that it's the same for most of the runs; the figures don't change.

Your dataset is too small. You can try running ViViT with ViT's weights loaded for both the temporal and spatial parts.

Linwei94 avatar Aug 10 '21 01:08 Linwei94

@DylanTao94 Can you share how I can do that?

vaibhavsah avatar Aug 11 '21 09:08 vaibhavsah

Sorry mate, I'm not allowed to share my code. You can follow the steps in the ViViT paper.

Linwei94 avatar Aug 11 '21 09:08 Linwei94
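The initialisation suggested above amounts to copying each pretrained ViT encoder layer into both the spatial and the temporal transformer of the factorised model. A minimal sketch over plain state-dict-style dicts; all key prefixes here (`transformer.`, `space_transformer.`, `temporal_transformer.`) are hypothetical and must be adapted to the actual key names in your checkpoint and model:

```python
def init_vivit_from_vit(vit_state, vivit_state):
    """Copy every ViT encoder weight into the matching spatial AND
    temporal entries of a ViViT state dict (factorised encoder)."""
    out = dict(vivit_state)
    for key, weight in vit_state.items():
        if not key.startswith("transformer."):   # skip patch embedding / head
            continue
        suffix = key[len("transformer."):]
        for prefix in ("space_transformer.", "temporal_transformer."):
            target = prefix + suffix
            if target in out:                    # only copy keys that exist
                out[target] = weight
    return out

# Toy example with scalars standing in for weight tensors.
vit = {"transformer.layer0.attn": 1.0, "head.weight": 9.0}
vivit = {"space_transformer.layer0.attn": 0.0,
         "temporal_transformer.layer0.attn": 0.0,
         "mlp_head.weight": 0.0}
merged = init_vivit_from_vit(vit, vivit)
```

With real models you would load `merged` back via the framework's state-dict mechanism; shapes must match for each copied key, and the classification head is left freshly initialised.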

Yes, this model works fine; I've tested it on a dataset of 50k videos.

seandatasci avatar Aug 14 '21 19:08 seandatasci

@seandatasci I think I might be doing something wrong with the code. Can you help me out here? My code is here

vaibhavsah avatar Aug 15 '21 10:08 vaibhavsah

> @seandatasci I think I might be doing something wrong with the code. Can you help me out here? My code is here

I have the same problem, and I wonder whether you have resolved it. My accuracy/AUC results are lower than 50%, and my dataset size is also 2000. Thank you!

Mark-Dou avatar Aug 25 '21 01:08 Mark-Dou

Inspired by the author's implementation of ViViT, we have reimplemented TimeSformer and ViViT and released pretrained model weights on Kinetics-600, which can be found here

mx-mark avatar Dec 11 '21 14:12 mx-mark

The model isn't learning. Trained on 2 classes of the UCF101 dataset with the Adam optimizer and CrossEntropyLoss.

mnauf avatar Jun 08 '23 03:06 mnauf
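When a model "isn't learning", a standard first check is whether the training loop can overfit a single tiny batch: if even that fails, the problem is usually in the data pipeline, labels, or optimisation setup rather than the architecture. A framework-free sketch of the idea, with a toy softmax classifier and random features standing in for the real model and clips:

```python
import math
import random

random.seed(0)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# One tiny "batch": 4 feature vectors, 2 classes (stand-ins for real clips).
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
y = [0, 1, 0, 1]
W = [[0.0] * 8 for _ in range(2)]   # 2x8 weight matrix, zero-initialised

def loss_and_grad(W):
    """Mean cross-entropy over the batch plus its gradient w.r.t. W."""
    total = 0.0
    grad = [[0.0] * 8 for _ in range(2)]
    for x, t in zip(X, y):
        p = softmax([sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W])
        total -= math.log(p[t])
        for c in range(2):
            err = p[c] - (1.0 if c == t else 0.0)
            for j in range(8):
                grad[c][j] += err * x[j]
    return total / len(X), grad

first, _ = loss_and_grad(W)
for _ in range(200):                # plain gradient descent on the one batch
    _, g = loss_and_grad(W)
    for c in range(2):
        for j in range(8):
            W[c][j] -= 0.5 * g[c][j] / len(X)
final, _ = loss_and_grad(W)
```

The loss should fall toward zero on such a batch; if the equivalent check with your real model and one real batch stays flat, debug the inputs and labels before touching the architecture.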