
Run on full Kinetics-400 dataset to verify accuracy claims

daniel-j-h opened this issue • 7 comments

We validated the ported weights and model only on a subset of Kinetics-400 we had at hand.

We should run over the full Kinetics-400 dataset and verify what the folks claim in:

https://github.com/facebookresearch/vmz

daniel-j-h avatar Sep 27 '19 17:09 daniel-j-h

This is blocked by the Kinetics dataset being distributed as YouTube video ids only: you have to scrape the full videos yourself to extract the labeled clips, which is a bit of a pain for 600k videos.
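
For concreteness, a minimal sketch of what that scraping step looks like, assuming youtube-dl and ffmpeg are on the PATH (`fetch_clip` is a hypothetical helper, not part of this repo):

```python
import subprocess
from pathlib import Path

def fetch_clip(youtube_id, start, end, out_dir):
    """Download one Kinetics video with youtube-dl, then cut the labeled
    [start, end) window with ffmpeg. Returns None for the many ids that
    are no longer available."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    full = out_dir / f"{youtube_id}.full.mp4"
    clip = out_dir / f"{youtube_id}_{int(start):06d}_{int(end):06d}.mp4"
    try:
        # Grab the whole video; Kinetics annotations only give the id.
        subprocess.run(["youtube-dl", "-f", "mp4", "-o", str(full),
                        f"https://www.youtube.com/watch?v={youtube_id}"],
                       check=True)
        # Re-encode just the annotated window into its own file.
        subprocess.run(["ffmpeg", "-y", "-ss", str(start), "-t", str(end - start),
                        "-i", str(full), "-c:v", "libx264", "-c:a", "aac",
                        str(clip)],
                       check=True)
    except subprocess.CalledProcessError:
        return None  # video gone, region-locked, or download failed
    finally:
        full.unlink(missing_ok=True)
    return clip
```

In practice you would also want retries and rate limiting; at 600k videos, expect a sizable fraction of downloads to fail outright.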

Tracking: https://github.com/activitynet/ActivityNet/issues/28#issuecomment-535287732

daniel-j-h avatar Sep 28 '19 08:09 daniel-j-h

Also note that a LOT of the original Kinetics-400 videos no longer exist. I can try to run them for you on their snapshot in a few days (maybe weeks, depending on my workload) :)

bjuncek avatar Oct 08 '19 15:10 bjuncek

  • Not sure if it's in the repo, but you'd have to implement the fully convolutional (FCN) testing scheme from the paper, which I believe was used there (sketched below).
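
For reference, a minimal sketch of FCN-style testing, using torchvision's `r2plus1d_18` as a hypothetical stand-in for the ported 34-layer models: the final fc is rewritten as an equivalent 1x1x1 conv, so the classifier can be applied at every spatio-temporal position and the resulting score map averaged.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

# Stand-in backbone; the port's 34-layer models would slot in the same way.
model = r2plus1d_18(pretrained=True).eval()

# Re-express the final fc as an equivalent 1x1x1 conv.
head = nn.Conv3d(model.fc.in_features, model.fc.out_features, kernel_size=1)
head.weight.data.copy_(model.fc.weight.data.view(*model.fc.weight.shape, 1, 1, 1))
head.bias.data.copy_(model.fc.bias.data)

backbone = nn.Sequential(model.stem, model.layer1, model.layer2,
                         model.layer3, model.layer4)

@torch.no_grad()
def fcn_scores(video):
    # video: (N, 3, T, H, W); may be larger than the training crop
    fmap = backbone(video)              # (N, 512, T', H', W')
    scores = head(fmap)                 # (N, 400, T', H', W') score map
    return scores.mean(dim=(2, 3, 4))   # average logits over all positions
```

Since a linear head commutes with average pooling, this reproduces standard inference at the training resolution and extends naturally to larger inputs.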

bjuncek avatar Oct 08 '19 15:10 bjuncek

I can't believe how hard it is to work with the Kinetics dataset :man_facepalming:

If you have a snapshot with the extracted labeled clips, could you shoot me a mail (see my GitHub profile), please? It would be great to get it e.g. onto a requester-pays AWS S3 bucket :hugs:

I don't think asking you to run our models here every now and then is a good long-term solution for us. At the same time, the Kinetics situation is not a good place for video research to be in, in the first place.

daniel-j-h avatar Oct 10 '19 09:10 daniel-j-h

Regarding evaluation strategy: reading the Experiments section of

https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf

the fc-only experiments should be good enough for a first step here: extract features from a frozen model (from our PyTorch port; see the extract tool), then train a logistic regressor on top.
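
A minimal sketch of that fc-only protocol, again with torchvision's `r2plus1d_18` as a hypothetical stand-in for the ported models and scikit-learn providing the classifier (feature extraction over the actual dataset is elided):

```python
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18
from sklearn.linear_model import LogisticRegression

model = r2plus1d_18(pretrained=True)
model.fc = nn.Identity()  # drop the classifier, keep the pooled 512-d features
model.eval()

@torch.no_grad()
def extract_features(clips):
    # clips: (N, 3, T, H, W), normalized the same way as during pre-training
    return model(clips).cpu().numpy()

# train_feats/train_labels: stacked extract_features() outputs for the train split
# clf = LogisticRegression(max_iter=1000).fit(train_feats, train_labels)
# top1 = clf.score(val_feats, val_labels)
```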

daniel-j-h avatar Oct 10 '19 09:10 daniel-j-h

Ah - my bad - I was looking at Du's CSN paper :)

bjuncek avatar Oct 10 '19 14:10 bjuncek

I want to report my own evaluation on Kinetics-400. I used your ported R(2+1)D models pretrained on IG65M and finetuned on Kinetics, with clip lengths of 8 and 32 frames respectively. My copy of Kinetics-400 is not complete yet: about 10k training samples and 240 val samples are missing.

  • clip length = 8: clip accuracy = 59.60%, video accuracy = 73.20%
  • clip length = 32: clip accuracy = 65.86%, video accuracy = 77.92%

I wrote the evaluation framework myself, which might account for some differences, but the results look normal compared to what the paper claims. It's a pity that the code probably can't be released, since I did this during my internship.
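
For readers unfamiliar with the two metrics: clip accuracy scores each sampled clip independently, while video accuracy averages the clip softmax scores per video before taking the argmax. A minimal sketch of that aggregation (names hypothetical, not the code used above):

```python
from collections import defaultdict
import torch

def clip_and_video_accuracy(clip_logits, clip_labels, video_ids):
    """clip_logits: (N, C) tensor, one row per sampled clip;
    clip_labels: (N,) tensor; video_ids: length-N list grouping clips."""
    # Clip accuracy: every sampled clip is scored on its own.
    clip_acc = (clip_logits.argmax(dim=1) == clip_labels).float().mean().item()

    # Video accuracy: average softmax over a video's clips, then argmax.
    probs = clip_logits.softmax(dim=1)
    scores, labels = defaultdict(list), {}
    for p, y, vid in zip(probs, clip_labels.tolist(), video_ids):
        scores[vid].append(p)
        labels[vid] = y
    hits = sum(int(torch.stack(ps).mean(dim=0).argmax().item() == labels[vid])
               for vid, ps in scores.items())
    return clip_acc, hits / len(scores)
```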

FesianXu avatar May 02 '20 09:05 FesianXu