Xin Ma
@Eason-Great Yes, the prediction target is the HOG features. For the HOG visualization, we follow the source code in scikit-image: https://github.com/scikit-image/scikit-image/blob/v0.19.0/skimage/feature/_hog.py#L48-L307
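As a minimal sketch of what that scikit-image code does, `skimage.feature.hog` can return both the feature vector and a rendered visualization when `visualize=True`. The toy image and parameter values below are illustrative only, not the settings used in the repo:

```python
import numpy as np
from skimage.feature import hog

# A toy 64x64 grayscale "frame": a simple intensity gradient.
image = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)

# visualize=True makes hog() also return an image showing the
# dominant gradient orientations per cell.
features, hog_image = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(1, 1),
    visualize=True,
)

print(features.shape)   # flattened per-cell orientation histograms
print(hog_image.shape)  # same spatial size as the input image
```

With 8x8 cells on a 64x64 image and 9 orientation bins, the flattened feature vector has 8 * 8 * 9 = 576 entries.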
@yanmingchao2 You can follow this guide to prepare the Kinetics dataset: https://github.com/open-mmlab/mmaction2/blob/master/tools/data/kinetics/README.md
@daidaiershidi We have no exact score on Kinetics, but pretraining on ImageNet-1k reaches a score similar to the original paper (ViT-B: 84.0, ViT-L: 85.9).
@daidaiershidi It is part of a larger project. We will release the code in the next few weeks.
@realgump you can check this link https://github.com/rwightman/pytorch-image-models/blob/7430a85d07a7f335e18c2225fda4a5e7b60b995c/timm/models/vision_transformer.py#L52.
@asif-hanif You can check this link https://github.com/asyml/vision-transformer-pytorch/blob/2d8828948e7ab122f5db11fd67cb7b46c6bb6823/src/checkpoint.py#L80.
@nullhty The model structure has two parts: the first is a spatial-only transformer and the second is a temporal-only transformer.
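A rough sketch of what "spatial-only then temporal-only" means in practice: the token tensor is reshaped so attention operates over patches within one frame first, then over time steps at one patch location. The function below only shows the reshaping layout (the attention layers themselves are omitted), and the shape `(B, T, N, D)` convention is an assumption for illustration:

```python
import numpy as np

def spatial_then_temporal(x):
    """x has shape (B, T, N, D) = (batch, frames, patches per frame,
    embed dim). Real blocks would apply self-attention at each stage;
    here we only show the reshapes that make attention spatial-only,
    then temporal-only."""
    B, T, N, D = x.shape

    # Spatial-only: fold time into the batch, so each attention call
    # sees the N patches of a single frame.
    spatial_in = x.reshape(B * T, N, D)

    # Temporal-only: fold space into the batch, so each attention call
    # sees the T time steps of a single patch location.
    temporal_in = x.transpose(0, 2, 1, 3).reshape(B * N, T, D)

    return spatial_in.shape, temporal_in.shape

shapes = spatial_then_temporal(np.zeros((2, 8, 196, 768)))
print(shapes)
```

For a batch of 2 clips with 8 frames of 196 patches each, this yields `(16, 196, 768)` for the spatial stage and `(392, 8, 768)` for the temporal stage.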
@SuperGentry We use decord to extract the video frames. For details on how to read frames from a video, you can check its official site: https://github.com/dmlc/decord. After loading...
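As a hypothetical sketch of the surrounding logic: one common pattern is to compute evenly spaced frame indices first, then pass them to a reader such as decord's `VideoReader.get_batch(indices)` to decode only those frames. The helper below is illustrative, not the repo's actual sampling code:

```python
def sample_frame_indices(total_frames, num_frames):
    """Pick `num_frames` evenly spaced indices covering the clip.
    A decoder (e.g. decord's VideoReader.get_batch) would then
    decode exactly these frames."""
    if total_frames <= num_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the midpoint of each of the num_frames equal segments.
    return [int(step * i + step / 2) for i in range(num_frames)]

print(sample_frame_indices(300, 8))
```

Sampling segment midpoints (rather than the first frame of each segment) keeps the clip coverage symmetric at both ends of the video.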
> Hi. Thank you for your work. I have a problem because I don't really know how to make use of the files you uploaded to this repository. I would...
> Is there any pretrained model on Kinetics or Something Something v2 or EPIC-KITCHENS-100 dataset?

If you want the pre-trained weights on Kinetics-600, we offer the pre-trained models...