Xin Ma
@Eason-Great Yes, the prediction target is the HOG features. For the HOG visualization, we follow the source code in scikit-image: https://github.com/scikit-image/scikit-image/blob/v0.19.0/skimage/feature/_hog.py#L48-L307
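As a minimal sketch of what that scikit-image code does, `skimage.feature.hog` can return both the feature vector and a rendered visualization when `visualize=True`. The toy image and parameter values below are illustrative only, not the settings used in the repo:

```python
import numpy as np
from skimage.feature import hog

# A toy 64x64 grayscale "frame": a simple intensity gradient.
image = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)

# visualize=True makes hog() also return an image showing the
# dominant gradient orientations per cell.
features, hog_image = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(1, 1),
    visualize=True,
)

print(features.shape)   # flattened per-cell orientation histograms
print(hog_image.shape)  # same spatial size as the input image
```

With 8x8 cells on a 64x64 image and 9 orientation bins, the flattened feature vector has 8 * 8 * 9 = 576 entries.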
@yanmingchao2 You can follow this guide to prepare the Kinetics dataset: https://github.com/open-mmlab/mmaction2/blob/master/tools/data/kinetics/README.md
@daidaiershidi We have no exact score on Kinetics, but pretraining on ImageNet-1k reaches a score similar to the original paper (ViT-B: 84.0, ViT-L: 85.9).
@daidaiershidi It is part of a larger project. We will release the code in the next few weeks.
@realgump you can check this link https://github.com/rwightman/pytorch-image-models/blob/7430a85d07a7f335e18c2225fda4a5e7b60b995c/timm/models/vision_transformer.py#L52.
@asif-hanif You can check this link https://github.com/asyml/vision-transformer-pytorch/blob/2d8828948e7ab122f5db11fd67cb7b46c6bb6823/src/checkpoint.py#L80.
@nullhty The model structure has two parts: the first is a spatial-only transformer and the second is a temporal-only transformer.
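A rough sketch of what "spatial-only then temporal-only" means in practice: the token tensor is reshaped so attention operates over patches within one frame first, then over time steps at one patch location. The function below only shows the reshaping layout (the attention layers themselves are omitted), and the shape `(B, T, N, D)` convention is an assumption for illustration:

```python
import numpy as np

def spatial_then_temporal(x):
    """x has shape (B, T, N, D) = (batch, frames, patches per frame,
    embed dim). Real blocks would apply self-attention at each stage;
    here we only show the reshapes that make attention spatial-only,
    then temporal-only."""
    B, T, N, D = x.shape

    # Spatial-only: fold time into the batch, so each attention call
    # sees the N patches of a single frame.
    spatial_in = x.reshape(B * T, N, D)

    # Temporal-only: fold space into the batch, so each attention call
    # sees the T time steps of a single patch location.
    temporal_in = x.transpose(0, 2, 1, 3).reshape(B * N, T, D)

    return spatial_in.shape, temporal_in.shape

shapes = spatial_then_temporal(np.zeros((2, 8, 196, 768)))
print(shapes)
```

For a batch of 2 clips with 8 frames of 196 patches each, this yields `(16, 196, 768)` for the spatial stage and `(392, 8, 768)` for the temporal stage.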
@SuperGentry We use decord to extract the video frames. For details on how to read frames from a video, you can check its official site: https://github.com/dmlc/decord. After loading...
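As a hypothetical sketch of the surrounding logic: one common pattern is to compute evenly spaced frame indices first, then pass them to a reader such as decord's `VideoReader.get_batch(indices)` to decode only those frames. The helper below is illustrative, not the repo's actual sampling code:

```python
def sample_frame_indices(total_frames, num_frames):
    """Pick `num_frames` evenly spaced indices covering the clip.
    A decoder (e.g. decord's VideoReader.get_batch) would then
    decode exactly these frames."""
    if total_frames <= num_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the midpoint of each of the num_frames equal segments.
    return [int(step * i + step / 2) for i in range(num_frames)]

print(sample_frame_indices(300, 8))
```

Sampling segment midpoints (rather than the first frame of each segment) keeps the clip coverage symmetric at both ends of the video.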
> Hi. Thank you for your work. I have a problem because I don't really know how to make use of the files you uploaded to this repository. I would...
> Is there any pretrained model on Kinetics or Something Something v2 or EPIC-KITCHENS-100 dataset?

If you want the pre-trained weights on Kinetics-600, we offer the pre-trained models...