Roadmap of MMAction2

Open • hellock opened this issue 3 years ago • 38 comments

We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.

You can either:

Suggest a new feature by leaving a comment.
Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to every feature request, so vote for your favorite one!)
Tell us that you would like to help implement one of the features in the list or review the PRs. (This is the greatest thing to hear!)

Jul 13 '20 13:07 hellock

I suppose it would be interesting to add CSN and X3D by FAIR to the supported model family. I would also be interested in helping implement/review them if time permits.

Jul 14 '20 11:07 d-li14

CSN is planned for the next release. It would be great if you could help with the implementation of X3D.

Jul 14 '20 11:07 hellock

I strongly recommend adding support for the FineGym99 dataset with the video dataset_type; it would make it more convenient for users to validate ideas for fine-grained action recognition or localization tasks. Hoping this comes true in the not-too-distant future!

Jul 15 '20 09:07 Amazingren

It would be nice if mmaction2 could support the AVA dataset and spatio-temporal action detection models.

Jul 16 '20 14:07 irvingzhang0512

It would be nice if mmaction2 could provide some pretrained backbone models for users, such as ResNet3dSlowFast and so on.

Jul 20 '20 07:07 q5390498

It would be nice if mmaction2 could support the AVA dataset and spatio-temporal action detection models.

Yes, it is in the plan.

Jul 21 '20 12:07 hellock

It would be nice if mmaction2 could provide some pretrained backbone models for users, such as ResNet3dSlowFast and so on.

There are already lots of pretrained models in the model zoo.

Jul 21 '20 12:07 hellock

It would be better if the model could output results in a video format such as mp4. I have tried demo.py, but it only outputs text.

Jul 27 '20 02:07 IDayday

demo.py now supports output in video and GIF formats.

Aug 03 '20 05:08 dreamerlin

@dreamerlin could you please sort out all feature requests in one grand post here, so that we can easily track their status? 🏃

Aug 03 '20 05:08 innerlee

Introducing a Multi-Grid or mixed-precision training strategy would help with faster prototype iteration.

Aug 08 '20 06:08 tianyuan168326

In the action localization task, you provided code to compute the AUC metric for action proposal evaluation. Could you also provide the classification results so that we can compute the mAP?
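For context on the difference: the AUC metric scores proposals without class labels, while detection mAP needs every segment to carry a class and a confidence score, with AP computed per class and then averaged. Below is a minimal, self-contained sketch of per-class AP at a single temporal IoU (tIoU) threshold, in the style of the ActivityNet detection metric. It is not mmaction2's evaluation code, and the tuple layouts `(video_id, start, end[, score])` are assumptions for illustration.

```python
def tiou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, thr=0.5):
    """Interpolated AP for a single class.

    detections:   list of (video_id, start, end, score)  (assumed layout)
    ground_truth: list of (video_id, start, end)         (assumed layout)
    """
    if not ground_truth:
        return 0.0
    used = [False] * len(ground_truth)  # each GT segment may be matched once
    hits = []  # 1 for a true positive, 0 for a false positive
    for vid, start, end, _ in sorted(detections, key=lambda d: d[3], reverse=True):
        best_iou, best_j = thr, -1
        for j, (gvid, gstart, gend) in enumerate(ground_truth):
            if gvid != vid or used[j]:
                continue
            iou = tiou((start, end), (gstart, gend))
            if iou >= best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0:
            used[best_j] = True
        hits.append(1 if best_j >= 0 else 0)

    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))
    # Make precision non-increasing from right to left (interpolation envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(dets_by_class, gts_by_class, thr=0.5):
    """Mean of per-class AP over the classes listed in gts_by_class."""
    aps = [average_precision(dets_by_class.get(c, []), gts, thr)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps)
```

A class with ground truth but no detections contributes an AP of 0, which is why classification results (not just class-agnostic proposals) are required before mAP can be reported.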

Aug 08 '20 12:08 JJBOY

Can it be used to recognize actions in real-time video from a webcam or another source? Thanks.
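Real-time recognition usually comes down to repeatedly classifying a sliding window of the most recent frames. A minimal sketch of such a buffer, assuming a clip length of 8 frames classified every 4 new frames (both hypothetical values); the `ClipBuffer` class is hypothetical, and the frame source (e.g. a webcam read loop) and the recognizer call are left out because they depend on the model you use.

```python
from collections import deque


class ClipBuffer:
    """Keep the most recent `clip_len` frames and signal when to classify.

    Hypothetical helper: frames can come from any source, and running the
    recognizer on the returned clip is left to the caller.
    """

    def __init__(self, clip_len=8, stride=4):
        self.frames = deque(maxlen=clip_len)  # oldest frames drop off automatically
        self.stride = stride                  # classify once every `stride` new frames
        self.count = 0                        # total frames seen so far

    def push(self, frame):
        """Add a frame; return the current clip when it is time to classify."""
        self.frames.append(frame)
        self.count += 1
        full = len(self.frames) == self.frames.maxlen
        if full and (self.count - self.frames.maxlen) % self.stride == 0:
            return list(self.frames)
        return None
```

Pushing frames 0 to 11 into `ClipBuffer(8, 4)` yields a clip at frame 7 (frames 0 to 7) and again at frame 11 (frames 4 to 11); a smaller stride gives more responsive predictions at the cost of more inference calls.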

Aug 31 '20 13:08 IDayday

There are many trained models in the Model Zoo, but all of them are only used to test the performance of the proposed works. Do you plan to make them available for backbone pre-training? Say I want to use the I3D pre-trained on Kinetics-400 as the pre-trained backbone of my own model. It seems that we don't have much choice of pre-trained backbones other than a ResNet-50 on ImageNet.

Oct 01 '20 14:10 makecent

To use a pre-trained model for the whole network, the new config sets load_from to the link of the pre-trained model. See Tutorial 1: Finetuning Models # Use Pre-Trained Model and the example there. To use a pre-trained backbone only, change the pretrained value in the backbone dict; unexpected keys will be ignored.
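A minimal sketch of the two options described above, written as mmaction2 0.x-style Python config fragments. The checkpoint path and `num_classes` are hypothetical placeholders, and the field names (`load_from`, `pretrained`, `pretrained2d`, `Recognizer3D`, `ResNet3d`, `I3DHead`) are assumed from the 0.x config layout; check them against the version you use. Pick one option or the other; they appear together here only for comparison.

```python
# Option A: initialize the whole recognizer (backbone + head) from a checkpoint.
# The path is a hypothetical placeholder.
load_from = 'path/to/i3d_kinetics400_checkpoint.pth'

# Option B: initialize only the backbone through its `pretrained` field.
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3d',
        depth=50,
        # pretrained2d=False: load a 3D checkpoint directly instead of
        # inflating 2D ImageNet weights into 3D kernels.
        pretrained2d=False,
        pretrained='path/to/i3d_kinetics400_checkpoint.pth'),  # placeholder path
    cls_head=dict(
        type='I3DHead',
        num_classes=10,     # hypothetical number of classes in your dataset
        in_channels=2048))  # ResNet-50 output channels
```

Per the reply above, checkpoint keys that do not belong to the backbone are ignored, so a full-model checkpoint can serve as the backbone source; it is still worth checking the loading log to confirm the backbone weights were actually matched.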

Oct 01 '20 14:10 dreamerlin

Wow! Fantastic! I think you could mention this feature somewhere, in case others like me don't know that they can directly use the pre-trained weights of the whole model for the backbone.

Oct 02 '20 07:10 makecent

Could you please support X3D?

Nov 01 '20 09:11 vikizhao156

Here are the X3D config files: https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/x3d

Nov 03 '20 06:11 dreamerlin

Could you please add Video Action/Activity Temporal Segmentation models?

Nov 21 '20 22:11 ahkarami

Also, could you please add video models for the MovieNet dataset?

Nov 21 '20 22:11 ahkarami

Hi, I'm struggling to train a model using a dataset structured like the AVA dataset. Does anyone have a config file they have used for this type of dataset that they would be willing to share? There is code to create an AVA dataset, but I haven't been able to find any config files. Otherwise, is there a different framework I can use to train a model with bounding boxes in the training data? Thank you

Dec 11 '20 23:12 mikeyEcology

Recently I learned about action localization/detection/segmentation (they seem to be the same thing); it seems that it can generate a caption-like file, and I found it very interesting and practical. I would really appreciate it if mmaction2 could have an action localization demo and more docs about it. Thanks!

Dec 14 '20 10:12 wwdok

Very happy to have spatio-temporal action detection models today... Two related features would be very helpful:

Spatio-temporal action detection online/video demo.
Train spatio-temporal action detection models with custom categories (e.g. choose sit/stand/lie and ignore all other categories).
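One way to train on a subset of categories is to filter the annotation file before training. A minimal sketch, assuming AVA-style CSV rows in the order video_id, timestamp, x1, y1, x2, y2, action_id, person_id; the helper `filter_ava_rows` and the example action IDs are hypothetical (look up the real IDs in the label map), and the remapped IDs would also need a matching label map in the config.

```python
import csv
import io


def filter_ava_rows(csv_text, keep_ids):
    """Keep rows whose action_id is in keep_ids and remap those ids to 1..K.

    Hypothetical helper; assumed column order:
    video_id, timestamp, x1, y1, x2, y2, action_id, person_id
    """
    # Map each kept original id to a new contiguous id, in sorted order.
    remap = {old: new for new, old in enumerate(sorted(keep_ids), start=1)}
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        action_id = int(row[6])
        if action_id in remap:
            row[6] = str(remap[action_id])
            kept.append(row)
    return kept, remap
```

With `keep_ids={11, 12}`, rows labeled 11 become 1, rows labeled 12 become 2, and every other row is dropped; boxes that end up with no remaining label simply disappear from the training set.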

Dec 18 '20 10:12 irvingzhang0512

Do you have a plan to add flow models for TSN and I3D?

Dec 22 '20 21:12 F9393

How about adding some models for temporal action segmentation?

Jan 06 '21 09:01 jin-s13

Thanks for the great repo! Do you have plans to add S3D and S3D-G from https://arxiv.org/abs/1712.04851? They achieve better performance than the I3D model while running much faster. Here is a reproduced implementation of the S3D model: https://github.com/kylemin/S3D. And for the S3D-G model: https://github.com/antoine77340/S3D_HowTo100M/blob/master/s3dg.py, https://github.com/tensorflow/models/blob/master/research/slim/nets/s3dg.py

Jan 15 '21 02:01 jayleicn

Thanks in advance for this great, continuously improving repo.

Recently, I saw that in the AVA-Kinetics challenge 2020, the new method 'Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization' performed very well, leading the second place by nearly 6 percent. I think it is a good candidate to enrich spatio-temporal action localization in mmaction2.

Will you consider including this network? I have also opened a request in #641.

Feb 24 '21 03:02 sijun-zhou

Could you please add the algorithm proposed in the AVA dataset paper [1]? It would be helpful as a comparison for spatio-temporal action localization experiments on the AVA dataset. The model consists of Faster R-CNN and I3D.

Reference: [1] C. Gu, C. Sun, D. A. Ross, et al. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6047-6056.