
Question about the curve in Figure 3

Open liujiaheng opened this issue 3 years ago • 14 comments

Hi, this is a nice paper. How do you generate the different AR-Net models with different FLOPs (i.e., the red curve in Fig. 3)? Do you achieve this by selecting different loss weights for the FLOPs-constraint loss?

Thanks. Looking forward to your kind reply.

liujiaheng avatar Sep 28 '20 14:09 liujiaheng

Hi, Jiaheng. I think what you suggested makes sense. However, we just used a different number of frames at the inference stage to plot the curve. For example, if AR-Net is trained using t=16 frames, then we just use t'=4, 8, 16, 25, 32 at the inference stage to generate the mAP/FLOPs curve (in the original experiments it may not have been those exact frame counts). This way, we only need to train once and can then run inference multiple times (which is much faster).
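A minimal sketch of this train-once, evaluate-many procedure (here `load_arnet`, `evaluate_map`, and `estimate_gflops` are hypothetical helpers standing in for the repo's actual training/evaluation entry points):

```python
# Sweep the number of inference frames on a single trained checkpoint to
# trace out the mAP/FLOPs curve. All three helpers below are hypothetical.
model = load_arnet(checkpoint="arnet_t16.pth")   # trained once with t=16

curve = []
for t_prime in [4, 8, 16, 25, 32]:               # frames used only at inference
    m_ap = evaluate_map(model, num_frames=t_prime)
    gflops = estimate_gflops(model, num_frames=t_prime)
    curve.append((gflops, m_ap))                 # one point on the red curve
```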

mengyuest avatar Sep 28 '20 18:09 mengyuest

Hi, thanks for your kind reply. This is nice work, and I am following it for other video understanding tasks. I still have one question. The inference strategy is sequential (due to the LSTM, which generates a policy for each frame), but TSN/TSM inference can be parallel (we send all selected frames, i.e., 8 or 16, to the network at once), so I think their inference time can be shorter than this sequential approach. Can you offer some explanation for this?

liujiaheng avatar Oct 26 '20 06:10 liujiaheng

A correction to the above: TSN can be parallelized, but TSM cannot (due to the shift operation).

liujiaheng avatar Oct 26 '20 06:10 liujiaheng

If AR-Net doesn't choose to skip a lot of frames, then I guess you are right. One way to speed up AR-Net's inference stage is to first do an entire LSTM run for the PolicyNet only, then route the different resolutions to different backbones and process them simultaneously (more of an engineering effort, but doable). In this way, if the PolicyNet runs very fast (the network is light and the frame resolution is very low), then the backbones handle the frames approximately in parallel.
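A rough sketch of that decoupled schedule, assuming a hypothetical `policy_net` (the light CNN+LSTM run on low-resolution frames) and a list `backbones` indexed by the per-frame resolution choice:

```python
import torch

@torch.no_grad()
def two_stage_inference(frames_by_res, policy_net, backbones):
    # Stage 1: one fast sequential LSTM pass over the cheap low-res frames
    # yields a choice (resolution index, or "skip") for every frame.
    # frames_by_res[r] holds all T frames at resolution r: (T, C, H_r, W_r).
    choices = policy_net(frames_by_res[0])          # LongTensor of shape (T,)

    # Stage 2: group frames by chosen resolution; each backbone processes its
    # whole group in one batched forward pass instead of frame by frame.
    logits = []
    for r, backbone in enumerate(backbones):
        idx = (choices == r).nonzero(as_tuple=True)[0]
        if idx.numel() > 0:
            logits.append(backbone(frames_by_res[r][idx]))
    # Frames chosen as "skip" (index >= len(backbones)) contribute nothing.
    return torch.cat(logits).mean(dim=0)            # average frame predictions
```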

Our main focus is not to directly optimize inference time, though I agree it is very important in real applications. Looking forward to seeing new papers tackle this area.

mengyuest avatar Oct 28 '20 04:10 mengyuest

Thanks.

liujiaheng avatar Oct 28 '20 12:10 liujiaheng

Hi, I have a question about the details of the ablation study on "skipping" in Table 4. You mentioned that you only use the policy network to decide how many frames to skip, but I am not sure whether you use another feature extraction network for each frame. Specifically, the input to the policy network is the lowest-resolution frame fed to the smallest network. If a frame is not skipped, do I need another network to extract that frame's features again for the final action prediction? Is the feature extraction network fixed for each frame? If no extra network is needed, do you mean that AR-Net uses the features extracted from the lowest resolution with the smallest network for the final action prediction when frames are not skipped?

liujiaheng avatar Oct 28 '20 15:10 liujiaheng

We used the features extracted from the lowest resolution for both the policy and the video action prediction (when needed). We simply devise two prediction heads on top of those extracted features (for the policy part, we further use an LSTM to incorporate temporal information) to fulfill the two tasks. More details can be found in the code.
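As a rough illustration of that two-head design (module names here are hypothetical, not the repo's actual classes): one lightweight backbone on the lowest-resolution frames feeds both a policy head (with an LSTM for temporal context) and a classification head.

```python
import torch.nn as nn

class SkipOnlyModel(nn.Module):
    """Sketch: shared low-res features, two prediction heads."""
    def __init__(self, feat_dim=1280, hidden=512, num_classes=200, num_skip_ops=4):
        super().__init__()
        self.backbone = nn.Sequential(              # stand-in for a light CNN
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, num_skip_ops)  # frames to skip
        self.cls_head = nn.Linear(feat_dim, num_classes)    # action prediction

    def forward(self, frames):                      # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        hidden_seq, _ = self.lstm(feats)            # temporal info for policy
        skip_logits = self.policy_head(hidden_seq)  # per-frame skip decision
        cls_logits = self.cls_head(feats)           # per-frame class scores
        return skip_logits, cls_logits
```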

mengyuest avatar Oct 28 '20 18:10 mengyuest

Hello, I have read your AR-Net paper and benefited a lot, but I have run into a problem: I can't find the ActivityNet, FCVID, and mini-Kinetics datasets. Would it be convenient for you to share them? Thank you!

MingZier avatar Nov 01 '20 05:11 MingZier

Hi Ming, thanks for your interest.

ActivityNet-v1.3: http://activity-net.org/download.html
FCVID: http://bigvid.fudan.edu.cn/FCVID/ (though it might not be available now)
Kinetics: https://deepmind.com/research/open-source/kinetics (K400; for the mini-Kinetics setting used in our paper, you just need to use the train/val splits from )

Hope this helps~

mengyuest avatar Nov 02 '20 04:11 mengyuest

Thank you! Would it be convenient for you to share links via Baidu Cloud Disk? I cannot download these three datasets. Thank you very much!

MingZier avatar Nov 02 '20 04:11 MingZier

I am afraid it is inconvenient for me to share them via BaiduPan or Google Drive - those training sets are too big to be transferred that way. Maybe you can check some other websites/blogs, e.g. https://blog.csdn.net/qq_41590635/article/details/103781908, to see whether they can give you access to those datasets (I haven't tried this before; hope it helps).

mengyuest avatar Nov 02 '20 04:11 mengyuest

Thank you! Would it be convenient for you to provide the FCVID dataset? I can't find it online. Sorry for troubling you!

MingZier avatar Nov 02 '20 04:11 MingZier

Sorry, I don't have them on hand. This work was done during my internship on company servers, and I am now back on campus. I suggest you start with ActivityNet-v1.3, because it is smaller than FCVID and takes less time for training/testing.

mengyuest avatar Nov 02 '20 05:11 mengyuest