InternVideo

The issue of Temporal-Action-Localization

Open typ1012 opened this issue 3 years ago • 11 comments

Hello, thanks for your good work! I encountered some problems when reproducing the performance of the Temporal-Action-Localization task:

- Thumos14: 69.11 average mAP (lower than 71.58).
- The input_dim of the feature is not right: the channel dimension of the provided MAE feature is 1280, but thumos.yaml sets 1408.
- Anet1.3: 38.56 average mAP.

typ1012 avatar Jan 03 '23 11:01 typ1012

Hi, the features we released are from VideoMAE, and the results you reproduced are correct. Concatenating them with UniformerV2 features gives the better performance reported in the paper; the UniformerV2 features will be released soon.
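For readers unsure what concatenating features means for the config, here is a minimal sketch, not the authors' pipeline. It assumes each feature file holds an array of shape (T, C) with the same number of clips T. The 1280 comes from this thread; the second width (256) is an arbitrary placeholder, since the UniformerV2 feature dimension is not stated here, and concat_features is a hypothetical helper name.

```python
import numpy as np


def concat_features(feat_a, feat_b):
    """Concatenate two (T, C) feature arrays along the channel axis.

    Hypothetical helper for illustration; not part of the InternVideo code.
    """
    if feat_a.shape[0] != feat_b.shape[0]:
        raise ValueError(
            f"temporal lengths differ: {feat_a.shape[0]} vs {feat_b.shape[0]}"
        )
    return np.concatenate([feat_a, feat_b], axis=1)


T = 8  # number of clips (arbitrary for this demo)
mae_feat = np.zeros((T, 1280), dtype=np.float32)    # 1280 channels, as reported in this thread
second_feat = np.zeros((T, 256), dtype=np.float32)  # 256 is a placeholder, NOT the real UniformerV2 width

fused = concat_features(mae_feat, second_feat)
input_dim = fused.shape[1]  # the config's input_dim would need to equal this sum
print(fused.shape, input_dim)
```

Whatever the real widths are, input_dim in the config has to match the channel count of the array that is actually fed to the model.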

Richard-61 avatar Jan 09 '23 01:01 Richard-61

Hi @typ1012 @Richard-61, I tried to run Thumos14, but the mAP is only 47 (I only changed input_dim from 1408 to 1280). Did you make any other modifications to reproduce the 69.11 mAP?
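A mismatch between the feature width (1280) and the configured input_dim (1408) can be caught before training with a small check. The sketch below is an illustration under assumptions: check_input_dim is a hypothetical function, and the features are assumed to be laid out as (T, C) with channels last.

```python
def check_input_dim(feature_shape, input_dim):
    """Return the channel count if it matches input_dim, else raise.

    feature_shape is assumed to be (T, C); this layout is an assumption,
    not something confirmed by the InternVideo code.
    """
    channels = feature_shape[-1]
    if channels != input_dim:
        raise ValueError(
            f"feature has {channels} channels but config input_dim is {input_dim}"
        )
    return channels


# Passes: the released MAE features are reported as 1280-dimensional.
check_input_dim((100, 1280), 1280)
```

Calling check_input_dim((100, 1280), 1408) raises ValueError, which is the situation described in the original post.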

tensorboy avatar Feb 02 '23 17:02 tensorboy

Hi, I would also like to know how you reproduced the 69.11 mAP; I only get poor results.

Value-Jack avatar Apr 13 '23 09:04 Value-Jack

@tensorboy @typ1012 have you solved your problem?

Value-Jack avatar Apr 13 '23 09:04 Value-Jack

I fixed the batch_nms bug in the code.

Richard-61 avatar Apr 13 '23 12:04 Richard-61

So, did you reproduce the 71.58 result?

Value-Jack avatar Apr 13 '23 12:04 Value-Jack

I get the following error:

from petrel_client.client import Client
ModuleNotFoundError: No module named 'petrel_client'

Could you please tell me how to import this module?

Value-Jack avatar Apr 19 '23 14:04 Value-Jack

That is a module to load videos on our servers. It may not be applicable in your case. You can remove it and update the corresponding video loading functions. We will fix it soon. @Richard-61
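One way to follow this advice without deleting the import outright is to make it optional and read files from local disk instead. The sketch below is an assumption-laden illustration: read_video_bytes and HAS_PETREL are hypothetical names, and the real video loading functions in the repository would need to be adapted separately.

```python
import tempfile
from pathlib import Path

# Make the server-only dependency optional instead of a hard requirement.
try:
    from petrel_client.client import Client  # available only on the authors' servers
    HAS_PETREL = True
except ImportError:  # also covers ModuleNotFoundError
    Client = None
    HAS_PETREL = False


def read_video_bytes(path):
    """Hypothetical local replacement: read raw bytes from the filesystem."""
    return Path(path).read_bytes()


# Small self-contained demo using a temporary file in place of a real video.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "clip.bin"
    sample.write_bytes(b"demo")
    data = read_video_bytes(sample)

print(HAS_PETREL, len(data))
```

Because ModuleNotFoundError is a subclass of ImportError, the guard covers the error reported above.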

shepnerd avatar Apr 19 '23 14:04 shepnerd

Could you please explain the code in /InternVideo-main/Downstream/Temporal-Action-Localization/configs/thumos.yaml?

I don't understand the value 1408 in this line: input_dim: 1408, #2048+768, 1024+768,1280+1024
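For reference, the sums written in that YAML comment can simply be evaluated. The snippet below only does the arithmetic; reading them as candidate widths of concatenated features is an assumption, not something the thread confirms.

```python
# Sums copied from the comment next to input_dim in thumos.yaml.
candidates = {
    "2048+768": 2048 + 768,
    "1024+768": 1024 + 768,
    "1280+1024": 1280 + 1024,
}

for label, total in candidates.items():
    note = "matches 1408" if total == 1408 else "does not match 1408"
    print(f"{label} = {total} ({note})")
```

None of the three sums equals 1408, so the comment by itself does not explain where that value comes from.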

Value-Jack avatar Apr 20 '23 02:04 Value-Jack

Hello @typ1012, I have been trying to reproduce the Anet1.3 score you reported, but have not been able to get better than 32.23. I had to use my own clone of the ActionFormer repository to get this far, as InternVideo's downstream copy of the repository has many issues. Could you share the steps you used to produce the reported Anet1.3 score? Thank you!

christian-matroid avatar Jun 30 '23 08:06 christian-matroid