tubelet-transformer: The eval results from the TubeR CSN-152 IG65M+K400 model

Hi,

First, thanks for your work and for providing the implementation.

Following the steps you provided, I downloaded the pretrained CSN-152 Kinetics-400+IG65M model from the link you provided (TubeR_CSN152_AVA22). I installed the same versions of PyTorch and the other packages you suggested, and changed only the data and model paths in the config file TubeR_CSN152_AVA22.yaml. However, I was not able to obtain the reported 31.1 mAP; I only get 27.8 mAP (two runs, same result).
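For reference, the AVA numbers compared in this thread (31.1 vs. 27.8 mAP) are frame-level mAP at IoU 0.5, averaged over the action classes. Below is a simplified, stdlib-only sketch of the Pascal-VOC-style AP computation. It assumes the AVA evaluator follows the TF Object Detection API convention: take a monotone precision envelope, then sum the area wherever recall changes. Box matching at IoU 0.5 is treated as already done and passed in as `is_tp`. The function names are illustrative and do not come from the TubeR code.

```python
from typing import List, Sequence, Tuple


def precision_recall(scores: Sequence[float], is_tp: Sequence[bool],
                     num_gt: int) -> Tuple[List[float], List[float]]:
    """Precision and recall at every cut-off of the score-sorted detections."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precision, recall = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precision.append(tp / (tp + fp))
        recall.append(tp / num_gt)
    return precision, recall


def average_precision(precision: Sequence[float], recall: Sequence[float]) -> float:
    """All-point AP: area under the non-increasing precision envelope."""
    rec = [0.0] + list(recall) + [1.0]
    prec = [0.0] + list(precision) + [0.0]
    # Sweep right to left so precision never increases with recall.
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    # Add a rectangle only where recall actually changes.
    return sum((rec[i] - rec[i - 1]) * prec[i]
               for i in range(1, len(rec)) if rec[i] != rec[i - 1])


def mean_ap(per_class_ap: Sequence[float]) -> float:
    """mAP is the unweighted mean of the per-class APs."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    p, r = precision_recall([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
    print(average_precision(p, r))  # approximately 0.8333 (= 5/6)
```

Because mAP is an unweighted mean over classes, a gap of about 3 points can come from many classes dropping slightly or from a few classes collapsing. Comparing the per-category entries of the result dict between runs can help tell these cases apart.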

I wonder if I am doing everything right and how to proceed.

Thank you.

Jan 31 '23 14:01 lemonheadboy

I have the same setup as you. Maybe the only difference is that I evaluate on a single GPU, and I get 31.137 mAP.

Feb 02 '23 01:02 cifunla

Epoch: [0][50125/50134]
data_time: 0.005, batch time: 0.083
class_error: 99.894, loss: 147.424, loss_bbox: 0.738, loss_giou: 0.835, loss_ce: 1.515, loss_ce_b: 1.093

{'PascalBoxes_Precision/mAP@0.5IOU': 0.00011119179516651725, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/bend/bow (at the waist)': 0.00011119179516651725}
person AP: 0.00011
testing time 1:47:07

Hi, I ran inference on AVA 2.2 on a single RTX 3090 in non-distributed mode, and the log above is from that run. Why are class_error and the loss so high? The final inference result is also wrong. Looking forward to your answer.
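About the class_error value in the log above: TubeR's loss names (loss_ce, loss_bbox, loss_giou) follow DETR. Assuming class_error is also computed as in DETR, it is 100 minus the top-1 classification accuracy (in percent) over the queries that the Hungarian matcher assigned to ground truth. Below is a minimal stdlib sketch of that quantity. The function name is hypothetical, and in TubeR it may apply to the actor/no-object head rather than the multi-label action head. That detail has not been checked against the repo.

```python
from typing import Sequence


def class_error(matched_logits: Sequence[Sequence[float]],
                target_classes: Sequence[int]) -> float:
    """100 - top-1 accuracy (%) over matched queries, DETR-style.

    With no targets, DETR reports zero accuracy, so the error is 100.
    """
    if len(target_classes) == 0:
        return 100.0
    correct = 0
    for logits, target in zip(matched_logits, target_classes):
        # Top-1 prediction = index of the largest logit.
        predicted = max(range(len(logits)), key=lambda c: logits[c])
        if predicted == target:
            correct += 1
    accuracy = 100.0 * correct / len(target_classes)
    return 100.0 - accuracy


if __name__ == "__main__":
    logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.1, 0.2], [0.1, 0.2, 3.0]]
    print(class_error(logits, [0, 1, 2, 0]))  # 2 of 4 correct -> 50.0
```

Read this way, a logged value of 99.894 means almost none of the matched predictions get the right top-1 class. That matches the near-zero AP, but it does not by itself show whether the weights, the data, or the evaluation path is at fault.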

Feb 06 '23 02:02 huang-chenhai

Hi, have you commented out lines 423 and 452 of video_action_recognition.py?

Feb 07 '23 06:02 cifunla

Thank you very much for your answer. I have already commented out those two lines, but it made no difference. The problem really does come from the distributed setup: the results I get with distributed training are correct, so there must be something I haven't changed properly. I'll check again. When you switched to single-machine, single-GPU training, did you change anything else?
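This thread does not say what lines 423 and 452 of video_action_recognition.py contain. One generic thing worth checking when results differ between distributed and single-GPU runs is checkpoint key prefixes. A model wrapped in PyTorch's DistributedDataParallel saves its state-dict keys with a `module.` prefix. If such a checkpoint is loaded into an unwrapped model with `strict=False`, every mismatched key is silently skipped, and the reverse mismatch behaves the same way. The stdlib-only sketch below strips the prefix and reports missing and unexpected keys. Its helper names are hypothetical, and it is offered as something to rule out, not as the confirmed cause here.

```python
from typing import Dict, Iterable, Set, Tuple


def strip_prefix(state_dict: Dict[str, object], prefix: str = "module.") -> Dict[str, object]:
    """Drop a wrapper prefix (e.g. the one DDP adds) from every key that has it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def key_report(checkpoint_keys: Iterable[str],
               model_keys: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected): keys the model wants but the checkpoint
    lacks, and keys the checkpoint has but the model does not."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return model - ckpt, ckpt - model


if __name__ == "__main__":
    # Toy stand-ins for real tensors; only the key names matter here.
    ckpt = {"module.backbone.conv1.weight": 1, "module.head.bias": 2}
    model_keys = ["backbone.conv1.weight", "head.bias"]
    print(key_report(ckpt, model_keys))                # every key mismatches
    print(key_report(strip_prefix(ckpt), model_keys))  # (set(), set())
```

With a real PyTorch model, `model.load_state_dict(sd, strict=False)` returns the same two lists as `missing_keys` and `unexpected_keys`, so printing them after loading is a quick check.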

Feb 07 '23 06:02 huang-chenhai

I didn't make any other changes, sorry. I don't know why you get the wrong result.

Feb 07 '23 07:02 cifunla

I tried running on 1 GPU, but the results are still the same. I also get the same drop on AVA 2.1. I wonder whether the issue comes from something other than the number of GPUs.

Mar 02 '23 10:03 lemonheadboy

Hello, are you able to train on the JHMDB dataset properly? I ran into the following problem: when I train from the pretrained weights, training seems to run fine, but the model does not produce correct predictions on the validation set. Surprisingly, everything works fine when I continue training from the already-trained weights provided by the author (TubeR_CSN152_JHMDB.pth).

Mar 16 '23 01:03 huang-chenhai

Hi, have you retrained on the JHMDB dataset? I can't reproduce the author's result by training. I'm really looking forward to your reply.

------------------ Original message ------------------ From: @.>; Sent: Tuesday, February 7, 2023, 3:03 PM; To: @.>; Cc: @.>; @.>; Subject: Re: [amazon-science/tubelet-transformer] The eval results from Tuber CSN-152 IG65+K400 model (Issue #16)

Apr 11 '23 03:04 huang-chenhai
