AdaTAD works worse than expected on IKEA ASM dataset

Open tongda opened this issue 7 months ago • 11 comments

Hi, I tried to train an AdaTAD model on the IKEA ASM dataset, following the THUMOS config with the VideoMAE base model.

The final epoch output is:

2024-07-15 09:17:18 Train INFO: [Train]: [059][00050/00126]  Loss=0.5143  cls_loss=0.2856  reg_loss=0.2287  lr_backbone=3.9e-05  lr_det=3.9e-05  mem=4993MB
2024-07-15 09:22:19 Train INFO: [Train]: [059][00100/00126]  Loss=0.5090  cls_loss=0.2780  reg_loss=0.2310  lr_backbone=3.8e-05  lr_det=3.8e-05  mem=4993MB
2024-07-15 09:25:03 Train INFO: [Train]: [059][00126/00126]  Loss=0.5022  cls_loss=0.2728  reg_loss=0.2294  lr_backbone=3.8e-05  lr_det=3.8e-05  mem=4993MB

The evaluation result is:

2024-07-15 09:32:55 Train INFO: Evaluation starts...
2024-07-15 09:32:57 Train INFO: Loaded annotations from validation subset.
2024-07-15 09:32:57 Train INFO: Number of ground truth instances: 0
2024-07-15 09:32:57 Train INFO: Number of predictions: 234000
2024-07-15 09:32:57 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-07-15 09:32:57 Train INFO: Average-mAP:  nan (%)
2024-07-15 09:32:57 Train INFO: mAP at tIoU 0.30 is  nan%
2024-07-15 09:32:57 Train INFO: mAP at tIoU 0.40 is  nan%
2024-07-15 09:32:57 Train INFO: mAP at tIoU 0.50 is  nan%
2024-07-15 09:32:57 Train INFO: mAP at tIoU 0.60 is  nan%
2024-07-15 09:32:57 Train INFO: mAP at tIoU 0.70 is  nan%
2024-07-15 09:32:57 Train INFO: Training Over...

I ran inference with the trained model on a test video and marked the actions at the bottom of each frame (the first bar is the ground truth, the second bar is the prediction). As the snapshot below shows, most of the predicted actions are wrong. [image]

When processing the dataset, I removed the 'NA' label and did no other processing. Any ideas on how to improve this?
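
One thing that may be worth checking first: the evaluation log above reports "Number of ground truth instances: 0", and with no ground truth the mAP is undefined, which would explain the nan values independently of model quality. Below is a minimal sketch for counting annotated instances per subset in the annotation file. It assumes an ActivityNet/THUMOS-style JSON layout with a top-level "database" dict whose entries carry "subset" and "annotations" fields; those field names, the helper names, and the toy data are assumptions for illustration, not taken from the actual IKEA ASM conversion.

```python
import json
from collections import Counter


def load_annotation(path):
    """Load an annotation JSON file (the path is whatever your config points to)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_instances_per_subset(annotation):
    """Count annotated action instances for each subset name found in the file.

    Assumes the layout {"database": {video: {"subset": ..., "annotations": [...]}}}.
    """
    counts = Counter()
    for info in annotation.get("database", {}).values():
        subset = info.get("subset", "<missing>")
        counts[subset] += len(info.get("annotations", []))
    return counts


# Toy data standing in for a converted annotation file (hypothetical values).
toy = {
    "database": {
        "video_a": {
            "subset": "training",
            "annotations": [
                {"segment": [1.0, 4.5], "label": "pick up leg"},
                {"segment": [5.0, 9.0], "label": "spin leg"},
            ],
        },
        "video_b": {
            "subset": "test",
            "annotations": [{"segment": [2.0, 6.0], "label": "flip table"}],
        },
    }
}

counts = count_instances_per_subset(toy)
print(dict(counts))
# A subset name that never occurs in the file counts as zero instances.
print("validation instances:", counts["validation"])
```

In the toy data the held-out video is tagged "test" while the lookup asks for "validation", so the count for "validation" is zero. If the real annotation file shows a similar mismatch between its subset names and the subset requested in the config, the evaluator would see no ground truth even though the annotations exist.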

tongda, Jul 16 '24 06:07