
KeyError: 'gqa_accuracy_answer_total_unscaled'

Open · TopCoder2K opened this issue 3 years ago · 3 comments

This error is really strange... I followed the README for training MDETR on CLEVR. First, I ran the following command:

python run_with_submitit.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64  --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step1 --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1

The only difference from the command in the README is that I used run_with_submitit.py and added the --nodes 1 --ngpus 1 parameters. The training went well and the job finished successfully. Then I ran:

python run_with_submitit.py --dataset_config configs/clevr.json --backbone "resnet18" --num_queries 25 --batch_size 64  --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step2 --load ~/MDETR/mdetr/checkpoint/pchelintsev/experiments/19906/BEST_checkpoint.pth --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1

After the first epoch and the evaluation, I got the following in the 28574_0_log.err file (warnings removed):

submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 71, in submitit_main
    process_job(args.folder)
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 64, in process_job
    raise error
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 53, in process_job
    result = delayed.result()
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/utils.py", line 128, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "run_with_submitit.py", line 98, in __call__
    detection.main(self.args)
  File "/home/pchelintsev/MDETR/mdetr/main.py", line 614, in main
    metric = test_stats["gqa_accuracy_answer_total_unscaled"]
KeyError: 'gqa_accuracy_answer_total_unscaled'

Why is this metric missing? Also, here is the end of the 28574_0_log.out file:

Accumulating evaluation results...
DONE (t=70.57s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.581
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.893
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.660
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.374
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.578
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.302
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.729
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.741
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.637
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.741
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.842
submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception

TopCoder2K · Sep 27 '21

I noticed a strange spot in main.py. It is likely not the root cause, but it might help in tracking down the bug. This is the line:

[screenshot of the hardcoded lookup in main.py]

But here we can see that, in the case of CLEVR, the update only produces 'clevr_*'-prefixed keys (because the config file lists only ["clevr"]):

[screenshot]

So gqa_accuracy_answer_total_unscaled can never appear...
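For context, the pattern being described is roughly the following. This is a minimal, self-contained sketch of the prefixing logic; the function and variable names are illustrative assumptions, not copied from mdetr's main.py:

def evaluate_one_dataset(name):
    # Stand-in for the real evaluator: returns unprefixed metric names.
    return {"accuracy_answer_total_unscaled": 0.0, "loss": 0.0}

combine_datasets = ["clevr"]  # the config lists only ["clevr"] (field name assumed)

test_stats = {}
for name in combine_datasets:
    curr_stats = evaluate_one_dataset(name)
    # Every metric key is prefixed with the dataset name, so when training
    # on CLEVR only "clevr_*" keys can ever appear in test_stats.
    test_stats.update({f"{name}_{k}": v for k, v in curr_stats.items()})

# The hardcoded lookup then fails, reproducing the reported error:
metric = test_stats["gqa_accuracy_answer_total_unscaled"]  # KeyError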

TopCoder2K · Sep 27 '21

@TopCoder2K Did you manage to find a solution to this?

tchiwewe · Jul 17 '24

I'm not sure why this key is hardcoded. When doing QA with the CLEVR dataset, the list of available keys (metrics) is shown below. Changing 'gqa_accuracy_answer_total_unscaled' to 'clevr_accuracy_answer_total_unscaled' in main.py should fix the problem; a more general sketch follows the key list.

dict_keys([
 'clevr_loss',
 'clevr_loss_ce',
 'clevr_loss_bbox',
 'clevr_loss_giou',
 'clevr_loss_contrastive_align',
 'clevr_loss_ce_0',
 'clevr_loss_bbox_0',
 'clevr_loss_giou_0',
 'clevr_loss_contrastive_align_0',
 'clevr_loss_ce_1',
 'clevr_loss_bbox_1',
 'clevr_loss_giou_1',
 'clevr_loss_contrastive_align_1',
 'clevr_loss_ce_2',
 'clevr_loss_bbox_2',
 'clevr_loss_giou_2',
 'clevr_loss_contrastive_align_2',
 'clevr_loss_ce_3',
 'clevr_loss_bbox_3',
 'clevr_loss_giou_3',
 'clevr_loss_contrastive_align_3',
 'clevr_loss_ce_4',
 'clevr_loss_bbox_4',
 'clevr_loss_giou_4',
 'clevr_loss_contrastive_align_4',
 'clevr_loss_answer_type',
 'clevr_loss_answer_binary',
 'clevr_loss_answer_reg',
 'clevr_loss_answer_attr',
 'clevr_loss_ce_unscaled',
 'clevr_loss_bbox_unscaled',
 'clevr_loss_giou_unscaled',
 'clevr_cardinality_error_unscaled',
 'clevr_loss_contrastive_align_unscaled',
 'clevr_loss_ce_0_unscaled',
 'clevr_loss_bbox_0_unscaled',
 'clevr_loss_giou_0_unscaled',
 'clevr_cardinality_error_0_unscaled',
 'clevr_loss_contrastive_align_0_unscaled',
 'clevr_loss_ce_1_unscaled',
 'clevr_loss_bbox_1_unscaled',
 'clevr_loss_giou_1_unscaled',
 'clevr_cardinality_error_1_unscaled',
 'clevr_loss_contrastive_align_1_unscaled',
 'clevr_loss_ce_2_unscaled',
 'clevr_loss_bbox_2_unscaled',
 'clevr_loss_giou_2_unscaled',
 'clevr_cardinality_error_2_unscaled',
 'clevr_loss_contrastive_align_2_unscaled',
 'clevr_loss_ce_3_unscaled',
 'clevr_loss_bbox_3_unscaled',
 'clevr_loss_giou_3_unscaled',
 'clevr_cardinality_error_3_unscaled',
 'clevr_loss_contrastive_align_3_unscaled',
 'clevr_loss_ce_4_unscaled',
 'clevr_loss_bbox_4_unscaled',
 'clevr_loss_giou_4_unscaled',
 'clevr_cardinality_error_4_unscaled',
 'clevr_loss_contrastive_align_4_unscaled',
 'clevr_loss_answer_type_unscaled',
 'clevr_accuracy_answer_type_unscaled',
 'clevr_loss_answer_binary_unscaled',
 'clevr_accuracy_answer_binary_unscaled',
 'clevr_loss_answer_reg_unscaled',
 'clevr_accuracy_answer_reg_unscaled',
 'clevr_loss_answer_attr_unscaled',
 'clevr_accuracy_answer_attr_unscaled',
 'clevr_accuracy_answer_total_unscaled',
 'clevr_coco_eval_bbox'
])
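Rather than swapping one hardcoded name for another, the prefix could be derived from the configured dataset list. Here is a sketch of such a fix for the lookup in main.py; it assumes the dataset names from the config are exposed as args.combine_datasets, so verify the attribute name against your version of the code:

# Pick the metric prefix from the configured datasets instead of
# hardcoding "gqa"; fall back to "gqa" to keep the original behaviour.
prefix = "clevr" if "clevr" in args.combine_datasets else "gqa"
metric = test_stats[f"{prefix}_accuracy_answer_total_unscaled"]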

tchiwewe · Jul 17 '24