Detection-only training (without Deformable) produces all-zero results
Hello, I tried running the training code without tracking and only using detection.
"program": "trackformer/src/train.py",
"args": [
"with",
"crowdhuman",
],
Unfortunately, all the test results are zero. I made no modifications to the code.
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Does DETR (without tracking) not work in this repo?
The code is able to train and test detection results. I suggest running it with the deformable option as well. Is the loss decreasing when you train, or are you only testing? If the latter, which model are you loading?
Hi @timmeinhardt , I am actually training it, and it also evaluates the test set after each epoch. Without the deformable option the losses look like this:
Epoch: [50] [5650/5891] eta: 0:02:44 lr: 0.000010 class_error: 6.25 loss: 3.1751 (3.3304) loss_ce: 0.6383 (0.6623) loss_bbox: 1.1502 (1.2152) loss_giou: 1.4410 (1.4529) loss_ce_unscaled: 0.6383 (0.6623) class_error_unscaled: 0.0000 (1.0835) loss_bbox_unscaled: 0.2300 (0.2430) loss_giou_unscaled: 0.7205 (0.7265) cardinality_error_unscaled: 83.3333 (83.4961) lr_backbone: 0.0000 (0.0000) time: 0.6971 data: 0.0237 max mem: 38117
Epoch: [50] [5700/5891] eta: 0:02:10 lr: 0.000010 class_error: 0.97 loss: 3.1855 (3.3305) loss_ce: 0.6647 (0.6623) loss_bbox: 1.1670 (1.2155) loss_giou: 1.4359 (1.4527) loss_ce_unscaled: 0.6647 (0.6623) class_error_unscaled: 0.0000 (1.0823) loss_bbox_unscaled: 0.2334 (0.2431) loss_giou_unscaled: 0.7180 (0.7263) cardinality_error_unscaled: 85.6667 (83.4972) lr_backbone: 0.0000 (0.0000) time: 0.6949 data: 0.0242 max mem: 38117
Epoch: [50] [5750/5891] eta: 0:01:36 lr: 0.000010 class_error: 0.00 loss: 3.1325 (3.3309) loss_ce: 0.6471 (0.6623) loss_bbox: 1.1386 (1.2159) loss_giou: 1.3137 (1.4527) loss_ce_unscaled: 0.6471 (0.6623) class_error_unscaled: 1.9231 (1.0890) loss_bbox_unscaled: 0.2277 (0.2432) loss_giou_unscaled: 0.6568 (0.7264) cardinality_error_unscaled: 82.3333 (83.4968) lr_backbone: 0.0000 (0.0000) time: 0.6647 data: 0.0239 max mem: 38117
Epoch: [50] [5800/5891] eta: 0:01:02 lr: 0.000010 class_error: 0.00 loss: 3.1656 (3.3307) loss_ce: 0.6540 (0.6624) loss_bbox: 1.0964 (1.2160) loss_giou: 1.4227 (1.4524) loss_ce_unscaled: 0.6540 (0.6624) class_error_unscaled: 0.0000 (1.0852) loss_bbox_unscaled: 0.2193 (0.2432) loss_giou_unscaled: 0.7114 (0.7262) cardinality_error_unscaled: 84.3333 (83.5059) lr_backbone: 0.0000 (0.00....
Overall, the losses don't seem to be changing much. I will try the deformable option and let you know as soon as possible.
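As a side note, a quick way to check whether the total loss is actually stagnating is to parse it out of the training log. This is a minimal sketch; the regex assumes the "loss: X (Y)" format of the log excerpt above, where X is the smoothed recent value and Y the running average:

```python
import re

# Matches the total loss only ("loss: X (Y)"), not e.g. "loss_ce:".
LOSS_RE = re.compile(r"\bloss: ([0-9.]+) \(([0-9.]+)\)")

def parse_losses(log_lines):
    """Return a list of (smoothed, running_average) total-loss pairs."""
    pairs = []
    for line in log_lines:
        match = LOSS_RE.search(line)
        if match:
            pairs.append((float(match.group(1)), float(match.group(2))))
    return pairs

log = [
    "Epoch: [50] [5650/5891] lr: 0.000010 loss: 3.1751 (3.3304) loss_ce: 0.6383 (0.6623)",
    "Epoch: [50] [5700/5891] lr: 0.000010 loss: 3.1855 (3.3305) loss_ce: 0.6647 (0.6623)",
]
print(parse_losses(log))  # [(3.1751, 3.3304), (3.1855, 3.3305)]
```

Plotting these pairs over many iterations makes a stagnating loss much easier to see than eyeballing raw log lines.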
You are preloading the resume model (models/r50_deformable_detr-checkpoint.pth), but without the deformable option. Please familiarize yourself with the configurations and ideally start from one of the example commands, changing it to your needs.
Hi @timmeinhardt , no, I actually set args.resume=False before training. I am training from scratch.
I made sure execution does not enter this section of the code:
if args.resume:
    if args.resume.startswith('https'):
        checkpoint = torch.hub.load_state_dict_from_url(
            args.resume, map_location='cpu', check_hash=True)
    else:
        checkpoint = torch.load(args.resume, map_location='cpu')

    model_state_dict = model_without_ddp.state_dict()
    checkpoint_state_dict = checkpoint['model']
    checkpoint_state_dict = {k.replace('detr.', ''): v for k, v in checkpoint['model'].items()}

    resume_state_dict = {}
    for k, v in model_state_dict.items():
        if k not in checkpoint_state_dict:
            resume_value = v
In your first message you only wrote with crowdhuman. What is your full command then?
Hi @timmeinhardt , my args are still these:
"args": [
"with",
// "deformable",
// "tracking",
"crowdhuman",
// "full_res",
// "output_dir=models/crowdhuman_train_val_no_track_v2",
],
But I made sure to overwrite args.resume=False in the main function before train(args) is called:
if __name__ == '__main__':
    ......
    args.output_dir = 'models/crowdhuman_no_tracking_no_deform'
    args.resume = False
    train(args)
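As an alternative to overwriting args in code: if the repo's command-line handling is sacred-based (an assumption worth verifying against the repo's config files), config values can usually be overridden directly after "with", something like the sketch below. The resume=None entry is hypothetical; check the actual config key name and the values it accepts before relying on it.

```json
"args": [
    "with",
    "crowdhuman",
    // hypothetical override; verify the config key name in the repo
    "resume=None",
],
```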
Please start with one of the provided example commands and see if those work, and then build your command up from there.
Hi, I can confirm with certainty that training without the "deformable" option, i.e. pure DETR (no tracking), does not work with the current settings, while the deformable option does work properly and produces sensible validation results.
In other words, "args": [ "with", "crowdhuman" ] produces all-zero evaluation results after epoch 2:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
However, "args": [ "with", "crowdhuman", "deformable" ] gives the expected non-zero results. In both cases args.resume is set to False. I hope someone can confirm this.
You are giving very limited information to validate your results. In general, DETR without deformable attention takes 10 times as many epochs to converge (500 vs. 50), so comparing the two configurations after just 2 epochs is not reasonable. Does the loss decrease for the vanilla DETR version?
These were the results after 50 epochs for vanilla DETR. They were all zeros from epoch 1 to epoch 50.
The loss decreases at first but after a few iterations just stagnates.
Interestingly, when I test these models, they produce logits and bounding boxes that are all exactly the same, i.e. with a query size of 100, there are 100 boxes with exactly the same values.
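As a quick sanity check for this kind of query collapse, one can compare every predicted box against the first. A minimal sketch in plain Python; in practice you would feed it the per-query boxes from the model's output (the [cx, cy, w, h] box format is an assumption based on DETR's conventions):

```python
def queries_collapsed(pred_boxes, tol=1e-6):
    """Return True if all object-query predictions are (nearly) identical.

    pred_boxes: list of [cx, cy, w, h] boxes, one entry per object query.
    """
    first = pred_boxes[0]
    return all(
        abs(coord - ref) < tol
        for box in pred_boxes
        for coord, ref in zip(box, first)
    )

# Toy example: 100 identical boxes, as described above -> collapsed.
boxes = [[0.5, 0.5, 0.2, 0.3]] * 100
print(queries_collapsed(boxes))  # True

# Two distinct boxes -> not collapsed.
print(queries_collapsed([[0.1, 0.1, 0.2, 0.2], [0.8, 0.8, 0.1, 0.1]]))  # False
```

If this check returns True across validation images, the detector has degenerated to a single constant prediction, which would explain the all-zero AP.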
Deformable DETR is much better, with non-zero validation values from epoch 1 to epoch 50.
Thanks for the info on deformable vs. non-deformable; I shall use deformable from now on then.