I try to train with MOT17; when I use "multi_frame_attention = True", I get "RuntimeError: stack expects each tensor to be equal size, but got [2, 2, 21, 27, 43] at entry 0 and [2, 2, 21, 27, 42] at entry 1"

Open niangea opened this issue 1 year ago • 17 comments

Expected behavior:

I try to train with MOT17. When I use "multi_frame_attention = True", I get "RuntimeError: stack expects each tensor to be equal size, but got [2, 2, 21, 27, 43] at entry 0 and [2, 2, 21, 27, 42] at entry 1".

Environment:

Environment information collected with python -m torch.utils.collect_env:

PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.17

Python version: 3.7.16 (default, Jan 17 2023, 22:20:44)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-135-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.3.109
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 515.86.01
cuDNN version: Probably one of the following:
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.21.6
[pip3] torch==1.13.1
[pip3] torchfile==0.1.0
[pip3] torchvision==0.14.1
[conda] numpy                     1.21.6                   pypi_0    pypi
[conda] torch                     1.13.1                   pypi_0    pypi
[conda] torchfile                 0.1.0                    pypi_0    pypi
[conda] torchvision               0.14.1                   pypi_0    pypi

niangea avatar Apr 18 '23 16:04 niangea

And when I use "tracking = True", I get "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)".

niangea avatar Apr 18 '23 16:04 niangea

Please follow the install instructions. For example, your PyTorch version is wrong, which should explain the first error. The second one might also be resolved by this; if not, your system is not properly recognising the GPUs.
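
A quick, generic way to check which PyTorch build is actually active in an environment (nothing trackformer-specific):

    import torch

    # Print the active PyTorch version, the CUDA version it was built
    # against, and whether this process can see any GPUs.
    print(torch.__version__)
    print(torch.version.cuda)
    print(torch.cuda.is_available(), torch.cuda.device_count())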

timmeinhardt avatar Apr 18 '23 16:04 timmeinhardt

Dear Sir, I am unable to change my PyTorch version because I am running the experiment on my school's server, where the CUDA version is fixed. Is there any other solution?

niangea avatar Apr 18 '23 16:04 niangea

Regarding the second error: my GPU is detected, since I already use it when evaluating MOT17 with your track.py script.

niangea avatar Apr 18 '23 16:04 niangea

You could fix the code for the PyTorch version you are limited to, but this might involve quite a lot of code changes. Supporting this goes beyond the scope of this repository.

timmeinhardt avatar Apr 18 '23 16:04 timmeinhardt

Is there a way to train without using these two parameters? I set them to False and trained, but the resulting model's performance is very poor.

niangea avatar Apr 18 '23 16:04 niangea

This is the configuration I am currently training with. Training runs through normally, but the results are very poor. Can you give me some suggestions for parameter changes?

############
aux_loss = True
backbone = 'resnet50'
batch_size = 2
bbox_loss_coef = 5.0
clip_max_norm = 0.1
cls_loss_coef = 2.0
coco_and_crowdhuman_prev_frame_rnd_augs = 0.2
coco_min_num_objects = 0
coco_panoptic_path = None
coco_path = 'data/coco_2017'
coco_person_train_split = None
crowdhuman_path = 'data/CrowdHuman'
crowdhuman_train_split = None
dataset = 'mot'
debug = False
dec_layers = 6
dec_n_points = 4
deformable = True
device = 'cuda'
dice_loss_coef = 1.0
dilation = False
dim_feedforward = 1024
dist_url = 'env://'
dropout = 0.1
enc_layers = 6
enc_n_points = 4
eos_coef = 0.1
epochs = 50
eval_only = False
eval_train = False
focal_alpha = 0.25
focal_gamma = 2
focal_loss = True
freeze_detr = False
giou_loss_coef = 2
hidden_dim = 288
load_mask_head_from_model = None
lr = 0.0002
lr_backbone = 2e-05
lr_backbone_names = ['backbone.0']
lr_drop = 10
lr_linear_proj_mult = 0.1
lr_linear_proj_names = ['reference_points', 'sampling_offsets']
lr_track = 0.0001
mask_loss_coef = 1.0
masks = False
merge_frame_features = False
mot_path_train = 'data/MOT17'
mot_path_val = 'data/MOT17'
multi_frame_attention = True
multi_frame_attention_separate_encoder = True
multi_frame_encoding = True
nheads = 8
no_vis = False
num_feature_levels = 4
num_queries = 500
num_workers = 2
output_dir = 'models/mot17_deformable_multi_frame'
overflow_boxes = True
overwrite_lr_scheduler = False
overwrite_lrs = False
position_embedding = 'sine'
pre_norm = False
resume = 'models/r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint_hidden_dim_288.pth'
resume_optim = False
resume_shift_neuron = False
resume_vis = False
save_model_interval = 5
seed = 42
set_cost_bbox = 5.0
set_cost_class = 2.0
set_cost_giou = 2.0
start_epoch = 1
track_attention = False
track_backprop_prev_frame = False
track_prev_frame_range = 5
track_prev_frame_rnd_augs = 0.01
track_prev_prev_frame = False
track_query_false_negative_prob = 0.4
track_query_false_positive_eos_weight = True
track_query_false_positive_prob = 0.1
tracking = True
tracking_eval = True
train_split = 'mot17_train_coco'
two_stage = False
val_interval = 5
val_split = 'mot17_train_cross_val_frame_0_5_to_1_0_coco'
vis_and_log_interval = 50
vis_port = 8090
vis_server = ''
weight_decay = 0.0001
with_box_refine = True
world_size = 1
img_transform:
    max_size = 1333
    val_width = 800
############

niangea avatar Apr 18 '23 16:04 niangea

You cannot just set the param to False. You must remove the entire multi_frame option from the command to revert all the changes it applies.

timmeinhardt avatar Apr 18 '23 16:04 timmeinhardt

Yes, I have modified the training command. I am currently running "python train.py" with the set of parameters above.

niangea avatar Apr 18 '23 16:04 niangea

Now I use"python train.py" with args ############ aux_loss = True backbone = 'resnet50' batch_size = 2 bbox_loss_coef = 5.0 clip_max_norm = 0.1 cls_loss_coef = 2.0 coco_and_crowdhuman_prev_frame_rnd_augs = 0.2 coco_min_num_objects = 0 coco_panoptic_path = None coco_path = 'data/coco_2017' coco_person_train_split = None crowdhuman_path = 'data/CrowdHuman' crowdhuman_train_split = None dataset = 'mot' debug = False dec_layers = 6 dec_n_points = 4 deformable = True device = 'cuda' dice_loss_coef = 1.0 dilation = False dim_feedforward = 1024 dist_url = 'env://' dropout = 0.1 enc_layers = 6 enc_n_points = 4 eos_coef = 0.1 epochs = 50 eval_only = False eval_train = False focal_alpha = 0.25 focal_gamma = 2 focal_loss = True freeze_detr = False giou_loss_coef = 2 hidden_dim = 288 load_mask_head_from_model = None lr = 0.0002 lr_backbone = 2e-05 lr_backbone_names = ['backbone.0'] lr_drop = 10 lr_linear_proj_mult = 0.1 lr_linear_proj_names = ['reference_points', 'sampling_offsets'] lr_track = 0.0001 mask_loss_coef = 1.0 masks = False merge_frame_features = False mot_path_train = 'data/MOT17' mot_path_val = 'data/MOT17' multi_frame_attention = True multi_frame_attention_separate_encoder = True multi_frame_encoding = True nheads = 8 no_vis = False num_feature_levels = 4 num_queries = 500 num_workers = 2 output_dir = 'models/mot17_deformable_multi_frame' overflow_boxes = True overwrite_lr_scheduler = False overwrite_lrs = False position_embedding = 'sine' pre_norm = False resume = 'models/r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint_hidden_dim_288.pth' resume_optim = False resume_shift_neuron = False resume_vis = False save_model_interval = 5 seed = 42 set_cost_bbox = 5.0 set_cost_class = 2.0 set_cost_giou = 2.0 start_epoch = 1 track_attention = False track_backprop_prev_frame = False track_prev_frame_range = 5 track_prev_frame_rnd_augs = 0.01 track_prev_prev_frame = False track_query_false_negative_prob = 0.4 track_query_false_positive_eos_weight = True track_query_false_positive_prob = 0.1 tracking = True tracking_eval = True train_split = 'mot17_train_coco' two_stage = False val_interval = 5 val_split = 'mot17_train_cross_val_frame_0_5_to_1_0_coco' vis_and_log_interval = 50 vis_port = 8090 vis_server = '' weight_decay = 0.0001 with_box_refine = True world_size = 1 img_transform: max_size = 1333 val_width = 800 ########### But the trained model has very poor detection performance

niangea avatar Apr 18 '23 16:04 niangea

I cannot check all those parameters. Try running

python src/train.py with \
    mot17_crowdhuman \
    deformable \
    multi_frame \
    tracking \
    output_dir=models/mot17_crowdhuman_deformable_multi_frame

without the multi_frame line.
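
That is, the reduced command would be:

python src/train.py with \
    mot17_crowdhuman \
    deformable \
    tracking \
    output_dir=models/mot17_crowdhuman_deformable_multi_frame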

timmeinhardt avatar Apr 18 '23 16:04 timmeinhardt

Now I run "python src/train.py with mot17 deformable output_dir=models/mot17_crowdhuman_deformable_multi_frame" but get "NotImplementedError: No rule for transformer.level_embed with shape torch.Size([4, 256])."

niangea avatar Apr 18 '23 16:04 niangea

The command seems to be missing the tracking option and the \ line continuations. Please take a step back and consider what you want to do. We cannot debug your issue via GitHub; this is for more general discussions.

timmeinhardt avatar Apr 18 '23 16:04 timmeinhardt

I'm very sorry, but I can't add tracking, because adding it results in "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)". I do use a GPU, so I removed both options in order to run at all.

niangea avatar Apr 18 '23 16:04 niangea

This will not work. Please follow the instructions closely. Just removing some of the options will of course change quite a lot. The CPU/GPU error indicates that our code is not finding your GPU. Try to debug this and keep the config as it is supposed to be.
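
As a starting point, that error simply means a tensor and the indices used to index it live on different devices. A minimal illustration (generic PyTorch, not trackformer code; requires a visible CUDA device to run):

    import torch

    # Indexing a CPU tensor with GPU indices raises exactly this
    # RuntimeError on recent PyTorch versions.
    data = torch.arange(10)                    # tensor on the CPU
    idx = torch.tensor([1, 3], device='cuda')  # indices on the GPU
    try:
        data[idx]
    except RuntimeError as e:
        print(e)  # indices should be either on cpu or on the same device as the indexed tensor (cpu)

    print(data[idx.cpu()])   # works: move the indices to the tensor's device
    print(data.cuda()[idx])  # works: move the tensor to the indices' device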

timmeinhardt avatar Apr 19 '23 11:04 timmeinhardt

I am getting the same error: "RuntimeError: stack expects each tensor to be equal size, but got [2, 2, 160, 160, 43] at entry 0 and [2, 2, 160, 160, 42] at entry 1".

This error occurs while calculating the position encoding for the transformer encoder layer. Any tips for solving this?

hardikkamboj avatar Jun 26 '24 13:06 hardikkamboj

This is the whole log:

"Traceback (most recent call last): File "/home/hardikkamboj/TrackRTMO/src/train.py", line 370, in train(args) File "/home/hardikkamboj/TrackRTMO/src/train.py", line 296, in train train_one_epoch( File "/home/hardikkamboj/TrackRTMO/src/trackformer/engine.py", line 135, in train_one_epoch outputs, targets, *_ = model(samples, targets) File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/home/hardikkamboj/TrackRTMO/src/trackformer/models/detr_tracking.py", line 254, in forward prev_out, _, prev_features, _, _ = super().forward([t['prev_image'] for t in targets]) File "/home/hardikkamboj/TrackRTMO/src/trackformer/models/deformable_detr.py", line 146, in forward features, pos = self.backbone(samples) File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/home/hardikkamboj/TrackRTMO/src/trackformer/models/backbone.py", line 121, in forward pos.append(self1.to(x.tensors.dtype)) File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/home/hardikkamboj/TrackRTMO/src/trackformer/models/position_encoding.py", line 97, in forward pos_x = torch.stack((pos_x[:, :, :, :, 0::2].sin(), pos_x[:, :, :, :, 1::2].cos()), dim=5).flatten(4) RuntimeError: stack expects each tensor to be equal size, but got [2, 2, 160, 160, 43] at entry 0 and [2, 2, 160, 160, 42] at entry 1"

hardikkamboj avatar Jun 26 '24 13:06 hardikkamboj