
Problem encountered when testing

AndrewGuo0930 opened this issue 3 years ago · 9 comments

Describe the bug

  • I used a customized dataset to train and test the YOLOX + ByteTrack model.
  • I used the training script to train ByteTrack and got epoch_80.pth.
  • Then I used it as the checkpoint and tested it on my dataset.
  • I ran the testing script shown below. After the detection procedure, an error occurred: AssertionError: Dataset and results have different sizes: 3724 v.s. 2
  • Additionally, my dataset consists of satellite videos and contains 4 classes: car, plane, ship, train. Could the error be because the testing script only supports single-class MOT?
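
For reference, the dumped results.pkl can be inspected with a minimal sketch like the one below (mmcv.load reads .pkl files; the comment about the dict keys is my guess at what the test script writes):

import mmcv

# Load the dumped results and compare their length with the dataset
# size (3724 images in my case).
results = mmcv.load('results.pkl')
print(type(results), len(results))
# If this prints a dict of length 2 (e.g. one entry for detection results
# and one for track results), that would match the '3724 v.s. 2' in the
# assertion message.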

Reproduction

  1. What command or script did you run?
PORT=29514 ./tools/dist_test.sh configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --checkpoint work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth --out results.pkl --eval bbox track 
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  • I used a customized dataset and changed the data path in the config.
  3. What dataset did you use and what task did you run?
  • Satellite videos, 4 classes. Training + validation.
  • Trained the YOLOX detector and got epoch_80.pth.
  • Tested the model on the dataset and encountered the problem.

Environment

sys.platform: linux
Python: 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 20:15:55) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2
OpenCV: 4.5.5
MMCV: 1.4.5
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMTracking: 0.10.0+

Error traceback

Traceback (most recent call last):
  File "./tools/test.py", line 224, in <module>
    main()
  File "./tools/test.py", line 214, in main
    metric = dataset.evaluate(outputs, **eval_kwargs)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmdetection/mmdet/datasets/dataset_wrappers.py", line 108, in evaluate
    ('Dataset and results have different sizes: '
AssertionError: Dataset and results have different sizes: 3724 v.s. 2
Traceback (most recent call last):
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/it_stu12/.conda/envs/SatVideoDT/bin/python', '-u', './tools/test.py', '--local_rank=0', 'configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py', '--launcher', 'pytorch', '--checkpoint', 'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth', '--out', 'results.pkl', '--eval', 'bbox', 'track']' returned non-zero exit status 1.

AndrewGuo0930 commented Mar 14 '22 11:03

Here's my config, bytetrack_yolox_s_512_alltrain_sat.py.

_base_ = [
    '../../_base_/datasets/mot_challenge.py', '../../_base_/default_runtime.py'
]

img_scale = (512, 512)
samples_per_gpu = 4

model = dict(
    type='ByteTrack',
    detector=dict(
        type='YOLOX',
        input_size=img_scale,
        random_size_range=(18, 32),
        random_size_interval=10,
        backbone=dict(
            type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
        neck=dict(
            type='YOLOXPAFPN',
            in_channels=[128, 256, 512],
            out_channels=128,
            num_csp_blocks=1),
        bbox_head=dict(
            type='YOLOXHead',
            num_classes=4,
            in_channels=128,
            feat_channels=128),
        train_cfg=dict(
            assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
        test_cfg=dict(
            score_thr=0.01, nms=dict(type='nms', iou_threshold=0.7)),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=  # noqa: E251
            '/cluster/home/it_stu12/main/SatVideoDT/mmtracking/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth'  # noqa: E501
        )),
    motion=dict(type='KalmanFilter'),
    tracker=dict(
        type='ByteTracker',
        obj_score_thrs=dict(high=0.6, low=0.1),
        init_track_thr=0.7,
        weight_iou_with_det_scores=True,
        match_iou_thrs=dict(high=0.1, low=0.5, tentative=0.3),
        num_frames_retain=30))

train_pipeline = [
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        bbox_clip_border=False),
    dict(
        type='RandomAffine',
        scaling_ratio_range=(0.1, 2),
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        bbox_clip_border=False),
    dict(
        type='MixUp',
        img_scale=img_scale,
        ratio_range=(0.8, 1.6),
        pad_val=114.0,
        bbox_clip_border=False),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Resize',
        img_scale=img_scale,
        keep_ratio=True,
        bbox_clip_border=False),
    dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[0.0, 0.0, 0.0],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(
                type='Pad',
                size_divisor=32,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='VideoCollect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=4,
    persistent_workers=True,
    train=dict(
        _delete_=True,
        type='MultiImageMixDataset',
        dataset=dict(
            type='CocoDataset',
            ann_file=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/train_cocoformat.json',
            ],
            img_prefix=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/training_data',
            ],
            classes=('car', 'ship', 'plane', 'train'),
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True)
            ],
            filter_empty_gt=False),
        pipeline=train_pipeline),
    val=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)),
    test=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)))

# optimizer
# default 8 gpu
optimizer = dict(
    type='SGD',
    lr=0.001 / 8 * samples_per_gpu,
    momentum=0.9,
    weight_decay=5e-4,
    nesterov=True,
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=None)

# some hyper parameters
total_epochs = 80
num_last_epochs = 10
resume_from = None
interval = 5

# learning policy
lr_config = dict(
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=1,
    num_last_epochs=num_last_epochs,
    min_lr_ratio=0.05)

custom_hooks = [
    dict(
        type='YOLOXModeSwitchHook',
        num_last_epochs=num_last_epochs,
        priority=48),
    dict(
        type='SyncNormHook',
        num_last_epochs=num_last_epochs,
        interval=interval,
        priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=resume_from,
        momentum=0.0001,
        priority=49)
]

checkpoint_config = dict(interval=1)
evaluation = dict(metric=['bbox', 'track'], interval=1)
search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML']

# you need to set mode='dynamic' if you are using pytorch<=1.5.0
fp16 = dict(loss_scale=dict(init_scale=512.))

AndrewGuo0930 commented Mar 14 '22 11:03

The training script I used (train.sh):

PORT=29504 ./tools/dist_train.sh /cluster/home/it_stu12/main/SatVideoDT/mmtracking/configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --no-validate

AndrewGuo0930 commented Mar 14 '22 11:03

You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and num_classes in bbox_head is set to 1, which means it is only used for pedestrian detection.

If you go through the whole inference procedure, the size of the results should be the same as the dataloader length, since every forward result (even an empty one) is appended to the final results.
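
Roughly, the collection loop looks like this (a simplified sketch, not the exact mmtracking code; model and data_loader are assumed to be already built):

import torch

def collect_results(model, data_loader):
    """Simplified sketch of the single-GPU test loop: one result is
    appended per sample, so len(results) equals len(dataset) at the end."""
    model.eval()
    results = []
    for data in data_loader:
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)
        results.append(result)  # empty detections are appended too
    assert len(results) == len(data_loader.dataset), (
        'Dataset and results have different sizes: '
        f'{len(data_loader.dataset)} v.s. {len(results)}')
    return results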

Seerkfang commented Mar 16 '22 03:03

But I've already set num_classes to 4 in my config and still encounter the problem.

AndrewGuo0930 commented Mar 16 '22 06:03

You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and will not be updated (if you didn't change the code). In this case, even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on those untrained classes.
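
To see how many classes the loaded detector actually predicts, you can inspect the head shapes stored in the checkpoint. A sketch (the 'cls' name filter is a guess at the key naming, which depends on the saved model):

import torch

ckpt = torch.load(
    'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth',
    map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# For YOLOX, the classification convs have num_classes output channels,
# so their weight shapes reveal how many classes the checkpoint was
# trained for.
for name, param in state_dict.items():
    if 'bbox_head' in name and 'cls' in name and name.endswith('.weight'):
        print(name, tuple(param.shape))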

Seerkfang commented Mar 20 '22 03:03

> You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and will not be updated (if you didn't change the code). In this case, even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on those untrained classes.

Does that mean I could modify the code to track 4 classes rather than only 1 class? Could you please tell me which code I should modify, the configuration?

AndrewGuo0930 commented Apr 10 '22 01:04

> You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and num_classes in bbox_head is set to 1, which means it is only used for pedestrian detection.
>
> If you go through the whole inference procedure, the size of the results should be the same as the dataloader length, since every forward result (even an empty one) is appended to the final results.

Hi, I am using the demo_mot_vis.py script to run inference with bytetrack_yolox_x_crowdhuman_mot17-private-half.py as the config. My goal is multi-class MOT. The result is as wanted, but looking into the config file, the base file bytetrack_yolox_x_crowdhuman_mot17-private-half.py has bbox_head=dict(num_classes=1), at line 14. Now I am trying to understand: how can I define the number of classes and the specific classes to consider?
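
For example, I guess the override could look like this before building the model (not sure this is the intended way; the class names below are just placeholders):

from mmcv import Config

cfg = Config.fromfile(
    'configs/mot/bytetrack/bytetrack_yolox_x_crowdhuman_mot17-private-half.py')
# Override the one-class head and declare my own classes.
cfg.model.detector.bbox_head.num_classes = 4
cfg.data.test.classes = ('class_a', 'class_b', 'class_c', 'class_d')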

Hello. I met the same problem when I trained on custom datasets. Have you solved this problem?

Zachein commented Mar 08 '23 06:03

> Hello. I met the same problem when I trained on custom datasets. Have you solved this problem?

No. I haven't used MMTracking for a long time. Maybe multi-class MOT is supported now? You can raise an issue for help.

AndrewGuo0930 commented Mar 08 '23 07:03