
Problem encountered when testing

AndrewGuo0930 opened this issue 3 years ago · 9 comments

Describe the bug

  • I used a customized dataset to train and test the YOLOX + ByteTrack model.
  • I used the training script to train ByteTrack and got epoch_80.pth.
  • Then I used it as the checkpoint and tested it on my dataset.
  • I ran the testing script shown below. After the detection procedure, an error occurred: AssertionError: Dataset and results have different sizes: 3724 v.s. 2
  • Additionally, my dataset consists of satellite videos and contains 4 classes: car, plane, ship, train. Could the error be because the testing script only supports single-class MOT?
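
For reference, the dumped results.pkl can be inspected with a minimal sketch like the one below (mmcv.load reads .pkl files; the comment about the dict keys is my guess at what the test script writes):

import mmcv

# Load the dumped results and compare their length with the dataset
# size (3724 images in my case).
results = mmcv.load('results.pkl')
print(type(results), len(results))
# If this prints a dict of length 2 (e.g. one entry for detection results
# and one for track results), that would match the '3724 v.s. 2' in the
# assertion message.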

Reproduction

  1. What command or script did you run?
PORT=29514 ./tools/dist_test.sh configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --checkpoint work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth --out results.pkl --eval bbox track 
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  • I used a customized dataset and changed the data path in the config.
  3. What dataset did you use and what task did you run?
  • Satellite videos, 4 classes. Training + validation.
  • Trained the YOLOX detector and got epoch_80.pth.
  • Tested the model on the dataset and encountered the problem.

Environment

sys.platform: linux
Python: 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 20:15:55) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2
OpenCV: 4.5.5
MMCV: 1.4.5
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMTracking: 0.10.0+

Error traceback

Traceback (most recent call last):
  File "./tools/test.py", line 224, in <module>
    main()
  File "./tools/test.py", line 214, in main
    metric = dataset.evaluate(outputs, **eval_kwargs)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmdetection/mmdet/datasets/dataset_wrappers.py", line 108, in evaluate
    ('Dataset and results have different sizes: '
AssertionError: Dataset and results have different sizes: 3724 v.s. 2
Traceback (most recent call last):
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/it_stu12/.conda/envs/SatVideoDT/bin/python', '-u', './tools/test.py', '--local_rank=0', 'configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py', '--launcher', 'pytorch', '--checkpoint', 'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth', '--out', 'results.pkl', '--eval', 'bbox', 'track']' returned non-zero exit status 1.

AndrewGuo0930 commented Mar 14 '22 11:03

Here's my config, bytetrack_yolox_s_512_alltrain_sat.py.

_base_ = [
    '../../_base_/datasets/mot_challenge.py', '../../_base_/default_runtime.py'
]

img_scale = (512, 512)
samples_per_gpu = 4

model = dict(
    type='ByteTrack',
    detector=dict(
        type='YOLOX',
        input_size=img_scale,
        random_size_range=(18, 32),
        random_size_interval=10,
        backbone=dict(
            type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
        neck=dict(
            type='YOLOXPAFPN',
            in_channels=[128, 256, 512],
            out_channels=128,
            num_csp_blocks=1),
        bbox_head=dict(
            type='YOLOXHead',
            num_classes=4,
            in_channels=128,
            feat_channels=128),
        train_cfg=dict(
            assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
        test_cfg=dict(
            score_thr=0.01, nms=dict(type='nms', iou_threshold=0.7)),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=  # noqa: E251
            '/cluster/home/it_stu12/main/SatVideoDT/mmtracking/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth'  # noqa: E501
        )),
    motion=dict(type='KalmanFilter'),
    tracker=dict(
        type='ByteTracker',
        obj_score_thrs=dict(high=0.6, low=0.1),
        init_track_thr=0.7,
        weight_iou_with_det_scores=True,
        match_iou_thrs=dict(high=0.1, low=0.5, tentative=0.3),
        num_frames_retain=30))

train_pipeline = [
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        bbox_clip_border=False),
    dict(
        type='RandomAffine',
        scaling_ratio_range=(0.1, 2),
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        bbox_clip_border=False),
    dict(
        type='MixUp',
        img_scale=img_scale,
        ratio_range=(0.8, 1.6),
        pad_val=114.0,
        bbox_clip_border=False),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Resize',
        img_scale=img_scale,
        keep_ratio=True,
        bbox_clip_border=False),
    dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[0.0, 0.0, 0.0],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(
                type='Pad',
                size_divisor=32,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='VideoCollect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=4,
    persistent_workers=True,
    train=dict(
        _delete_=True,
        type='MultiImageMixDataset',
        dataset=dict(
            type='CocoDataset',
            ann_file=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/train_cocoformat.json',
            ],
            img_prefix=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/training_data',
            ],
            classes=('car', 'ship', 'plane', 'train'),
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True)
            ],
            filter_empty_gt=False),
        pipeline=train_pipeline),
    val=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)),
    test=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)))

# optimizer
# default 8 gpu
optimizer = dict(
    type='SGD',
    lr=0.001 / 8 * samples_per_gpu,
    momentum=0.9,
    weight_decay=5e-4,
    nesterov=True,
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=None)

# some hyper parameters
total_epochs = 80
num_last_epochs = 10
resume_from = None
interval = 5

# learning policy
lr_config = dict(
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=1,
    num_last_epochs=num_last_epochs,
    min_lr_ratio=0.05)

custom_hooks = [
    dict(
        type='YOLOXModeSwitchHook',
        num_last_epochs=num_last_epochs,
        priority=48),
    dict(
        type='SyncNormHook',
        num_last_epochs=num_last_epochs,
        interval=interval,
        priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=resume_from,
        momentum=0.0001,
        priority=49)
]

checkpoint_config = dict(interval=1)
evaluation = dict(metric=['bbox', 'track'], interval=1)
search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML']

# you need to set mode='dynamic' if you are using pytorch<=1.5.0
fp16 = dict(loss_scale=dict(init_scale=512.))

AndrewGuo0930 commented Mar 14 '22 11:03

The training script I used (train.sh):

PORT=29504 ./tools/dist_train.sh /cluster/home/it_stu12/main/SatVideoDT/mmtracking/configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --no-validate

AndrewGuo0930 commented Mar 14 '22 11:03

You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and num_classes in bbox_head is set to 1, which means it is only used for pedestrian detection.

If you go through the whole inference procedure, the size of the results should be the same as the dataloader length, since every forward result (even an empty one) is appended to the final results.
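
Roughly, the collection loop looks like this (a simplified sketch, not the exact mmtracking code; model and data_loader are assumed to be already built):

import torch

def collect_results(model, data_loader):
    """Simplified sketch of the single-GPU test loop: one result is
    appended per sample, so len(results) equals len(dataset) at the end."""
    model.eval()
    results = []
    for data in data_loader:
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)
        results.append(result)  # empty detections are appended too
    assert len(results) == len(data_loader.dataset), (
        'Dataset and results have different sizes: '
        f'{len(data_loader.dataset)} v.s. {len(results)}')
    return results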

Seerkfang commented Mar 16 '22 03:03

But I've already set num_classes to 4 in my config and still encounter the problem.

AndrewGuo0930 commented Mar 16 '22 06:03

You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and will not be updated (if you didn't change the code). In this case, even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on those untrained classes.
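
To see how many classes the loaded detector actually predicts, you can inspect the head shapes stored in the checkpoint. A sketch (the 'cls' name filter is a guess at the key naming, which depends on the saved model):

import torch

ckpt = torch.load(
    'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth',
    map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# For YOLOX, the classification convs have num_classes output channels,
# so their weight shapes reveal how many classes the checkpoint was
# trained for.
for name, param in state_dict.items():
    if 'bbox_head' in name and 'cls' in name and name.endswith('.weight'):
        print(name, tuple(param.shape))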

Seerkfang commented Mar 20 '22 03:03

> You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and will not be updated (if you didn't change the code). In this case, even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on those untrained classes.

Does that mean I could modify the code to track 4 classes rather than only 1 class? Could you please tell me which code I should modify, the configuration?

AndrewGuo0930 commented Apr 10 '22 01:04

> You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and num_classes in bbox_head is set to 1, which means it is only used for pedestrian detection.
>
> If you go through the whole inference procedure, the size of the results should be the same as the dataloader length, since every forward result (even an empty one) is appended to the final results.

Hi, I am using the demo_mot_vis.py script to run inference with bytetrack_yolox_x_crowdhuman_mot17-private-half.py as the config. My goal is multi-class MOT. The result is as wanted, but looking into the config file, the base file bytetrack_yolox_x_crowdhuman_mot17-private-half.py has bbox_head=dict(num_classes=1), at line 14. Now I am trying to understand: how can I define the number of classes and the specific classes to consider?
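
For example, I guess the override could look like this before building the model (not sure this is the intended way; the class names below are just placeholders):

from mmcv import Config

cfg = Config.fromfile(
    'configs/mot/bytetrack/bytetrack_yolox_x_crowdhuman_mot17-private-half.py')
# Override the one-class head and declare my own classes.
cfg.model.detector.bbox_head.num_classes = 4
cfg.data.test.classes = ('class_a', 'class_b', 'class_c', 'class_d')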

Hello. I met the same problem when I trained on custom datasets. Have you solved this problem?

Zachein commented Mar 08 '23 06:03

> Hello. I met the same problem when I trained on custom datasets. Have you solved this problem?

No. I haven't used MMTracking for a long time. Maybe multi-class MOT is supported now? You can raise an issue for help.

AndrewGuo0930 commented Mar 08 '23 07:03