mmtracking
Problem encountered when testing
Describe the bug
- I used a customized dataset to train and test a YOLOX + ByteTrack model.
- I used the training script to train ByteTrack and got epoch_80.pth.
- Then I used it as the checkpoint and tested it on my dataset.
- I ran the testing script shown below. After the detection stage, an error occurred:
AssertionError: Dataset and results have different sizes: 3724 v.s. 2
- Additionally, my dataset consists of satellite videos and contains 4 classes: car, plane, ship, train. Could the error be because the testing script only supports single-class MOT?
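A quick way to see what the two mismatched sizes refer to is to inspect the saved results file. This is only a diagnostic sketch, assuming the results.pkl produced by the --out flag in the command below:

# Hedged diagnostic: inspect what --out actually saved.
import mmcv

results = mmcv.load('results.pkl')
print(type(results), len(results))
# If this is a dict of result types rather than a per-image list,
# its length (e.g. 2) will not match the dataset size (e.g. 3724).
if isinstance(results, dict):
    print(list(results.keys()))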
Reproduction
- What command or script did you run?
PORT=29514 ./tools/dist_test.sh configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --checkpoint work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth --out results.pkl --eval bbox track
- Did you make any modifications on the code or config? Did you understand what you have modified?
- I used a customized dataset and changed the data paths in the config.
- What dataset did you use and what task did you run?
- Satellite videos, 4 classes, training + validation.
- Trained the YOLOX detector and got epoch_80.pth.
- Tested the model on the dataset and encountered the problem.
Environment
sys.platform: linux
Python: 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 20:15:55) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.8.2
OpenCV: 4.5.5
MMCV: 1.4.5
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMTracking: 0.10.0+
Error traceback
Traceback (most recent call last):
File "./tools/test.py", line 224, in <module>
main()
File "./tools/test.py", line 214, in main
metric = dataset.evaluate(outputs, **eval_kwargs)
File "/cluster/home/it_stu12/main/SatVideoDT/mmdetection/mmdet/datasets/dataset_wrappers.py", line 108, in evaluate
('Dataset and results have different sizes: '
AssertionError: Dataset and results have different sizes: 3724 v.s. 2
Traceback (most recent call last):
File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
main()
File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/it_stu12/.conda/envs/SatVideoDT/bin/python', '-u', './tools/test.py', '--local_rank=0', 'configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py', '--launcher', 'pytorch', '--checkpoint', 'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth', '--out', 'results.pkl', '--eval', 'bbox', 'track']' returned non-zero exit status 1.
Here's my config, bytetrack_yolox_s_512_alltrain_sat.py:
_base_ = [
'../../_base_/datasets/mot_challenge.py', '../../_base_/default_runtime.py'
]
img_scale = (512, 512)
samples_per_gpu = 4
model = dict(
type='ByteTrack',
detector=dict(
type='YOLOX',
input_size=img_scale,
random_size_range=(18, 32),
random_size_interval=10,
backbone=dict(
type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
neck=dict(
type='YOLOXPAFPN',
in_channels=[128, 256, 512],
out_channels=128,
num_csp_blocks=1),
bbox_head=dict(
type='YOLOXHead',
num_classes=4,
in_channels=128,
feat_channels=128),
train_cfg=dict(
assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
test_cfg=dict(
score_thr=0.01, nms=dict(type='nms', iou_threshold=0.7)),
init_cfg=dict(
type='Pretrained',
checkpoint= # noqa: E251
'/cluster/home/it_stu12/main/SatVideoDT/mmtracking/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth' # noqa: E501
)),
motion=dict(type='KalmanFilter'),
tracker=dict(
type='ByteTracker',
obj_score_thrs=dict(high=0.6, low=0.1),
init_track_thr=0.7,
weight_iou_with_det_scores=True,
match_iou_thrs=dict(high=0.1, low=0.5, tentative=0.3),
num_frames_retain=30))
train_pipeline = [
dict(
type='Mosaic',
img_scale=img_scale,
pad_val=114.0,
bbox_clip_border=False),
dict(
type='RandomAffine',
scaling_ratio_range=(0.1, 2),
border=(-img_scale[0] // 2, -img_scale[1] // 2),
bbox_clip_border=False),
dict(
type='MixUp',
img_scale=img_scale,
ratio_range=(0.8, 1.6),
pad_val=114.0,
bbox_clip_border=False),
dict(type='YOLOXHSVRandomAug'),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='Resize',
img_scale=img_scale,
keep_ratio=True,
bbox_clip_border=False),
dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))),
dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=img_scale,
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[0.0, 0.0, 0.0],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(
type='Pad',
size_divisor=32,
pad_val=dict(img=(114.0, 114.0, 114.0))),
dict(type='ImageToTensor', keys=['img']),
dict(type='VideoCollect', keys=['img'])
])
]
data = dict(
samples_per_gpu=samples_per_gpu,
workers_per_gpu=4,
persistent_workers=True,
train=dict(
_delete_=True,
type='MultiImageMixDataset',
dataset=dict(
type='CocoDataset',
ann_file=[
'/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/train_cocoformat.json',
],
img_prefix=[
'/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/training_data',
],
classes=('car', 'ship', 'plane', 'train'),
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True)
],
filter_empty_gt=False),
pipeline=train_pipeline),
val=dict(
pipeline=test_pipeline,
ann_file=[
'/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
],
img_prefix=[
'/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
],
classes=('car', 'ship', 'plane', 'train'),
interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)),
test=dict(
pipeline=test_pipeline,
ann_file=[
'/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
],
img_prefix=[
'/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
],
classes=('car', 'ship', 'plane', 'train'),
interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)))
# optimizer
# default 8 gpu
optimizer = dict(
type='SGD',
lr=0.001 / 8 * samples_per_gpu,
momentum=0.9,
weight_decay=5e-4,
nesterov=True,
paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=None)
# some hyper parameters
total_epochs = 80
num_last_epochs = 10
resume_from = None
interval = 5
# learning policy
lr_config = dict(
policy='YOLOX',
warmup='exp',
by_epoch=False,
warmup_by_epoch=True,
warmup_ratio=1,
warmup_iters=1,
num_last_epochs=num_last_epochs,
min_lr_ratio=0.05)
custom_hooks = [
dict(
type='YOLOXModeSwitchHook',
num_last_epochs=num_last_epochs,
priority=48),
dict(
type='SyncNormHook',
num_last_epochs=num_last_epochs,
interval=interval,
priority=48),
dict(
type='ExpMomentumEMAHook',
resume_from=resume_from,
momentum=0.0001,
priority=49)
]
checkpoint_config = dict(interval=1)
evaluation = dict(metric=['bbox', 'track'], interval=1)
search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML']
# you need to set mode='dynamic' if you are using pytorch<=1.5.0
fp16 = dict(loss_scale=dict(init_scale=512.))
The training script I used, train.sh:
PORT=29504 ./tools/dist_train.sh /cluster/home/it_stu12/main/SatVideoDT/mmtracking/configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --no-validate
You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and that num_classes in bbox_head is set to 1, which means it is only used for pedestrian detection.
If you go through the whole inference procedure, the size of the results should be the same as the dataloader length, since every forward result (even an empty one) is appended to the final results.
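For context, the failing check (paraphrased from mmdet's dataset_wrappers.py, not verbatim) simply compares those two lengths:

# Paraphrase of the size check in mmdet's dataset evaluate logic (not verbatim):
assert len(results) == len(dataset), (
    'Dataset and results have different sizes: '
    f'{len(dataset)} v.s. {len(results)}')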
But I've already set num_classes to 4 in my config and still encounter the problem.
You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and is not updated (if you didn't change the code). In that case, even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on the untrained classes.
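If you want to verify what the checkpoint's detection head was actually trained for, a minimal sketch like the following can help. The 'conv_cls' key pattern is an assumption based on mmdet's YOLOXHead; if it matches nothing, print all keys and adapt:

# Hedged diagnostic: how many classes does the checkpoint's cls branch predict?
# The 'conv_cls' key pattern is an assumption based on mmdet's YOLOXHead.
import torch

ckpt = torch.load(
    'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth',
    map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)
for name, tensor in state_dict.items():
    if 'conv_cls' in name and name.endswith('.weight'):
        # out_channels of the cls conv equals num_classes
        print(name, tuple(tensor.shape))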
Does that mean I could modify the code to track 4 classes rather than only 1? Could you please tell me which code I should modify, the configuration?
Hi, I am using the demo_mot_vis.py script to run inference with bytetrack_yolox_x_crowdhuman_mot17-private-half.py as the config. My goal is multi-class MOT. The results are as desired, but looking into the config file, the base file bytetrack_yolox_x_crowdhuman_mot17-private-half.py at line 14 has bbox_head=dict(num_classes=1),.
Now I am trying to understand:
how can I define the number of classes and the specific classes to consider?
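For what it's worth, the usual pattern in mmdet-style configs is to override num_classes and the classes tuple in a child config. This is only a sketch with placeholder class names, and the exact nesting under data.train depends on how the base config wraps the train dataset:

# Hypothetical override config; class names are placeholders, not from the repo.
_base_ = ['./bytetrack_yolox_x_crowdhuman_mot17-private-half.py']

classes = ('car', 'ship', 'plane', 'train')  # your own label set
model = dict(detector=dict(bbox_head=dict(num_classes=len(classes))))
data = dict(
    train=dict(dataset=dict(classes=classes)),
    val=dict(classes=classes),
    test=dict(classes=classes))

Note that the maintainer's caveat above still applies: changing num_classes only helps if you retrain the detector on the new classes.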
Hello. I met the same problem when I trained on custom datasets. Have you solved this problem?
No. I haven't used MMTracking for a long time. Maybe multi-class MOT is supported now? You can raise an issue for help.