[Bug] Why the RTMO model train has CUDA kernel errors?
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmpose).
Environment
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
addict 2.4.0 pypi_0 pypi
aliyun-python-sdk-core 2.14.0 pypi_0 pypi
aliyun-python-sdk-kms 2.16.2 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
ca-certificates 2023.12.12 h06a4308_0
certifi 2022.12.7 pypi_0 pypi
cffi 1.16.0 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
chumpy 0.70 pypi_0 pypi
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
contourpy 1.1.1 pypi_0 pypi
coverage 7.4.0 pypi_0 pypi
crcmod 1.7 pypi_0 pypi
cryptography 41.0.7 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
cython 3.0.8 pypi_0 pypi
exceptiongroup 1.2.0 pypi_0 pypi
filelock 3.9.0 pypi_0 pypi
flake8 7.0.0 pypi_0 pypi
fonttools 4.47.0 pypi_0 pypi
fsspec 2023.4.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
importlib-metadata 7.0.1 pypi_0 pypi
importlib-resources 6.1.1 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
interrogate 1.5.0 pypi_0 pypi
isort 4.3.21 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jmespath 0.10.0 pypi_0 pypi
json-tricks 3.17.3 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
markdown 3.5.2 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
matplotlib 3.7.4 pypi_0 pypi
mccabe 0.7.0 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mmcv 2.1.0 pypi_0 pypi
mmdet 3.2.0 pypi_0 pypi
mmengine 0.10.2 pypi_0 pypi
mmpose 1.3.0 dev_0 <develop>
model-index 0.1.11 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
munkres 1.1.4 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.0 pypi_0 pypi
numpy 1.24.1 pypi_0 pypi
opencv-python 4.9.0.80 pypi_0 pypi
opendatalab 0.0.10 pypi_0 pypi
openmim 0.3.9 pypi_0 pypi
openssl 3.0.12 h7f8727e_0
openxlab 0.0.33 pypi_0 pypi
ordered-set 4.1.0 pypi_0 pypi
oss2 2.17.0 pypi_0 pypi
packaging 23.2 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
parameterized 0.9.0 pypi_0 pypi
pillow 9.3.0 pypi_0 pypi
pip 23.3.1 py38h06a4308_0
platformdirs 4.1.0 pypi_0 pypi
pluggy 1.3.0 pypi_0 pypi
py 1.11.0 pypi_0 pypi
pycocotools 2.0.7 pypi_0 pypi
pycodestyle 2.11.1 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pycryptodome 3.20.0 pypi_0 pypi
pyflakes 3.2.0 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pyparsing 3.1.1 pypi_0 pypi
pytest 7.4.4 pypi_0 pypi
pytest-runner 6.0.1 pypi_0 pypi
python 3.8.18 h955ad1f_0
python-dateutil 2.8.2 pypi_0 pypi
pytz 2023.3.post1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.28.2 pypi_0 pypi
rich 13.4.2 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
setuptools 60.2.0 pypi_0 pypi
shapely 2.0.2 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
terminaltables 3.1.10 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
toml 0.10.2 pypi_0 pypi
tomli 2.0.1 pypi_0 pypi
torch 2.1.2+cu118 pypi_0 pypi
torchaudio 2.1.2+cu118 pypi_0 pypi
torchvision 0.16.2+cu118 pypi_0 pypi
tqdm 4.65.2 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typing-extensions 4.4.0 pypi_0 pypi
tzdata 2023.4 pypi_0 pypi
urllib3 1.26.13 pypi_0 pypi
wheel 0.41.2 py38h06a4308_0
xdoctest 1.1.2 pypi_0 pypi
xtcocotools 1.14.3 pypi_0 pypi
xz 5.4.5 h5eee18b_0
yapf 0.40.2 pypi_0 pypi
zipp 3.17.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
Reproduces the problem - code sample
configs/_base_/datasets/coco_arm.py
dataset_info = dict(
dataset_name='coco',
paper_info=dict(
author='Lin, Tsung-Yi and Maire, Michael and '
'Belongie, Serge and Hays, James and '
'Perona, Pietro and Ramanan, Deva and '
r'Doll{\'a}r, Piotr and Zitnick, C Lawrence',
title='Microsoft coco: Common objects in context',
container='European conference on computer vision',
year='2014',
homepage='http://cocodataset.org/',
),
keypoint_info={
0:
dict(
name='LI11',
id=0,
color=[255,255,255],
type='upper',
swap='LI11'),
1:
dict(
name='LI10',
id=1,
color=[255,255,255],
type='upper',
swap='LI10'),
2:
dict(
name='TE5',
id=2,
color=[255,255,255],
type='upper',
swap='TE5'),
3:
dict(
name='LI4',
id=3,
color=[255,255,255],
type='upper',
swap='LI4'),
4:
dict(
name='TE3',
id=4,
color=[255,255,255],
type='upper',
swap='TE3')
},
skeleton_info={
0:
dict(link=('LI11', 'LI11'), id=0, color=[255, 0, 0]),
1:
dict(link=('LI10', 'LI10'), id=1, color=[255, 0, 0]),
2:
dict(link=('TE5', 'TE5'), id=2, color=[255, 0, 0]),
3:
dict(link=('LI4', 'LI4'), id=3, color=[255, 0, 0]),
4:
dict(link=('TE3', 'TE3'), id=4, color=[255, 0, 0]),
},
    joint_weights=[  # per-keypoint weights used by the COCO-style dataset (mostly 1)
1., 1., 1., 1., 1.
],
sigmas=[
0.015, 0.015, 0.015, 0.015, 0.015 # edit : 0.025 -> 0.005 -> 0.015
]
)
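Since RTMO computes OKS directly from this metainfo, the per-keypoint lists must stay consistent with `keypoint_info`. A minimal sanity-check sketch (`check_metainfo` is a hypothetical helper, not part of mmpose) that verifies the list lengths and swap names for a custom `dataset_info` dict like the one above:

```python
def check_metainfo(dataset_info):
    """Sanity-check a custom mmpose-style dataset_info dict.

    Verifies that joint_weights and sigmas have exactly one entry per
    keypoint, and that every swap name refers to a defined keypoint.
    Returns the number of keypoints on success.
    """
    kpts = dataset_info['keypoint_info']
    names = {v['name'] for v in kpts.values()}
    n = len(kpts)
    assert len(dataset_info['joint_weights']) == n, 'joint_weights length mismatch'
    assert len(dataset_info['sigmas']) == n, 'sigmas length mismatch'
    for v in kpts.values():
        assert v['swap'] in names, f"unknown swap target: {v['swap']}"
    return n
```

The number returned should equal `num_keypoints` in the head config (5 here); a mismatch between these counts is a common source of out-of-bounds indexing during target assignment.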
RTMO-L model config: mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-l_16xb16-600e_coco-640x640.py
_base_ = ['../../../_base_/default_runtime.py']
# runtime
train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)])
auto_scale_lr = dict(base_batch_size=256)
default_hooks = dict(
checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3))
optim_wrapper = dict(
type='OptimWrapper',
constructor='ForceDefaultOptimWrapperConstructor',
optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05),
paramwise_cfg=dict(
norm_decay_mult=0,
bias_decay_mult=0,
bypass_duplicate=True,
force_default_settings=True,
custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})),
clip_grad=dict(max_norm=0.1, norm_type=2))
param_scheduler = [
dict(
type='QuadraticWarmupLR',
by_epoch=True,
begin=0,
end=5,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
eta_min=0.0002,
begin=5,
T_max=280,
end=280,
by_epoch=True,
convert_to_iter_based=True),
# this scheduler is used to increase the lr from 2e-4 to 5e-4
dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281),
dict(
type='CosineAnnealingLR',
eta_min=0.0002,
begin=281,
T_max=300,
end=580,
by_epoch=True,
convert_to_iter_based=True),
dict(type='ConstantLR', by_epoch=True, factor=1, begin=580, end=600),
]
# data
input_size = (640, 640)
metafile = 'configs/_base_/datasets/coco_arm.py'
codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size)
train_pipeline_stage1 = [
dict(type='LoadImage', backend_args=None),
dict(
type='Mosaic',
img_scale=(640, 640),
pad_val=114.0,
pre_transform=[dict(type='LoadImage', backend_args=None)]),
dict(
type='BottomupRandomAffine',
input_size=(640, 640),
shift_factor=0.1,
rotate_factor=10,
scale_factor=(0.75, 1.0),
pad_val=114,
distribution='uniform',
transform_mode='perspective',
bbox_keep_corner=False,
clip_border=True,
),
dict(
type='YOLOXMixUp',
img_scale=(640, 640),
ratio_range=(0.8, 1.6),
pad_val=114.0,
pre_transform=[dict(type='LoadImage', backend_args=None)]),
dict(type='YOLOXHSVRandomAug'),
dict(type='RandomFlip'),
dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False),
dict(type='GenerateTarget', encoder=codec),
dict(type='PackPoseInputs'),
]
train_pipeline_stage2 = [
dict(type='LoadImage'),
dict(
type='BottomupRandomAffine',
input_size=(640, 640),
scale_type='long',
pad_val=(114, 114, 114),
bbox_keep_corner=False,
clip_border=True,
),
dict(type='YOLOXHSVRandomAug'),
dict(type='RandomFlip'),
dict(type='BottomupGetHeatmapMask', get_invalid=True),
dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False),
dict(type='GenerateTarget', encoder=codec),
dict(type='PackPoseInputs'),
]
data_type = 'CocoArm'
data_mode = 'bottomup'
data_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/revised_dataset/PK_dataset/Train1596/'
test_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/revised_dataset/PK_dataset/Test392/'
annotation_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/revised_dataset/PK_dataset/Train1596_v1.json'
annotation_root_val = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/revised_dataset/PK_dataset/Test392_v1.json'
# train datasets
dataset_coco = dict(
type=data_type,
data_root=data_root, # edit
data_mode=data_mode,
ann_file=annotation_root, # edit
data_prefix=dict(img=data_root), # edit
pipeline=train_pipeline_stage1,
)
train_dataloader = dict(
batch_size=16,
num_workers=8,
persistent_workers=True,
pin_memory=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dataset_coco)
val_pipeline = [
dict(type='LoadImage'),
dict(
type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)),
dict(
type='PackPoseInputs',
meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape',
'input_size', 'input_center', 'input_scale'))
]
val_dataloader = dict(
batch_size=1,
num_workers=2,
persistent_workers=True,
pin_memory=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
dataset=dict(
type=data_type, # edit
data_root=data_root, # edit
data_mode=data_mode,
ann_file=annotation_root_val, # edit
data_prefix=dict(img=test_root), # edit
test_mode=True,
pipeline=val_pipeline,
))
test_dataloader = val_dataloader
# evaluators
val_evaluator = dict(
type='CocoMetric',
ann_file=annotation_root_val, # edit
score_mode='bbox',
nms_mode='none',
)
test_evaluator = val_evaluator
# hooks
custom_hooks = [
dict(
type='YOLOXPoseModeSwitchHook',
num_last_epochs=20,
new_train_pipeline=train_pipeline_stage2,
priority=48),
dict(
type='RTMOModeSwitchHook',
epoch_attributes={
280: {
'proxy_target_cc': True,
'overlaps_power': 1.0,
'loss_cls.loss_weight': 2.0,
'loss_mle.loss_weight': 5.0,
'loss_oks.loss_weight': 10.0
},
},
priority=48),
dict(type='SyncNormHook', priority=48),
dict(
type='EMAHook',
ema_type='ExpMomentumEMA',
momentum=0.0002,
update_buffers=True,
strict_load=False,
priority=49),
]
# model
widen_factor = 1.0
deepen_factor = 1.0
model = dict(
type='BottomupPoseEstimator',
init_cfg=dict(
type='Kaiming',
layer='Conv2d',
a=2.23606797749979,
distribution='uniform',
mode='fan_in',
nonlinearity='leaky_relu'),
data_preprocessor=dict(
type='PoseDataPreprocessor',
pad_size_divisor=32,
mean=[0, 0, 0],
std=[1, 1, 1],
batch_augments=[
dict(
type='BatchSyncRandomResize',
random_size_range=(480, 800),
size_divisor=32,
interval=1),
]),
backbone=dict(
type='CSPDarknet',
deepen_factor=deepen_factor,
widen_factor=widen_factor,
out_indices=(2, 3, 4),
spp_kernal_sizes=(5, 9, 13),
norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
act_cfg=dict(type='Swish'),
init_cfg=dict(
type='Pretrained',
checkpoint='/data/home/seondeok/MMPose/mmpose/configs/body_2d_keypoint/rtmo/pretrained/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth',
prefix='backbone.',
)),
neck=dict(
type='HybridEncoder',
in_channels=[256, 512, 1024],
deepen_factor=deepen_factor,
widen_factor=widen_factor,
hidden_dim=256,
output_indices=[1, 2],
encoder_cfg=dict(
self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0),
ffn_cfg=dict(
embed_dims=256,
feedforward_channels=1024,
ffn_drop=0.0,
act_cfg=dict(type='GELU'))),
projector=dict(
type='ChannelMapper',
in_channels=[256, 256],
kernel_size=1,
out_channels=512,
act_cfg=None,
norm_cfg=dict(type='BN'),
num_outs=2)),
head=dict(
type='RTMOHead',
num_keypoints=5, # edit keypoints
featmap_strides=(16, 32),
head_module_cfg=dict(
num_classes=1, # edit
in_channels=256,
cls_feat_channels=256,
channels_per_group=36,
pose_vec_channels=512,
widen_factor=widen_factor,
stacked_convs=2,
norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
act_cfg=dict(type='Swish')),
assigner=dict(
type='SimOTAAssigner',
dynamic_k_indicator='oks',
oks_calculator=dict(type='PoseOKS', metainfo=metafile)),
prior_generator=dict(
type='MlvlPointGenerator',
centralize_points=True,
strides=[16, 32]),
dcc_cfg=dict(
in_channels=512,
feat_channels=128,
num_bins=(192, 256),
spe_channels=128,
gau_cfg=dict(
s=128,
expansion_factor=2,
dropout_rate=0.0,
drop_path=0.0,
act_fn='SiLU',
pos_enc='add')),
overlaps_power=0.5,
loss_cls=dict(
type='VariFocalLoss',
reduction='sum',
use_target_weight=True,
loss_weight=1.0),
loss_bbox=dict(
type='IoULoss',
mode='square',
eps=1e-16,
reduction='sum',
loss_weight=5.0),
loss_oks=dict(
type='OKSLoss',
reduction='none',
metainfo=metafile,
loss_weight=30.0),
loss_vis=dict(
type='BCELoss',
use_target_weight=True,
reduction='mean',
loss_weight=1.0),
loss_mle=dict(
type='MLECCLoss',
use_target_weight=True,
loss_weight=1e-2,
),
loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0),
),
test_cfg=dict(
input_size=input_size,
score_thr=0.1,
nms_thr=0.65,
))
Reproduces the problem - command or script
$ python tools/train.py configs/body_2d_keypoint/rtmo/custom_coco/rtmo-l_16xb16-600e_coco-640x640.py
Reproduces the problem - error message
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
File "tools/train.py", line 162, in <module>
main()
File "tools/train.py", line 158, in main
runner.train()
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
self.run_iter(idx, data_batch)
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
outputs = self.runner.model.train_step(
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
losses = self._run_forward(data, mode='loss') # type: ignore
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
results = self(**data, mode=mode)
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/seondeok/MMPose/mmpose/mmpose/models/pose_estimators/base.py", line 155, in forward
return self.loss(inputs, data_samples)
File "/data/home/seondeok/MMPose/mmpose/mmpose/models/pose_estimators/bottomup.py", line 70, in loss
self.head.loss(feats, data_samples, train_cfg=self.train_cfg))
File "/data/home/seondeok/MMPose/mmpose/mmpose/models/heads/hybrid_heads/rtmo_head.py", line 793, in loss
targets = self._get_targets(flatten_priors,
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/home/seondeok/MMPose/mmpose/mmpose/models/heads/hybrid_heads/yoloxpose_head.py", line 410, in _get_targets
target = self._get_targets_single(priors, batch_cls_scores[i],
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/home/seondeok/MMPose/mmpose/mmpose/models/heads/hybrid_heads/yoloxpose_head.py", line 532, in _get_targets_single
assign_result = self.assigner.assign(
File "/data/home/seondeok/MMPose/mmpose/mmpose/models/task_modules/assigners/sim_ota_assigner.py", line 166, in assign
F.binary_cross_entropy(
File "/data/home/seondeok/.conda/envs/openmmlab_v2/lib/python3.8/site-packages/torch/nn/functional.py", line 3122, in binary_cross_entropy
return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Additional information
- When I try to train the RTMO model, a CUDA kernel error occurs.
- So I tried another model, the top-down heatmap HRNet model:
mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py - the HRNet training code works fine.
But with the RTMO model, training stops at the following step with the CUDA error:
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
01/11 17:09:16 - mmengine - INFO - load backbone. in model from: /data/home/seondeok/MMPose/mmpose/configs/body_2d_keypoint/rtmo/pretrained/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth
Loads checkpoint by local backend from path: /data/home/seondeok/MMPose/mmpose/configs/body_2d_keypoint/rtmo/pretrained/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth
01/11 17:09:18 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
01/11 17:09:18 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
01/11 17:09:18 - mmengine - INFO - Checkpoints will be saved to /data/home/seondeok/MMPose/mmpose/work_dirs/rtmo-l_16xb16-600e_coco-640x640.
How can I solve this problem?
Please make sure the "category_id" in your data annotation file matches the one in the COCO annotation file.
Thank you for the advice! My category_id is 9 - how do I match the id in the config code? My custom COCO annotation uses category_id 9. Should I change this number to 0?
I had the same error. The category_id can only be 1. If you use another id, such as 0 or 9, you get this error.
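A minimal sketch of remapping every category_id in a COCO-style annotation file to 1 (assuming a single-category dataset; `remap_category_id` is a hypothetical helper, not part of mmpose):

```python
import copy


def remap_category_id(coco, new_id=1):
    """Remap every category_id in a COCO-style annotation dict to new_id.

    Assumes the dataset has a single category. Returns a modified copy;
    the input dict is left untouched.
    """
    coco = copy.deepcopy(coco)
    old_to_new = {}
    for cat in coco.get('categories', []):
        old_to_new[cat['id']] = new_id
        cat['id'] = new_id
    for ann in coco.get('annotations', []):
        ann['category_id'] = old_to_new[ann['category_id']]
    return coco
```

Usage would be to load the annotation JSON with `json.load`, pass the dict through `remap_category_id`, and write the result back with `json.dump` before training.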