[Feature] Multi-class support of RTMO
What is the feature?
RTMO is currently designed for single-class human keypoint detection. I tried to train it for multi-class keypoint detection, but got an error:
```
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "D:\code\mmpose\tools\train_rtm6d.py", line 163, in ...
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Is there any way to adapt RTMO for multi-class keypoint detection?
Another bug in RTMO is that the category id can only be 1. If I define a category id of 0, the training labels become -1, which eventually triggers an out-of-bounds error. In my case, I defined category ids from 0 to 9 and received this error. Is there any way to fix this?
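For concreteness, here is a minimal snippet reproducing that failure mode on CPU, assuming the training labels end up as `category_id - 1` (an assumption consistent with the -1 labels described above); on the GPU the same call surfaces as the ScatterGatherKernel assert from the log:

```python
import torch
import torch.nn.functional as F

# Assumption: labels are derived by shifting category ids down by one,
# so defining category_id = 0 yields a ground-truth label of -1.
category_ids = torch.tensor([0, 1, 2])
gt_labels = category_ids - 1          # tensor([-1, 0, 1])

# F.one_hot requires indices in [0, num_classes). On CPU the -1 raises a
# RuntimeError; on CUDA it shows up as the device-side assert above.
try:
    F.one_hot(gt_labels, num_classes=3)
except RuntimeError as err:
    print(err)
```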
Thanks in advance.
Best regards
Any other context?
No response
Currently, RTMO only supports single-class pose estimation. For a multi-class detection task, have you tried increasing num_classes in the config? BTW, category_id should start from 1 instead of 0.
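For reference, a sketch of that suggestion, assuming the layout of the released rtmo-*_coco-640x640.py configs (check the config for your mmpose version):

```python
# Sketch only: sets the class count on the RTMO head module.
model = dict(
    head=dict(
        type='RTMOHead',
        num_keypoints=17,
        head_module_cfg=dict(
            num_classes=10,  # default is 1; one entry per object category
            in_channels=256,
            # ... keep the remaining head_module_cfg fields unchanged
        ),
        # ...
    ),
)
```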
Hello! Are there any models in MMPose that support multi-class pose estimation? From my understanding, it is possible to use a detector to classify each instance and then apply a separate top-down model such as RTMPose per class. However, I would prefer not to train and load multiple pose models, especially since my classes are quite similar to each other, albeit with differences in the number and type of keypoints.
Hi! I experienced the same problem. The error seems to be in the SimOTAAssigner class in mmpose\models\task_modules\assigners\sim_ota_assigner.py.
Line 158:

```python
F.one_hot(gt_labels.to(torch.int64),
          pred_scores.shape[-1]).float().unsqueeze(0).repeat(num_valid, 1, 1)
```
The error is thrown here because pred_scores.shape[-1] is always 1, while gt_labels contains labels for your number of classes. When I changed pred_scores.shape[-1] to the actual number of classes (8 in my case), this line passed.
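For later readers, the change described above would look roughly like this (`self.num_classes` is a hypothetical attribute that would need to be plumbed into SimOTAAssigner; it does not exist upstream):

```python
# mmpose/models/task_modules/assigners/sim_ota_assigner.py, around line 158.
# Sketch of the fix: build the one-hot targets against the true class count
# instead of pred_scores.shape[-1], which is always 1 for the stock
# single-class head. `self.num_classes` is a hypothetical attribute here.
gt_onehot_label = (
    F.one_hot(gt_labels.to(torch.int64),
              self.num_classes).float().unsqueeze(0).repeat(num_valid, 1, 1))
```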
However, the next error arises at line 166:

```python
F.binary_cross_entropy(
    valid_pred_scores.to(dtype=torch.float32),
    gt_onehot_label,
    reduction='none',
).sum(-1).to(dtype=valid_pred_scores.dtype)
```
I believe this is because valid_pred_scores no longer matches the dimensions of gt_onehot_label.
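That mismatch is easy to reproduce standalone (the shapes below are illustrative):

```python
import torch
import torch.nn.functional as F

num_valid, num_gt, num_classes = 4, 2, 8
# The head still predicts a single score per prior (last dim is 1) ...
valid_pred_scores = torch.rand(num_valid, num_gt, 1)
# ... but the one-hot targets now carry num_classes in the last dim.
gt_onehot_label = torch.rand(num_valid, num_gt, num_classes)

# binary_cross_entropy requires input and target to have the same size.
try:
    F.binary_cross_entropy(valid_pred_scores, gt_onehot_label,
                           reduction='none')
except ValueError as err:
    print(err)  # target size differs from input size
```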
Can anybody explain what should be implemented to make RTMO work for multiple classes?
I managed to solve this problem and got RTMO working for multiple classes.
Here is my solution. In mmpose/models/heads/hybrid_heads/yoloxpose_head.py, add num_classes to the constructor arguments of the YOLOXPose head modules (this changes the argument order).

Line 22:

```python
class YOLOXPoseHeadModule(BaseModule):
    """YOLOXPose head module for one-stage human pose estimation."""

    def __init__(
        self,
        num_keypoints: int,
        num_classes: int,
        in_channels: Union[int, Sequence],
        ...
```
Line 224:

```python
class YOLOXPoseHead(BaseModule):

    def __init__(
        self,
        num_keypoints: int,
        num_classes: int,
        head_module_cfg: Optional[ConfigType] = None,
        ...
```
In mmpose/models/heads/hybrid_heads/rtmo_head.py:

Line 25:

```python
class RTMOHeadModule(BaseModule):

    def __init__(
        self,
        num_keypoints: int,
        num_classes: int,
        in_channels: int,
        ...
```
Line 662:

```python
class RTMOHead(YOLOXPoseHead):

    def __init__(
        self,
        num_keypoints: int,
        num_classes: int,
        head_module_cfg: ConfigType,
        ...
```

Line 732:

```python
# override to ensure consistency
head_module_cfg['featmap_strides'] = featmap_strides
head_module_cfg['num_keypoints'] = num_keypoints
head_module_cfg['num_classes'] = num_classes
```
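Under these modified signatures, a config would then pass the class count at the head level, something like the following (a sketch against the patched code above, not upstream mmpose):

```python
# Config sketch matching the patched RTMOHead above (not upstream mmpose):
head = dict(
    type='RTMOHead',
    num_keypoints=17,  # keypoints per instance
    num_classes=10,    # forwarded into head_module_cfg at line 732
    head_module_cfg=dict(
        in_channels=256,
        # ... remaining fields as in the stock RTMO configs
    ),
)
```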
Line 811:

```python
# 3. calculate loss
extra_info = dict(num_samples=num_total_samples)
losses = dict()
cls_preds_all = flatten_cls_scores.view(-1, self.num_classes)
cls_target_all = torch.zeros_like(cls_preds_all).to(obj_targets)
```
Line 854:

```python
with torch.no_grad():
    diff_cc = torch.norm(kpt_cc_preds - kpt_targets, dim=-1)
    diff_reg = torch.norm(kpt_reg_preds - kpt_targets, dim=-1)
    mask = (diff_reg > diff_cc).float()
    kpt_weights_reg = vis_targets * mask
    # oks = self.assigner.oks_calculator(kpt_cc_preds, kpt_targets,
    #                                    vis_targets, pos_areas)
    # cls_targets = oks.unsqueeze(1)
losses['loss_oks'] = self.loss_oks(kpt_reg_preds, kpt_cc_preds.detach(),
                                   kpt_weights_reg, pos_areas)
```

Line 878:

```python
extra_info['overlaps'] = cls_targets
cls_targets = cls_targets.pow(self.overlaps_power).detach()
cls_target_all[pos_masks] = cls_targets

# 3.4 classification loss
losses['loss_cls'] = self.loss_cls(cls_preds_all, cls_target_all,
                                   obj_weights) / num_total_samples
```
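One step the snippets above leave implicit: with num_classes > 1, the cls_targets written into cls_target_all[pos_masks] must be full rows of num_classes scores, e.g. the OKS-based quality placed in the assigned class's column. A minimal sketch (pos_labels is a hypothetical tensor of assigned class indices per positive prior):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: 3 positive priors, 8 classes.
num_classes = 8
oks = torch.tensor([0.9, 0.7, 0.5])      # quality score per positive prior
pos_labels = torch.tensor([2, 0, 5])     # assigned class per positive prior

# Soft one-hot targets: the OKS value in the assigned class column, zeros
# elsewhere, so each row matches cls_preds_all.view(-1, num_classes).
cls_targets = F.one_hot(pos_labels, num_classes).float() * oks.unsqueeze(1)
print(cls_targets.shape)  # torch.Size([3, 8])
```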