mmaction2 Fine Tune a SpatioTemporal Action Detection model on a custom dataset in AVA format

The doc issue

Hi, can someone show me how to fine tune a model for Spatio-Temporal Action Detection with a custom AVA dataset with (in my case) 6 classes?

I modified the config file by changing the number of classes here:

bbox_head=dict(
    type='BBoxHeadAVA',
    in_channels=2304,
    num_classes=7,  # from 80+1 of AVA to 6+1 of the custom dataset
    multilabel=True,
    dropout_ratio=0.5)),

and specifying the model to load for fine-tuning in the load_from parameter.

However, I get the following error when starting the train.py script:

The model and loaded state dict do not match exactly

size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 2304]) from checkpoint, the shape in current model is torch.Size([7, 2304]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([7]).

Suggest a potential alternative/fix

No response

Jul 05 '23 15:07 damianozappia

The warning message is expected; it is not an error message. The cls_head-related weights for your custom dataset are different from the original weights for K400. You can continue training as long as it is not interrupted.
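To illustrate why a size-mismatch message like this can be harmless on its own, here is a minimal pure-Python sketch of shape-based filtering during a non-strict checkpoint load. It is not MMEngine's actual loader: the function name `split_by_shape` and the key `backbone.example_layer.weight` are hypothetical, and only the two `fc_cls` shapes are taken from the log above. It assumes that parameters with mismatched shapes are skipped and keep their fresh initialization.

```python
# Illustrative only: mimics how a non-strict checkpoint load can skip
# parameters whose shapes differ. This is NOT MMEngine's real implementation.

def split_by_shape(checkpoint_shapes, model_shapes):
    """Return (loadable, skipped) parameter names based on shape equality."""
    loadable, skipped = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped


# The fc_cls shapes come from the warning in this thread; the backbone key
# is a hypothetical stand-in for every layer whose shape still matches.
checkpoint_shapes = {
    'backbone.example_layer.weight': (64, 3, 1, 7, 7),
    'roi_head.bbox_head.fc_cls.weight': (81, 2304),
    'roi_head.bbox_head.fc_cls.bias': (81,),
}
model_shapes = {
    'backbone.example_layer.weight': (64, 3, 1, 7, 7),
    'roi_head.bbox_head.fc_cls.weight': (7, 2304),
    'roi_head.bbox_head.fc_cls.bias': (7,),
}

loadable, skipped = split_by_shape(checkpoint_shapes, model_shapes)
print('loaded from checkpoint:', loadable)
print('left at fresh initialization:', skipped)
```

Under that assumption, only the final classification layer starts from scratch, which is usually what fine-tuning on a new label set wants.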

Jul 06 '23 02:07 cir7

Thanks @cir7 for your reply. Unfortunately, the training is interrupted because of this class mismatch. I get the following error: RuntimeError: The size of tensor a (6) must match the size of tensor b (80) at non-singleton dimension 1

Here is my config file if it can be helpful:

https://www.dropbox.com/s/l00u26jduwz9or6/slowfast_kinetics400-pretrained-r50_8xb6-8x8x1-cosine-10e_ava22-rgb%20%281%29.py?dl=0

From the documentation, it is a bit unclear how to set up fine-tuning for a Spatio-Temporal model. I thought it was the same as in the Action Recognition tutorial, where, as shown in the guide, you have to change num_classes in the cls_head dict, but this field doesn't exist in the Spatio-Temporal models.

Can you please explain to me how to set it up in order to fine-tune a pretrained SlowFast model on my current dataset?

Jul 06 '23 07:07 damianozappia

A custom action detection dataset requires specifying num_classes in AVADataset; please check it.
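As a rough sketch of what this could look like, the fragment below keeps the class count in one variable and passes it to both the head and the datasets. All file paths are hypothetical placeholders, and other required dataset keys (pipeline, exclude_file, proposal_file, and so on) are omitted for brevity; in a real config these dicts would be merged with the base config rather than stand alone.

```python
# Config sketch (assumptions: paths are placeholders, many keys omitted).
# The point is that the head and the datasets agree on num_classes.

num_classes = 7  # 6 custom actions + 1 background, as in this thread

model = dict(
    roi_head=dict(
        bbox_head=dict(
            type='BBoxHeadAVA',
            in_channels=2304,
            num_classes=num_classes,
            multilabel=True,
            dropout_ratio=0.5)))

train_dataloader = dict(
    dataset=dict(
        type='AVADataset',
        num_classes=num_classes,  # defaults to the AVA value if not set
        ann_file='data/custom/annotations/train.csv',          # hypothetical
        label_file='data/custom/annotations/action_list.pbtxt'))  # hypothetical

val_dataloader = dict(
    dataset=dict(
        type='AVADataset',
        num_classes=num_classes,
        ann_file='data/custom/annotations/val.csv',            # hypothetical
        label_file='data/custom/annotations/action_list.pbtxt'))  # hypothetical
```

If the dataset keeps the default class count while the head uses 7, the label tensors and the predictions end up with different widths, which matches the 6 vs. 80 mismatch reported above.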

Jul 06 '23 11:07 cir7

Change mmaction/models/roi_heads/bbox_heads/bbox_head.py. Add these 2 lines after line 244; they reduce the ground-truth label width from 81 to 7:

for sampling_result in sampling_results:
    sampling_result.pos_gt_labels = sampling_result.pos_gt_labels[:, :self.num_classes]
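To show what this kind of column slicing does to the labels, here is a small pure-Python stand-in that uses nested lists instead of tensors. The helpers `multi_hot` and `truncate_labels` are hypothetical names for illustration, not part of mmaction2.

```python
# Toy illustration of slicing multi-hot labels down to num_classes columns.
# Nested lists stand in for the label tensor; helper names are hypothetical.

def multi_hot(class_ids, width):
    """Build a multi-hot row of the given width from a set of class ids."""
    return [1.0 if i in class_ids else 0.0 for i in range(width)]


def truncate_labels(rows, num_classes):
    """Keep only the first num_classes columns of every label row."""
    return [row[:num_classes] for row in rows]


labels_81 = [
    multi_hot({1, 3}, 81),  # ids inside the custom range 0..6
    multi_hot({5}, 81),     # id inside the custom range
    multi_hot({10}, 81),    # id outside the custom range
]

labels_7 = truncate_labels(labels_81, 7)

print([len(row) for row in labels_7])  # every row now has 7 columns
print(labels_7[0])                     # ids 1 and 3 are preserved
print(sum(labels_7[2]))                # id 10 was dropped by the slice
```

The slice makes the widths agree, but any label id at or above num_classes is silently discarded, so it only makes sense when the custom annotations use ids within the new range.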

Aug 12 '24 20:08 Yizhao-AwakeAI

I am fine-tuning on only one class, and when I start training I get this output in the training logs:

04/07 17:42:26 - mmengine - INFO - Epoch(train) [1][ 20/198] lr: 5.8645e-04 eta: 2 days, 6:09:56 time: 19.7365 data_time: 18.5523 memory: 15482 grad_norm: 0.0000 loss: nan recall@thr=0.5: nan prec@thr=0.5: nan recall@top1: nan prec@top1: nan loss_action_cls: nan
04/07 17:44:48 - mmengine - INFO - Epoch(train) [1][ 40/198] lr: 6.7745e-04 eta: 1 day, 12:42:27 time: 7.0682 data_time: 5.8968 memory: 15430 grad_norm: 0.0000 loss: nan recall@thr=0.5: nan prec@thr=0.5: nan recall@top1: nan prec@top1:

This is my bbox_head config:

bbox_head=dict(
    type='BBoxHeadAVA',
    background_class=False,
    in_channels=2304,
    num_classes=1,
    multilabel=False,
    topk=(1,),
    dropout_ratio=0.5)),

Do you know whether this configuration could cause these results?

Apr 07 '25 18:04 PabloMurrieta