Fine Tune a SpatioTemporal Action Detection model on a custom dataset in AVA format
The doc issue
Hi, can someone show me how to fine tune a model for Spatio-Temporal Action Detection with a custom AVA dataset with (in my case) 6 classes?
I modified the config file by changing the number of classes here:
bbox_head=dict(
    type='BBoxHeadAVA',
    in_channels=2304,
    num_classes=7,  # from 80+1 of AVA to 6+1 of the custom dataset
    multilabel=True,
    dropout_ratio=0.5)),
and specifying the model to load for fine-tuning in the load_from parameter.
However, I get the following error when starting the train.py script:
The model and loaded state dict do not match exactly
size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 2304]) from checkpoint, the shape in current model is torch.Size([7, 2304]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([7]).
Suggest a potential alternative/fix
No response
The warning message is expected; it is not an error message. The cls_head-related weights for your custom dataset are different from the original weights for K400. You can continue training as long as it is not interrupted.
Thanks @cir7 for your reply. Unfortunately the training is interrupted because of this class mismatch; I get the error:
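To make the warning concrete, here is a small pure-Python sketch that compares parameter shapes the way the log above reports them. It is only an illustration, not MMEngine's loading code: the helper name `find_shape_mismatches` and the `backbone.example.weight` entry are hypothetical, while the two `fc_cls` shapes are copied from the message in this thread.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters that
    exist on both sides but have different shapes."""
    return {
        name: (ckpt_shape, model_shapes[name])
        for name, ckpt_shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != ckpt_shape
    }


# Shapes stored in the pretrained checkpoint (80 AVA classes + background).
checkpoint_shapes = {
    "roi_head.bbox_head.fc_cls.weight": (81, 2304),
    "roi_head.bbox_head.fc_cls.bias": (81,),
    "backbone.example.weight": (64, 3, 1, 7, 7),  # hypothetical, for contrast
}

# Shapes of the model built from the modified config (6 classes + background).
model_shapes = {
    "roi_head.bbox_head.fc_cls.weight": (7, 2304),
    "roi_head.bbox_head.fc_cls.bias": (7,),
    "backbone.example.weight": (64, 3, 1, 7, 7),  # hypothetical, for contrast
}

mismatched = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

Only the two classification-layer parameters differ, which matches the reply above: those weights cannot be copied from the checkpoint and are trained from their fresh initialization, while everything else is loaded.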
RuntimeError: The size of tensor a (6) must match the size of tensor b (80) at non-singleton dimension 1
Here is my config file if it can be helpful:
https://www.dropbox.com/s/l00u26jduwz9or6/slowfast_kinetics400-pretrained-r50_8xb6-8x8x1-cosine-10e_ava22-rgb%20%281%29.py?dl=0
From the documentation it's a bit unclear how to set up fine-tuning for a Spatio-Temporal model. I thought it would be the same as in the Action Recognition tutorial, where, as shown in the guide, you have to change num_classes in the cls_head dict, but this field doesn't exist in the Spatio-Temporal models.
Can you please explain to me how to set it up in order to fine-tune a pretrained SlowFast model on my current dataset?
A custom action detection dataset requires specifying num_classes in AVADataset, please check it.
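One reading of the RuntimeError above (6 vs 80) is that the head predicts 7 columns while the dataset still builds 81-column labels, and both lose the background column before the loss is computed. That explanation is an assumption about MMAction2 internals, not something confirmed in this thread. A minimal sketch of the suggested config change, showing only the relevant keys (paths, pipelines and the rest of the roi_head are omitted), with a hypothetical helper `class_counts_agree` to check consistency:

```python
NUM_CLASSES = 7  # 6 custom actions + 1 background

# Head side: same bbox_head settings as in the original post.
model = dict(
    roi_head=dict(
        bbox_head=dict(
            type='BBoxHeadAVA',
            in_channels=2304,
            num_classes=NUM_CLASSES,
            multilabel=True,
            dropout_ratio=0.5)))

# Dataset side: pass the same num_classes to AVADataset in every dataloader.
train_dataloader = dict(
    dataset=dict(type='AVADataset', num_classes=NUM_CLASSES))
val_dataloader = dict(
    dataset=dict(type='AVADataset', num_classes=NUM_CLASSES))


def class_counts_agree(model_cfg, *dataloader_cfgs):
    """Hypothetical sanity check: True if every dataset declares the same
    num_classes as the bbox head. A missing num_classes counts as a
    mismatch, since the dataset would then fall back to its default."""
    head_classes = model_cfg['roi_head']['bbox_head']['num_classes']
    return all(
        dl['dataset'].get('num_classes') == head_classes
        for dl in dataloader_cfgs)


print(class_counts_agree(model, train_dataloader, val_dataloader))
```

In a real config these dicts would be merged with the rest of the SlowFast AVA config rather than replacing it.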
Change mmaction/models/roi_heads/bbox_heads/bbox_head.py. Add these 2 lines after line 244; they cut the ground-truth label width from 81 to 7:
for sampling_result in sampling_results:
    sampling_result.pos_gt_labels = sampling_result.pos_gt_labels[:, :self.num_classes]
I am only fine-tuning on one class, and when I try fine-tuning the model I get this output in the train logs:
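To show what that column slice does, here is a pure-Python sketch using nested lists in place of tensors. The helper `truncate_label_columns` and the sample labels are hypothetical; the list comprehension mirrors the `[:, :self.num_classes]` indexing from the snippet above.

```python
def truncate_label_columns(label_rows, num_classes):
    """Keep only the first `num_classes` columns of each multi-hot label row."""
    return [row[:num_classes] for row in label_rows]


# Two ground-truth rows, 81 columns wide (column 0 is background).
wide_labels = [[0.0] * 81 for _ in range(2)]
wide_labels[0][3] = 1.0   # class id 3 lies inside the first 7 columns
wide_labels[1][42] = 1.0  # class id 42 lies outside the first 7 columns

narrow_labels = truncate_label_columns(wide_labels, num_classes=7)

print(len(narrow_labels[0]))  # width of each row after slicing
print(sum(narrow_labels[0]))  # positives kept in the first row
print(sum(narrow_labels[1]))  # positives kept in the second row
```

The slice fixes the shape, but any label whose class id is at or beyond num_classes is dropped without a warning, so it only gives correct targets when the custom class ids already fit in the first num_classes columns.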
04/07 17:42:26 - mmengine - INFO - Epoch(train) [1][ 20/198] lr: 5.8645e-04 eta: 2 days, 6:09:56 time: 19.7365 data_time: 18.5523 memory: 15482 grad_norm: 0.0000 loss: nan recall@thr=0.5: nan prec@thr=0.5: nan recall@top1: nan prec@top1: nan loss_action_cls: nan
04/07 17:44:48 - mmengine - INFO - Epoch(train) [1][ 40/198] lr: 6.7745e-04 eta: 1 day, 12:42:27 time: 7.0682 data_time: 5.8968 memory: 15430 grad_norm: 0.0000 loss: nan recall@thr=0.5: nan prec@thr=0.5: nan recall@top1: nan prec@top1:
bbox_head=dict(
    type='BBoxHeadAVA',
    background_class=False,
    in_channels=2304,
    num_classes=1,
    multilabel=False,
    topk=(1,),
    dropout_ratio=0.5)),
Do you know if this config could cause these results?