mmpretrain
[Bug] Problem saving epoch checkpoint when fine tuning Efficientnet-b0
Describe the bug
When trying to fine-tune EfficientNet-B0, applying only minimal changes to the Getting Started Colab notebook, a runtime error is raised when the first checkpoint is saved after the first epoch completes:
"RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[32, 224, 225, 5] to have 3 channels, but got 224 channels instead"
To Reproduce
Use the Google Colab MMClassification Getting Started notebook, changing only the MobileNetV2 config and checkpoint files to those from the EfficientNet-B0 model zoo, as shown:
config_file = 'configs/efficientnet/efficientnet-b0_8xb32_in1k.py'
checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b0_3rdparty_8xb32_in1k_20220119-a7e2a0b1.pth'
Post related information
- The output of `pip list | grep "mmcv\|mmcls\|^torch"`:
mmcls 0.23.0 /content/mmclassification
mmcv 1.5.0
torch 1.11.0+cu113
torchaudio 0.11.0+cu113
torchsummary 1.5.1
torchtext 0.12.0
torchvision 0.12.0+cu113
- Your config file if you modified it or created a new one: nothing modified from the Google Colab MMClassification Getting Started => Fine-tune section.
- Your train log file if you meet the problem during training:
2022-05-04 09:03:37,380 - mmcls - INFO - workflow: [('train', 1)], max: 2 epochs
2022-05-04 09:03:37,383 - mmcls - INFO - Checkpoints will be saved to /content/mmclassification/work_dirs/cats_dogs_dataset by HardDiskBackend.
2022-05-04 09:03:44,796 - mmcls - INFO - Epoch [1][10/201] lr: 5.000e-03, eta: 0:04:44, time: 0.725, data_time: 0.252, memory: 3653, loss: 0.6385
2022-05-04 09:03:49,460 - mmcls - INFO - Epoch [1][20/201] lr: 5.000e-03, eta: 0:03:47, time: 0.466, data_time: 0.016, memory: 3653, loss: 0.4478
2022-05-04 09:03:54,131 - mmcls - INFO - Epoch [1][30/201] lr: 5.000e-03, eta: 0:03:25, time: 0.467, data_time: 0.016, memory: 3653, loss: 0.3196
2022-05-04 09:03:58,821 - mmcls - INFO - Epoch [1][40/201] lr: 5.000e-03, eta: 0:03:12, time: 0.469, data_time: 0.016, memory: 3653, loss: 0.2780
2022-05-04 09:04:03,520 - mmcls - INFO - Epoch [1][50/201] lr: 5.000e-03, eta: 0:03:02, time: 0.470, data_time: 0.016, memory: 3653, loss: 0.2618
2022-05-04 09:04:08,239 - mmcls - INFO - Epoch [1][60/201] lr: 5.000e-03, eta: 0:02:54, time: 0.472, data_time: 0.016, memory: 3653, loss: 0.2120
2022-05-04 09:04:13,059 - mmcls - INFO - Epoch [1][70/201] lr: 5.000e-03, eta: 0:02:48, time: 0.482, data_time: 0.019, memory: 3653, loss: 0.1787
2022-05-04 09:04:17,811 - mmcls - INFO - Epoch [1][80/201] lr: 5.000e-03, eta: 0:02:42, time: 0.475, data_time: 0.017, memory: 3653, loss: 0.1877
2022-05-04 09:04:22,604 - mmcls - INFO - Epoch [1][90/201] lr: 5.000e-03, eta: 0:02:36, time: 0.479, data_time: 0.019, memory: 3653, loss: 0.1741
2022-05-04 09:04:27,354 - mmcls - INFO - Epoch [1][100/201] lr: 5.000e-03, eta: 0:02:30, time: 0.475, data_time: 0.016, memory: 3653, loss: 0.1909
2022-05-04 09:04:32,111 - mmcls - INFO - Epoch [1][110/201] lr: 5.000e-03, eta: 0:02:24, time: 0.476, data_time: 0.017, memory: 3653, loss: 0.1907
2022-05-04 09:04:36,872 - mmcls - INFO - Epoch [1][120/201] lr: 5.000e-03, eta: 0:02:19, time: 0.476, data_time: 0.016, memory: 3653, loss: 0.1520
2022-05-04 09:04:41,645 - mmcls - INFO - Epoch [1][130/201] lr: 5.000e-03, eta: 0:02:14, time: 0.477, data_time: 0.016, memory: 3653, loss: 0.2102
2022-05-04 09:04:46,422 - mmcls - INFO - Epoch [1][140/201] lr: 5.000e-03, eta: 0:02:08, time: 0.478, data_time: 0.016, memory: 3653, loss: 0.1830
2022-05-04 09:04:51,202 - mmcls - INFO - Epoch [1][150/201] lr: 5.000e-03, eta: 0:02:03, time: 0.478, data_time: 0.016, memory: 3653, loss: 0.1848
2022-05-04 09:04:56,005 - mmcls - INFO - Epoch [1][160/201] lr: 5.000e-03, eta: 0:01:58, time: 0.480, data_time: 0.018, memory: 3653, loss: 0.1488
2022-05-04 09:05:00,775 - mmcls - INFO - Epoch [1][170/201] lr: 5.000e-03, eta: 0:01:53, time: 0.477, data_time: 0.016, memory: 3653, loss: 0.1551
2022-05-04 09:05:05,549 - mmcls - INFO - Epoch [1][180/201] lr: 5.000e-03, eta: 0:01:48, time: 0.477, data_time: 0.017, memory: 3653, loss: 0.1437
2022-05-04 09:05:10,317 - mmcls - INFO - Epoch [1][190/201] lr: 5.000e-03, eta: 0:01:43, time: 0.477, data_time: 0.016, memory: 3653, loss: 0.1606
2022-05-04 09:05:15,096 - mmcls - INFO - Epoch [1][200/201] lr: 5.000e-03, eta: 0:01:38, time: 0.478, data_time: 0.016, memory: 3653, loss: 0.1266
2022-05-04 09:05:15,185 - mmcls - INFO - Saving checkpoint at 1 epochs
[ ] 0/1601, elapsed: 0s, ETA:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
[<ipython-input-9-23436e39b2a0>](https://localhost:8080/#) in <module>()
33 validate=True,
34 timestamp=time.strftime('%Y%m%d_%H%M%S', time.localtime()),
---> 35 meta=dict())
21 frames
[/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/conv2d_adaptive_padding.py](https://localhost:8080/#) in forward(self, x)
60 ])
61 return F.conv2d(x, self.weight, self.bias, self.stride, self.padding,
---> 62 self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[32, 224, 225, 5] to have 3 channels, but got 224 channels instead
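The message itself points at a layout problem rather than a weight problem: `F.conv2d` expects NCHW input (batch, channels, height, width), so a batch that arrives in some other layout makes conv2d read whatever sits in dimension 1 (here 224, the image height) as the channel count. A minimal sketch of the check that fails, with shapes copied from the traceback (the helper function is illustrative, not PyTorch's actual code):

```python
# Shapes taken from the RuntimeError above.
weight_shape = (32, 3, 3, 3)     # [out_channels, in_channels, kH, kW]
bad_input = (32, 224, 225, 5)    # the shape the model actually received
good_input = (32, 3, 224, 224)   # what conv2d expects: NCHW

def channels_match(input_shape, weight_shape):
    """With groups=1, conv2d requires input dim 1 to equal weight dim 1."""
    return input_shape[1] == weight_shape[1]

print(channels_match(good_input, weight_shape))  # True
print(channels_match(bad_input, weight_shape))   # False: 224 != 3 channels
```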
- Other code you modified in the `mmcls` folder: nothing.
Additional context
Nothing
It seems there is an error in the evaluation hook config: the input to your model has shape [32, 224, 225, 5], but the expected input shape is [32, 224, 224, 3]. Can you show me your data.val config?
I have not changed anything in the validation scheme, which works for other classifiers. The output of cfg.data.val:
import json
print(json.dumps(cfg.data.val, indent=2))
{
"type": "ImageNet",
"data_prefix": "data/cats_dogs_dataset/val_set/val_set",
"ann_file": "data/cats_dogs_dataset/val.txt",
"pipeline": [
{
"type": "LoadImageFromFile"
},
{
"type": "Resize",
"size": [
256,
-1
],
"backend": "pillow"
},
{
"type": "CenterCrop",
"crop_size": 224
},
{
"type": "Normalize",
"mean": [
124.508,
116.05,
106.438
],
"std": [
58.577,
57.31,
57.437
],
"to_rgb": true
},
{
"type": "ImageToTensor",
"keys": [
"img"
]
},
{
"type": "Collect",
"keys": [
"img"
]
}
],
"classes": "data/cats_dogs_dataset/classes.txt"
}
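Nothing in this pipeline looks wrong on its face: in mmcls 0.x, `ImageToTensor` transposes each image from HWC to CHW before collation, so after `CenterCrop` with crop_size 224 the model should receive a (3, 224, 224) tensor per image. A quick stand-alone sanity check of that transpose, using a dummy array in place of a real validation image (a sketch, not the notebook's code):

```python
import numpy as np

# Stand-in for a decoded, resized, center-cropped validation image (HWC).
dummy_img = np.zeros((224, 224, 3), dtype=np.float32)

# ImageToTensor performs an HWC -> CHW transpose before the image is
# batched, so the conv stem sees the channel dimension first.
chw = np.ascontiguousarray(dummy_img.transpose(2, 0, 1))
print(chw.shape)  # (3, 224, 224)
```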
This issue will be closed as it is inactive, feel free to re-open it if necessary.