
[Bug] Error when training model - TypeError: BaseDataset.__init__() got an unexpected keyword argument 'split'

Open TNodeCode opened this issue 1 year ago • 6 comments

Branch

main branch (mmpretrain version)

Describe the bug

I have tried to train a model on a custom dataset using the mmpretrain library.

First I cloned the repository, then I created a dataset folder with the following structure:

data
└── custom_dataset
    ├── train
    ├── test
    └── val

Next I followed the documentation (https://mmpretrain.readthedocs.io/en/latest/user_guides/train.html) on how to train a classification model on a custom dataset.

I created a new configuration file:

configs/mobilenet_v2/mobilenet-v2_finetune.py

_base_ = [
    '../_base_/models/mobilenet_v2_1x.py',
    '../_base_/datasets/imagenet_bs32_pil_resize.py',
    '../_base_/schedules/imagenet_bs256_epochstep.py',
    '../_base_/default_runtime.py'
]


# model settings
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)

# data settings
data_root = 'data/custom_dataset'
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
    ))
val_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='val',
    ))
test_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='test',
    ))

# schedule settings
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[15], gamma=0.1)

I then tried to train the model on my custom dataset with the following command:

python ./tools/train.py ./configs/mobilenet_v2/mobilenet-v2_finetune.py

Then I get the following error:

C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
12/30 17:42:16 - mmengine - INFO -
------------------------------------------------------------
System environment:
    sys.platform: win32
    Python: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
    CUDA available: False
    numpy_random_seed: 1691281147
    MSVC: Microsoft (R) C/C++-Optimierungscompiler Version 19.26.28806 für x64
    GCC: n/a
    PyTorch: 2.0.1+cu117
    PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 193431937
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.15.2+cu117
    OpenCV: 4.7.0
    MMEngine: 0.10.2

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 1691281147
    deterministic: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

12/30 17:42:16 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=256)
data_preprocessor = dict(
    mean=[
        123.675,
        116.28,
        103.53,
    ],
    num_classes=1000,
    std=[
        58.395,
        57.12,
        57.375,
    ],
    to_rgb=True)
data_root = 'data/custom_dataset'
dataset_type = 'ImageNet'
default_hooks = dict(
    checkpoint=dict(interval=1, type='CheckpointHook'),
    logger=dict(interval=100, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(enable=False, type='VisualizationHook'))
default_scope = 'mmpretrain'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
launcher = 'none'
load_from = None
log_level = 'INFO'
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            checkpoint=
            'https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth',
            prefix='backbone',
            type='Pretrained'),
        type='MobileNetV2',
        widen_factor=1.0),
    head=dict(
        in_channels=1280,
        loss=dict(loss_weight=1.0, type='CrossEntropyLoss'),
        num_classes=10,
        topk=(
            1,
            5,
        ),
        type='LinearClsHead'),
    neck=dict(type='GlobalAveragePooling'),
    type='ImageClassifier')
optim_wrapper = dict(
    optimizer=dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0001))
param_scheduler = dict(
    by_epoch=True,
    gamma=0.1,
    milestones=[
        15,
    ],
    step_size=1,
    type='MultiStepLR')
randomness = dict(deterministic=False, seed=None)
resume = False
test_cfg = dict()
test_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='test',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
            dict(crop_size=224, type='CenterCrop'),
            dict(type='PackInputs'),
        ],
        split='val',
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    topk=(
        1,
        5,
    ), type='Accuracy')
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
    dict(crop_size=224, type='CenterCrop'),
    dict(type='PackInputs'),
]
train_cfg = dict(by_epoch=True, max_epochs=300, val_interval=1)
train_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='train',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', scale=224, type='RandomResizedCrop'),
            dict(direction='horizontal', prob=0.5, type='RandomFlip'),
            dict(type='PackInputs'),
        ],
        split='train',
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(backend='pillow', scale=224, type='RandomResizedCrop'),
    dict(direction='horizontal', prob=0.5, type='RandomFlip'),
    dict(type='PackInputs'),
]
val_cfg = dict()
val_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='val',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
            dict(crop_size=224, type='CenterCrop'),
            dict(type='PackInputs'),
        ],
        split='val',
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    topk=(
        1,
        5,
    ), type='Accuracy')
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    type='UniversalVisualizer', vis_backends=[
        dict(type='LocalVisBackend'),
    ])
work_dir = './work_dirs\\mobilenet-v2_finetune'

12/30 17:42:21 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
12/30 17:42:21 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
 --------------------
before_train:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(NORMAL      ) DistSamplerSeedHook
 --------------------
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
 --------------------
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_train_epoch:
(NORMAL      ) IterTimerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_val:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
before_val_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_val_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_val_iter:
(NORMAL      ) IterTimerHook
(NORMAL      ) VisualizationHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_val:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_train:
(VERY_HIGH   ) RuntimeInfoHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
before_test_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_test_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_test_iter:
(NORMAL      ) IterTimerHook
(NORMAL      ) VisualizationHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_run:
(BELOW_NORMAL) LoggerHook
 --------------------
Traceback (most recent call last):
  File "C:\Users\tilof\PycharmProjects\DeepLearningProjects\OpenMMLab\mmpretrain\tools\train.py", line 162, in <module>
    main()
  File "C:\Users\tilof\PycharmProjects\DeepLearningProjects\OpenMMLab\mmpretrain\tools\train.py", line 158, in main
    runner.train()
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\runner.py", line 1728, in train
    self._train_loop = self.build_train_loop(
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\runner.py", line 1527, in build_train_loop
    loop = EpochBasedTrainLoop(
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\loops.py", line 44, in __init__
    super().__init__(runner, dataloader)
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\runner.py", line 1370, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmpretrain\datasets\custom.py", line 207, in __init__
    super().__init__(
TypeError: BaseDataset.__init__() got an unexpected keyword argument 'split'

Environment

{'sys.platform': 'win32', 'Python': '3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 ' '64 bit (AMD64)]', 'CUDA available': False, 'numpy_random_seed': 2147483648, 'MSVC': 'Microsoft (R) C/C++-Optimierungscompiler Version 19.26.28806 für x64', 'GCC': 'n/a', 'PyTorch': '2.0.1+cu117', 'TorchVision': '0.15.2+cu117', 'OpenCV': '4.7.0', 'MMEngine': '0.10.2', 'MMCV': '2.1.0', 'MMPreTrain': '1.1.1+e95d9ac'}

Other information

No response

TNodeCode · Dec 30 '23

I had the same problem following the guide How to Pretrain with Custom Dataset.

The problem is that the base dataset config you are overriding passes a split argument (_base_/datasets/imagenet_bs32_pil_resize.py#L32), which CustomDataset does not accept.
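For reference, the relevant part of that base config looks roughly like this (paraphrased; the exact contents may differ between versions):

train_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type=dataset_type,          # 'ImageNet' in the base config
        data_root='data/imagenet',
        split='train',              # <-- survives the config merge and reaches CustomDataset
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
)

Because MMEngine merges config dicts key by key, overriding the dataset dict only replaces the keys you specify; the inherited split='train' stays and is eventually passed to CustomDataset.__init__(), which doesn't accept it.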

The solution I found was to copy all the arguments and add an extra _delete_=True (doc). Something like this (the same pattern applies to the val and test dataloaders; see the sketch after this block):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
        pipeline=train_pipeline,
        _delete_=True,
    ))
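The val and test dataloaders can be overridden the same way. A sketch, reusing the test pipeline shown in the config dump above:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ResizeEdge', scale=256, edge='short', backend='pillow'),
    dict(type='CenterCrop', crop_size=224),
    dict(type='PackInputs'),
]

val_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # sub-folder format without an annotation file
        data_prefix='val',
        pipeline=test_pipeline,
        _delete_=True,
    ))
test_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',
        data_prefix='test',
        pipeline=test_pipeline,
        _delete_=True,
    ))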

leon-costa · Jan 01 '24

Hi, @leon-costa,

I tried this, but it's not working. Is there any other way to fix the above problem?

Huy-Thai · Jan 08 '24

Hi everyone, any update? I am also having the exact same problem with CustomDataset.

smoothumut · Feb 02 '24

I have made it work.

@leon-costa's solution and the link he gave (https://mmpretrain.readthedocs.io/en/latest/user_guides/config.html#ignore-some-fields-in-the-base-configs) helped me better understand the problem.

In my case I removed '../_base_/datasets/imagenet_bs32_pil_resize.py' from my config's _base_ list, then added the required dataset dict settings (without split, of course) directly to my config. Then it worked. Thanks all for the guidance.
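In config form, that approach looks roughly like this (a sketch; the pipeline is the one from the config dump above, and note that anything else defined in the removed base file, such as data_preprocessor and the evaluators, would have to be re-declared as well):

_base_ = [
    '../_base_/models/mobilenet_v2_1x.py',
    # '../_base_/datasets/imagenet_bs32_pil_resize.py',  # removed
    '../_base_/schedules/imagenet_bs256_epochstep.py',
    '../_base_/default_runtime.py'
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type='CustomDataset',
        data_root='data/custom_dataset',
        ann_file='',       # sub-folder format without an annotation file
        data_prefix='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
)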

smoothumut · Feb 02 '24

@TNodeCode Just remove the split argument from each dataloader config.

train_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='train',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', scale=224, type='RandomResizedCrop'),
            dict(direction='horizontal', prob=0.5, type='RandomFlip'),
            dict(type='PackInputs'),
        ],
        split='train',  # <<< remove this line (likewise in val_dataloader and test_dataloader)
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))

The split option is only used by datasets that implement split handling, so if your custom dataset has not specifically implemented it, the argument can simply be removed.

A prominent dataset that uses this feature is ImageNet.
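For contrast, a config that legitimately uses split would look something like this (a sketch; ImageNet interprets the argument internally):

train_dataloader = dict(
    dataset=dict(
        type='ImageNet',            # ImageNet knows how to handle 'split'
        data_root='data/imagenet',
        split='train',              # selects the training annotations/sub-folder
        pipeline=train_pipeline,
    ))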

gjustin40 · Feb 13 '24

An alternative solution is to subclass CustomDataset and simply discard the split argument:

from mmpretrain.registry import DATASETS
from mmpretrain.datasets.custom import CustomDataset


@DATASETS.register_module()
class CustomDataset2(CustomDataset):

    def __init__(self, split=None, **kwargs):
        # Accept and discard the 'split' argument injected by the base
        # config; CustomDataset.__init__() does not accept it.
        super().__init__(**kwargs)
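With that registered, the config can switch the dataset type over, and the inherited split key becomes harmless. Assuming the class above lives in a module importable as my_datasets (a hypothetical path), it could be wired in via custom_imports:

custom_imports = dict(imports=['my_datasets'], allow_failed_imports=False)

train_dataloader = dict(
    dataset=dict(
        type='CustomDataset2',  # the split='train' from the base config is now ignored
    ))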

OpenByteDev · Jul 24 '24