mmselfsup
Error when training an MAE model on CIFAR
This is the config file:
# >>>>>>>>>>>>>>>>>>>>> Start of Changed >>>>>>>>>>>>>>>>>>>>>>>>>
_base_ = [
    '../_base_/models/mae_vit-base-p16.py',
    # '../_base_/datasets/cifar.py',
    '../_base_/schedules/adamw_coslr-200e_in1k.py',
    '../_base_/default_runtime.py',
]

# dataset settings
# data_root = 'data/cifar'
# file_client_args = dict(backend='disk')
data_source = 'CIFAR10'
dataset_type = 'SingleViewDataset'
img_norm_cfg = dict(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.201])
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomHorizontalFlip'),
]
test_pipeline = []

# prefetch
prefetch = False
if not prefetch:
    train_pipeline.extend(
        [dict(type='ToTensor'), dict(type='Normalize', **img_norm_cfg)])
    test_pipeline.extend(
        [dict(type='ToTensor'), dict(type='Normalize', **img_norm_cfg)])

# dataset summary
data = dict(
    samples_per_gpu=128,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_source=dict(
            type=data_source,
            data_prefix='data/cifar',
        ),
        pipeline=train_pipeline,
        prefetch=prefetch),
    val=dict(
        type=dataset_type,
        data_source=dict(
            type=data_source,
            data_prefix='data/cifar',
        ),
        pipeline=test_pipeline,
        prefetch=prefetch),
    test=dict(
        type=dataset_type,
        data_source=dict(
            type=data_source,
            data_prefix='data/cifar',
        ),
        pipeline=test_pipeline,
        prefetch=prefetch))
# evaluation = dict(interval=10, topk=(1, 5))

# dataset 8 x 512
train_dataloader = dict(batch_size=128, num_workers=8)
# <<<<<<<<<<<<<<<<<<<<<< End of Changed <<<<<<<<<<<<<<<<<<<<<<<<<<<

# optimizer wrapper
optimizer = dict(
    type='AdamW', lr=1.5e-4 * 4096 / 256, betas=(0.9, 0.95), weight_decay=0.05)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=optimizer,
    paramwise_cfg=dict(
        custom_keys={
            'ln': dict(decay_mult=0.0),
            'bias': dict(decay_mult=0.0),
            'pos_embed': dict(decay_mult=0.),
            'mask_token': dict(decay_mult=0.),
            'cls_token': dict(decay_mult=0.)
        }))

# learning rate scheduler
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=360,
        by_epoch=True,
        begin=40,
        end=400,
        convert_to_iter_based=True)
]

# runtime settings
# pre-train for 400 epochs
train_cfg = dict(max_epochs=3)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=100),
    # only keeps the latest 3 checkpoints
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))

# randomness
randomness = dict(seed=0, diff_rank_seed=True)
resume = True
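The learning rate in this config follows the linear scaling rule used for MAE: lr = base_lr x total_batch_size / 256, with base_lr = 1.5e-4 and a total batch of 4096 (the "dataset 8 x 512" comment, i.e. 8 GPUs x 512 images). The log confirms that this resolves to lr=0.0024. A minimal sketch of that arithmetic follows. `scaled_lr` is a hypothetical helper name. The sketch also shows what the same rule would give for the single-GPU, batch_size=128 run in the log:

```python
def scaled_lr(base_lr: float, total_batch_size: int, ref_batch_size: int = 256) -> float:
    """Linear LR scaling: lr = base_lr * total_batch_size / ref_batch_size."""
    return base_lr * total_batch_size / ref_batch_size


base_lr = 1.5e-4

# Setting hard-coded in the config: 8 GPUs x 512 images per GPU = 4096.
lr_reference = scaled_lr(base_lr, 8 * 512)  # the log reports lr=0.0024

# Setting actually used in the log: 1 GPU x batch_size=128.
lr_single_gpu = scaled_lr(base_lr, 1 * 128)  # 1.5e-4 * 128 / 256 = 7.5e-5

print(f'{lr_reference:.6g} {lr_single_gpu:.6g}')
```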
This is the log:
During handling of the above exception, another exception occurred: 2023/02/20 01:19:00 - mmengine - INFO -
System environment:
    sys.platform: linux
    Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 0
    GPU 0: NVIDIA GeForce RTX 3090
    CUDA_HOME: /data/apps/cuda/11.1
    NVCC: Cuda compilation tools, release 11.1, V11.1.74
    GCC: gcc (GCC) 7.3.0
    PyTorch: 1.10.0+cu111
    PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.11.0+cu111
    OpenCV: 4.7.0
    MMEngine: 0.5.0

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 0
    diff_rank_seed: True
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
2023/02/20 01:19:00 - mmengine - INFO - Config:
model = dict(
    type='MAE',
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(type='MAEViT', arch='b', patch_size=16, mask_ratio=0.75),
    neck=dict(
        type='MAEPretrainDecoder',
        patch_size=16,
        in_chans=3,
        embed_dim=768,
        decoder_embed_dim=512,
        decoder_depth=8,
        decoder_num_heads=16,
        mlp_ratio=4.0),
    head=dict(
        type='MAEPretrainHead',
        norm_pix=True,
        patch_size=16,
        loss=dict(type='MAEReconstructionLoss')),
    init_cfg=[
        dict(type='Xavier', distribution='uniform', layer='Linear'),
        dict(type='Constant', layer='LayerNorm', val=1.0, bias=0.0)
    ])
optimizer = dict(type='AdamW', lr=0.0024, betas=(0.9, 0.95), weight_decay=0.05)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW', lr=0.0024, betas=(0.9, 0.95), weight_decay=0.05),
    paramwise_cfg=dict(
        custom_keys=dict(
            ln=dict(decay_mult=0.0),
            bias=dict(decay_mult=0.0),
            pos_embed=dict(decay_mult=0.0),
            mask_token=dict(decay_mult=0.0),
            cls_token=dict(decay_mult=0.0))))
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=0.0001,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=360,
        by_epoch=True,
        begin=40,
        end=400,
        convert_to_iter_based=True)
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=3)
default_scope = 'mmselfsup'
default_hooks = dict(
    runtime_info=dict(type='RuntimeInfoHook'),
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=100),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3),
    sampler_seed=dict(type='DistSamplerSeedHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
log_processor = dict(
    window_size=10,
    custom_cfg=[dict(data_src='', method='mean', window_size='global')])
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='SelfSupVisualizer',
    vis_backends=[dict(type='LocalVisBackend')],
    name='visualizer')
log_level = 'INFO'
load_from = None
resume = True
data_source = 'CIFAR10'
dataset_type = 'SingleViewDataset'
img_norm_cfg = dict(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.201])
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomHorizontalFlip'),
    dict(type='ToTensor'),
    dict(
        type='Normalize',
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.201])
]
test_pipeline = [
    dict(type='ToTensor'),
    dict(
        type='Normalize',
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.201])
]
prefetch = False
data = dict(
    samples_per_gpu=128,
    workers_per_gpu=2,
    train=dict(
        type='SingleViewDataset',
        data_source=dict(type='CIFAR10', data_prefix='data/cifar'),
        pipeline=[
            dict(type='RandomCrop', size=32, padding=4),
            dict(type='RandomHorizontalFlip'),
            dict(type='ToTensor'),
            dict(
                type='Normalize',
                mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.201])
        ],
        prefetch=False),
    val=dict(
        type='SingleViewDataset',
        data_source=dict(type='CIFAR10', data_prefix='data/cifar'),
        pipeline=[
            dict(type='ToTensor'),
            dict(
                type='Normalize',
                mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.201])
        ],
        prefetch=False),
    test=dict(
        type='SingleViewDataset',
        data_source=dict(type='CIFAR10', data_prefix='data/cifar'),
        pipeline=[
            dict(type='ToTensor'),
            dict(
                type='Normalize',
                mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.201])
        ],
        prefetch=False))
train_dataloader = dict(batch_size=128, num_workers=8)
randomness = dict(seed=0, diff_rank_seed=True)
launcher = 'none'
work_dir = 'work'
2023/02/20 01:19:00 - mmengine - WARNING - The "visualizer" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:00 - mmengine - WARNING - The "vis_backend" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:01 - mmengine - WARNING - The "model" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:06 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
2023/02/20 01:19:06 - mmengine - WARNING - The "hook" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:06 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook
before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
before_val_epoch:
(NORMAL ) IterTimerHook
before_val_iter:
(NORMAL ) IterTimerHook
after_val_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
before_test_epoch:
(NORMAL ) IterTimerHook
before_test_iter:
(NORMAL ) IterTimerHook
after_test_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_run:
(BELOW_NORMAL) LoggerHook
2023/02/20 01:19:07 - mmengine - WARNING - The "loop" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
Traceback (most recent call last):
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 43, in __init__
    super().__init__(runner, dataloader)
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1331, in build_dataloader
    dataset_cfg = dataloader_cfg.pop('dataset')
KeyError: 'dataset'
Traceback (most recent call last):
  File "tools/train.py", line 99, in <module>
  ...
KeyError: "class `EpochBasedTrainLoop` in mmengine/runner/loops.py: 'dataset'"
It seems your config is written for MMSelfSup 0.x, but the log shows that you are launching training with MMEngine, which MMSelfSup 1.x is built on. Pull the repository, check out the MMSelfSup 1.x branch, and use the new MAE config.
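To make the difference concrete: in MMSelfSup 1.x the dataset is nested inside train_dataloader. MMEngine's Runner.build_dataloader pops the dataset key from that dict. That is exactly where the traceback above fails, because the 0.x-style config only defines train_dataloader = dict(batch_size=128, num_workers=8). The fragment below is a sketch modeled on the ImageNet MAE dataset base config of the 1.x branch. Check the field names against configs/selfsup/_base_/datasets in your checkout. The data_root and ann_file values are hypothetical placeholders for a CIFAR10 copy converted to the ImageNet1K layout.

```python
# Sketch of an MMSelfSup 1.x (MMEngine-style) dataset section.
# Field names follow the 1.x ImageNet MAE base config; paths are hypothetical.
dataset_type = 'mmcls.ImageNet'
data_root = 'data/cifar10_imagenet_style/'  # hypothetical path
file_client_args = dict(backend='disk')

train_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=file_client_args),
    dict(
        type='RandomResizedCrop',
        size=224,
        scale=(0.2, 1.0),
        backend='pillow',
        interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackSelfSupInputs', meta_keys=['img_path']),
]

# Runner.build_dataloader pops 'dataset' from this dict, so the key must
# exist; its absence is what raised KeyError: 'dataset' in the log.
train_dataloader = dict(
    batch_size=128,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='meta/train.txt',  # hypothetical annotation file
        data_prefix=dict(img_path='train/'),
        pipeline=train_pipeline))
```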
Emm, that's a great idea. Training MAE on my own dataset with the MMSelfSup 1.x branch (refer: https://mmselfsup.readthedocs.io/zh_CN/dev-1.x/user_guides/4_pretrain_custom_dataset.html) has worked. Now I want to train MAE on CIFAR10. How should I change the dataset part of the config? I can't find an example in https://github.com/open-mmlab/mmselfsup/tree/1.x/configs/selfsup/_base_/datasets; all of the configs there are for ImageNet.
You can refer to this doc. Before you pre-train your model on CIFAR10, reorganize the CIFAR10 folder into the ImageNet1K style and create an ImageNet1K-style annotation file. After that, you only need to make a few changes to the config: replace data_root and ann_file with your own settings.
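One way to do the reorganization is sketched below under some assumptions. The input is the official "CIFAR-10 python version" archive. Its batch files are pickled dicts with b'data', one 3072-byte row per image (1024 red, then 1024 green, then 1024 blue values for a 32x32 image), and b'labels'. The output layout is train/<class>/<index>.png plus meta/train.txt with "<class>/<index>.png <label>" lines, which mirrors the ImageNet1K meta/train.txt convention. The helper names load_cifar_batch, cifar_record_to_rows, write_png and export_split and the output folder are hypothetical. PNG files are written with a tiny standard-library encoder, so no imaging package is needed.

```python
import pickle
import struct
import zlib
from pathlib import Path

# Class names in CIFAR-10 label order (0..9), as listed in batches.meta.
CIFAR10_CLASSES = ('airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck')


def load_cifar_batch(path):
    """Load one batch file of the 'CIFAR-10 python version' archive.

    The official files pickle NumPy arrays, so NumPy must be installed.
    """
    with open(path, 'rb') as f:
        batch = pickle.load(f, encoding='bytes')
    return batch[b'data'], batch[b'labels']


def cifar_record_to_rows(record, size=32):
    """Turn one planar record (all R, then all G, then all B) into RGB rows."""
    record = bytes(record)  # works for bytes and for a NumPy uint8 row
    plane = size * size
    r, g, b = record[:plane], record[plane:2 * plane], record[2 * plane:3 * plane]
    rows = []
    for y in range(size):
        row = bytearray()
        for i in range(y * size, (y + 1) * size):
            row += bytes((r[i], g[i], b[i]))
        rows.append(bytes(row))
    return rows


def write_png(path, width, height, rows):
    """Write 8-bit RGB rows with a minimal standard-library PNG encoder."""
    def chunk(tag, data):
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack('>I', len(data)) + tag + data + struct.pack('>I', crc)

    raw = b''.join(b'\x00' + row for row in rows)  # filter type 0 per row
    ihdr = struct.pack('>IIBBBBB', width, height, 8, 2, 0, 0, 0)  # 8-bit RGB
    with open(path, 'wb') as f:
        f.write(b'\x89PNG\r\n\x1a\n')
        f.write(chunk(b'IHDR', ihdr))
        f.write(chunk(b'IDAT', zlib.compress(raw)))
        f.write(chunk(b'IEND', b''))


def export_split(records, labels, out_root, split='train',
                 classes=CIFAR10_CLASSES):
    """Write <out_root>/<split>/<class>/<index>.png and <out_root>/meta/<split>.txt."""
    out_root = Path(out_root)
    lines = []
    for idx, (record, label) in enumerate(zip(records, labels)):
        rel_path = f'{classes[label]}/{idx:05d}.png'
        img_path = out_root / split / rel_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        write_png(img_path, 32, 32, cifar_record_to_rows(record))
        lines.append(f'{rel_path} {label}')
    meta_dir = out_root / 'meta'
    meta_dir.mkdir(parents=True, exist_ok=True)
    (meta_dir / f'{split}.txt').write_text('\n'.join(lines) + '\n')
    return lines
```

To use it, load data_batch_1 to data_batch_5 from the extracted cifar-10-batches-py folder with load_cifar_batch, then concatenate the records and labels. Pass them to export_split together with an output root such as data/cifar10_imagenet_style. The resulting root and meta/train.txt are what data_root and ann_file would point at.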
Thanks~
Hi, have you solved the problem? I also want to use CIFAR in MMSelfSup. Could you share your CIFAR directory structure? Thanks!
Emm, I didn't use CIFAR10. I just organized my own datasets into the ImageNet1K style and created an ImageNet1K-style annotation file. If you want to use CIFAR, a good approach is to convert CIFAR to the ImageNet1K style or the mmcls.CustomDataset style (refer: https://mmselfsup.readthedocs.io/zh_CN/dev-1.x/user_guides/4_pretrain_custom_dataset.html).
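For a dataset that is already split into one sub-folder per class, the annotation file can be generated rather than written by hand. The minimal sketch below assumes the layout <data_root>/train/<class_name>/<image> and the ImageNet1K convention of one "<relative path> <label index>" line per image in meta/train.txt. The function name build_annotation, the extension list, and the choice to number classes in sorted folder-name order are all hypothetical.

```python
from pathlib import Path

# Hypothetical set of accepted image extensions (compared case-insensitively).
IMG_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp'}


def build_annotation(data_root, split='train', extensions=IMG_EXTENSIONS):
    """Write <data_root>/meta/<split>.txt with '<class>/<file> <label>' lines.

    Label indices follow the sorted order of the class folder names.
    """
    root = Path(data_root)
    split_dir = root / split
    classes = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
    lines = []
    for label, cls in enumerate(classes):
        for img in sorted((split_dir / cls).iterdir()):
            if img.is_file() and img.suffix.lower() in extensions:
                lines.append(f'{cls}/{img.name} {label}')
    meta_dir = root / 'meta'
    meta_dir.mkdir(parents=True, exist_ok=True)
    (meta_dir / f'{split}.txt').write_text('\n'.join(lines) + '\n')
    return classes, lines
```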
@Jia-Baos Thank you for your reply! What exactly is the ImageNet1K-style format? I found the ImageNet1K dataset on Google, but without the meta folder. The only thing I found in the MMSelfSup docs is this picture, which doesn't show the detailed structure. For example, what should be under the meta folder? I have prepared my own dataset, but I don't know exactly what ImageNet1K style means. Would you mind giving me an example or pointing me to documentation? Thank you very much. Your replies have been very helpful!
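I'm very pleased that I could help. You can refer to the file 北京超算30区使用MMClassification训练花卉图片分类模型.pdf ("Training a flower image classification model with MMClassification on Beijing Supercomputing zone 30") in https://github.com/Jia-Baos/OpenMM.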
Thank you very much for your reply. It works!!
Hello, I cannot find that PDF file. I also ran into an issue while training my model on my custom dataset.
I got this error:
KeyError: 'SelfSupVisualizer is not in the visualizer registry. Please check whether the value of SelfSupVisualizer is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
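The config lines below may resolve this kind of registry lookup failure. This is a hedged sketch and assumes MMSelfSup 1.x is installed in the environment that runs training. Setting default_scope = 'mmselfsup' (the value shown in the config dump earlier in this thread) makes registry lookups start from MMSelfSup's registries. MMEngine's custom_imports field, described on the documentation page the error message links to, imports a module before the registries are queried. The module path mmselfsup.visualization is an assumption; point it at the module that actually defines SelfSupVisualizer in your installation.

```python
# MMEngine-style config fragment; a sketch, not a verified fix.
# Start registry lookups in MMSelfSup's scope.
default_scope = 'mmselfsup'

# Assumption: SelfSupVisualizer is defined in mmselfsup.visualization.
# Importing that module registers the class before the visualizer is built.
custom_imports = dict(
    imports=['mmselfsup.visualization'],
    allow_failed_imports=False)

# Same visualizer entry as in the config dump earlier in the thread.
visualizer = dict(
    type='SelfSupVisualizer',
    vis_backends=[dict(type='LocalVisBackend')],
    name='visualizer')
```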
Emm, you can find this file in my repository OpenMM. As for the error, I have moved on to optical flow estimation, so I can't help you further.