[Bug] KeyError: 'Adafactor is already registered in optimizer at torch.optim'

Open yeshsurya opened this issue 1 year ago • 1 comments

Prerequisite

  • [X] I have searched Issues and Discussions but cannot get the expected help.
  • [X] The bug has not been fixed in the latest version(https://github.com/open-mmlab/mmengine).

Environment

OrderedDict([('sys.platform', 'linux'), ('Python', '3.10.15 | packaged by conda-forge | (main, Oct 16 2024, 01:24:24) [GCC 13.3.0]'), ('CUDA available', True), ('MUSA available', False), ('numpy_random_seed', np.uint32(2147483648)), ('GPU 0,1,2,3,4,5,6,7', 'NVIDIA A100-SXM4-80GB'), ('CUDA_HOME', '/usr/local/cuda'), ('NVCC', 'Cuda compilation tools, release 11.8, V11.8.89'), ('GCC', 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0'), ('PyTorch', '2.5.0'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.8\n - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n - CuDNN 90.1\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing 
-Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n'), ('TorchVision', '0.20.0'), ('OpenCV', '4.10.0'), ('MMEngine', '0.10.5')])

Reproduces the problem - code sample

from mmdet.apis import init_detector

Reproduces the problem - command or script

python -c "from mmdet.apis import init_detector"

Reproduces the problem - error message

root@2c29850f6eb1:/# python -c "from mmdet.apis import init_detector"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmdet/apis/__init__.py", line 2, in <module>
    from .det_inferencer import DetInferencer
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmdet/apis/det_inferencer.py", line 15, in <module>
    from mmengine.infer.infer import BaseInferencer, ModelType
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/infer/__init__.py", line 2, in <module>
    from .infer import BaseInferencer
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/infer/infer.py", line 25, in <module>
    from mmengine.runner.checkpoint import (_load_checkpoint,
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/runner/__init__.py", line 2, in <module>
    from ._flexible_runner import FlexibleRunner
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 14, in <module>
    from mmengine._strategy import BaseStrategy
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/_strategy/__init__.py", line 4, in <module>
    from .base import BaseStrategy
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/_strategy/base.py", line 19, in <module>
    from mmengine.model.wrappers import is_model_wrapper
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/model/__init__.py", line 6, in <module>
    from .base_model import BaseDataPreprocessor, BaseModel, ImgDataPreprocessor
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/model/base_model/__init__.py", line 2, in <module>
    from .base_model import BaseModel
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/model/base_model/base_model.py", line 9, in <module>
    from mmengine.optim import OptimWrapper
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/optim/__init__.py", line 2, in <module>
    from .optimizer import (OPTIM_WRAPPER_CONSTRUCTORS, OPTIMIZERS,
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/optim/optimizer/__init__.py", line 5, in <module>
    from .builder import (OPTIM_WRAPPER_CONSTRUCTORS, OPTIMIZERS,
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/optim/optimizer/builder.py", line 174, in <module>
    TRANSFORMERS_OPTIMIZERS = register_transformers_optimizers()
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/optim/optimizer/builder.py", line 169, in register_transformers_optimizers
    OPTIMIZERS.register_module(name='Adafactor', module=Adafactor)
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/registry/registry.py", line 661, in register_module
    self._register_module(module=module, module_name=name, force=force)
  File "/azureml-envs/model-evaluation/lib/python3.10/site-packages/mmengine/registry/registry.py", line 611, in _register_module
    raise KeyError(f'{name} is already registered in {self.name} '
KeyError: 'Adafactor is already registered in optimizer at torch.optim'

Additional information

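The same bug was reported before, but that issue was closed with the advice to downgrade PyTorch. Downgrading means we cannot use the latest PyTorch, so this needs to be fixed in mmengine. The downgrade probably works because PyTorch only added its own Adafactor optimizer in later versions, so on older versions the name is not yet taken when mmengine registers the transformers one.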
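To illustrate the suspected name collision, here is a minimal sketch that does not use mmengine or PyTorch at all. `ToyRegistry`, `TorchAdafactor`, `TransformersAdafactor` and `register_if_absent` are hypothetical stand-ins invented for this example. Only the `force` parameter and the shape of the error message are taken from the traceback above. This is not necessarily how the real fix works, just one way a duplicate registration could be guarded.

```python
class ToyRegistry:
    """Hypothetical stand-in for a name -> class registry (not mmengine's Registry)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, name, module, force=False):
        # Mirror the behaviour seen in the traceback: a second registration
        # under the same name raises KeyError unless force=True.
        if not force and name in self._modules:
            existing = self._modules[name]
            raise KeyError(f'{name} is already registered in {self.name} '
                           f'at {existing.__module__}')
        self._modules[name] = module

    def get(self, name):
        return self._modules.get(name)


class TorchAdafactor:
    """Placeholder for an Adafactor class shipped by newer PyTorch."""


class TransformersAdafactor:
    """Placeholder for the Adafactor class shipped by transformers."""


def register_if_absent(registry, name, module):
    """Register `module` only when `name` is still free; return True if registered."""
    if registry.get(name) is None:
        registry.register_module(name, module)
        return True
    return False


OPTIMIZERS = ToyRegistry('optimizer')

# Step 1: the torch.optim optimizers are registered first.
OPTIMIZERS.register_module('Adafactor', TorchAdafactor)

# Step 2: registering the transformers class under the same name fails.
try:
    OPTIMIZERS.register_module('Adafactor', TransformersAdafactor)
except KeyError as err:
    print('collision:', err)

# A guarded registration skips the duplicate instead of raising.
print('registered:', register_if_absent(OPTIMIZERS, 'Adafactor', TransformersAdafactor))
```

With the guard, the class that was registered first stays in place. Passing `force=True` would instead overwrite it with the later one; which of the two is preferable is a design decision for the maintainers.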

yeshsurya, Nov 26 '24 11:11

This bug was fixed in the November 5th commit: https://github.com/open-mmlab/mmengine/commit/2e0ab7a92220d2f0c725798047773495d589c548

BeiXianWei, Dec 28 '24 14:12