
**KAGGLE** --- mmagic error - undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs

Open MasterHM-ml opened this issue 2 years ago • 7 comments

          cd mmcv && git checkout 2.x

Originally posted by @uniyushu in https://github.com/open-mmlab/mmcv/issues/2660#issuecomment-1467669827

I'm using mmcv 2.0.1 and still facing the same issue. I installed mmcv via mim. Here is how I installed it on Kaggle:

!pip3 install -U openmim
!mim install 'mmcv>=2.0.0'
!mim install 'mmengine'

%cd /kaggle/working
!rm -rf mmagic
!git clone https://github.com/open-mmlab/mmagic.git
%cd mmagic
!pip3 install -e . -v

!python -c "import mmagic; print(mmagic.__version__)"

No error in installation.
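Since the install itself reports no error, it can help to record which versions actually ended up in the environment before digging further. The following is a minimal sketch, not part of the original report: the helper names `installed_version` and `version_report` are made up for illustration, and it only reads package metadata, so it does not import torch or mmcv.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(dist_names=("torch", "mmcv", "mmengine", "mmagic")):
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Print one line per package so the output can be pasted into a bug report.
    for name, version in version_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output between Kaggle and Colab is one way to see whether the two environments differ in their torch or mmcv builds.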

But I get an error when calling !python3 tools/train.py "configs/edsr/edsr_x2c64b16_1xb16-300k_UCMerced.py" --auto-scale-lr. Here is the stack trace, trimmed to the last calls.

After printing the logs, it first shows some warnings:

/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:98: UserWarning: unable to load libtensorflow_io_plugins.so: unable to open file: libtensorflow_io_plugins.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_14SourceLocationE']
  warnings.warn(f"unable to load libtensorflow_io_plugins.so: {e}")
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:104: UserWarning: file system plugins are not loaded: unable to open file: libtensorflow_io.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN10tensorflow13GcsFileSystemE']
  warnings.warn(f"file system plugins are not loaded: {e}")

and then an error:

...
...
...
 /opt/conda/lib/python3.10/site-packages/mmcv/utils/ext_loader.py:13 in       │
│ load_ext                                                                     │
│                                                                              │
│   10 if torch.__version__ != 'parrots':                                      │
│   11 │                                                                       │
│   12 │   def load_ext(name, funcs):                                          │
│ ❱ 13 │   │   ext = importlib.import_module('mmcv.' + name)                   │
│   14 │   │   for fun in funcs:                                               │
│   15 │   │   │   assert hasattr(ext, fun), f'{fun} miss in module {name}'    │
│   16 │   │   return ext                                                      │
│                                                                              │
│ /opt/conda/lib/python3.10/importlib/__init__.py:126 in import_module         │
│                                                                              │
│   123 │   │   │   if character != '.':                                       │
│   124 │   │   │   │   break                                                  │
│   125 │   │   │   level += 1                                                 │
│ ❱ 126 │   return _bootstrap._gcd_import(name[level:], package, level)        │
│   127                                                                        │
│   128                                                                        │
│   129 _RELOADING = {}    
ImportError: /opt/conda/lib/python3.10/site-packages/mmcv/_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs
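The missing symbol is a mangled C++ name, and demangling it makes the error easier to read. By the Itanium C++ ABI mangling rules it names a `c10::Error` constructor taking a `c10::SourceLocation` and a `std::string`, where the trailing `Ss` is the pre-C++11 spelling of `std::string`. That suggests, though it does not prove, that the compiled mmcv extension was built against a different PyTorch build than the one installed. The sketch below assumes the `c++filt` tool from binutils is on the PATH and falls back to returning the symbol unchanged if it is not; `demangle` is an illustrative helper name.

```python
import shutil
import subprocess


def demangle(symbol):
    """Demangle a C++ symbol with c++filt; return it unchanged if the tool is missing."""
    tool = shutil.which("c++filt")
    if tool is None:
        return symbol
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=False
    )
    # c++filt echoes the input when it cannot demangle it.
    return result.stdout.strip() or symbol


if __name__ == "__main__":
    print(demangle("_ZN3c105ErrorC2ENS_14SourceLocationESs"))
```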

Any solution or debugging clue would be highly appreciated. Thank you. The same code works fine in Colab.

MasterHM-ml avatar Aug 14 '23 12:08 MasterHM-ml

Hi @MasterHM-ml, it seems that mmcv was not installed successfully. Can you reinstall mmcv and check whether the installation succeeds? You may refer to https://mmcv.readthedocs.io/en/latest/get_started/installation.html# to install mmcv.

zengyh1900 avatar Aug 15 '23 02:08 zengyh1900

Hello @zengyh1900, thanks for the update. But I installed mmcv according to the official documentation guidelines. Here is the gist with the complete, detailed stack trace.

MasterHM-ml avatar Aug 18 '23 01:08 MasterHM-ml

Hi @zhouzaida, I think the error comes from https://gist.github.com/MasterHM-ml/619dee045ce44c5184cd93cb833328b1#file-gistfile1-txt-L1120 , where the code tries to import ops from mmcv. Could it be caused by installing the wrong version of mmcv on a different platform? Do you have any ideas?

zengyh1900 avatar Aug 18 '23 07:08 zengyh1900
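To narrow down whether the failure sits in mmcv's compiled ops rather than in the pure-Python package, one option is to probe the imports one at a time. This is a hedged sketch: `probe_import` is a hypothetical helper, and `mmcv._ext` is simply the module name that appears in the traceback above.

```python
import importlib


def probe_import(module_name):
    """Try to import a module; return (True, '') or (False, error message)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # a missing symbol raises ImportError; catch broadly
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


if __name__ == "__main__":
    # If "mmcv" imports but "mmcv._ext" fails, the compiled extension is the problem.
    for name in ("torch", "mmcv", "mmcv._ext"):
        ok, message = probe_import(name)
        print(f"{name}: {'ok' if ok else message}")
```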

Any update?

MasterHM-ml avatar Aug 20 '23 08:08 MasterHM-ml

I am also facing the same issue!

tomarvimal avatar Aug 22 '23 08:08 tomarvimal

Try the mmagic Docker image?

Or maybe it is caused by PyTorch 2.x; try 1.x: conda install pytorch=1.10

uniyushu avatar Aug 30 '23 09:08 uniyushu

For those who are still struggling to install and use mmcv: I tried the officially recommended approach (https://mmcv.readthedocs.io/zh-cn/latest/get_started/installation.html#install-with-mim-recommended) as well as the instructions from this comment (https://github.com/open-mmlab/mmdetection/issues/10401#issuecomment-1627394117). Neither worked for me. However, I noticed that no error occurs when running in CPU-only mode on Kaggle, so I suspected a conflict with the latest CUDA (I had CUDA 12.1 in my environment). After I downgraded CUDA to 11.3 (by finding an old notebook with a pinned environment), everything started to work. Here is a notebook with the pinned environment (CUDA 11.3) in which mmcv raises no errors: https://www.kaggle.com/code/vadimshabashov/mmdetection-startup-on-kaggle?scriptVersionId=180583679

VadimShabashov avatar May 30 '24 10:05 VadimShabashov
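The CUDA-mismatch suspicion in the last comment can be checked by finding out which CUDA version the installed PyTorch wheel was built for. The sketch below parses the local version tag that pip wheels from the PyTorch index usually carry (for example `2.0.1+cu121`). The function name is illustrative, and the tag is assumed to follow the `cu` + major + one-digit-minor convention.

```python
import re


def cuda_from_torch_version(torch_version):
    """Extract the CUDA version from a PyTorch version string such as '2.0.1+cu121'.

    Returns None for CPU-only builds or versions without a '+cuXYZ' tag.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    if len(digits) < 2:
        return None
    # Assumed convention: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    for version in ("2.0.1+cu121", "1.10.0+cu113", "2.0.1+cpu"):
        print(version, "->", cuda_from_torch_version(version))
```

Builds without such a tag (conda packages, for instance) need a different check. In that case, `torch.version.cuda` reports the same information once torch is imported.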