[BUG]: Error when download from source

Open haofanwang opened this issue 2 years ago • 4 comments

🐛 Describe the bug

To Reproduce

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependency
pip install -r requirements/requirements.txt

# install colossalai
pip install .

I have installed the matching version of PyTorch with CUDA 11.3 and Python 3.8. My environment works fine when ColossalAI is installed from pip, and the package imports successfully. However, the dreambooth example is still under rapid development, so I would prefer to keep up to date with the latest version from source.

Expected behavior

colossalai check -i should work.

Logs


root@qs-527-4961-master-0:/workspace/ColossalAI# colossalai check -i
Using /root/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/colossalai/kernel/__init__.py", line 4, in <module>
    from colossalai._C import fused_optim
ImportError: cannot import name 'fused_optim' from 'colossalai._C' (unknown location)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/colossalai", line 33, in <module>
    sys.exit(load_entry_point('colossalai==0.2.0', 'console_scripts', 'colossalai')())
  File "/opt/conda/bin/colossalai", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/opt/conda/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/conda/lib/python3.8/site-packages/colossalai/__init__.py", line 1, in <module>
    from .initialize import (
  File "/opt/conda/lib/python3.8/site-packages/colossalai/initialize.py", line 23, in <module>
    from colossalai.engine.schedule import NonPipelineSchedule, PipelineSchedule, InterleavedPipelineSchedule, get_tensor_shape
  File "/opt/conda/lib/python3.8/site-packages/colossalai/engine/__init__.py", line 1, in <module>
    from ._base_engine import Engine
  File "/opt/conda/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 10, in <module>
    from colossalai.gemini.ophooks import register_ophooks_recursively, BaseOpHook
  File "/opt/conda/lib/python3.8/site-packages/colossalai/gemini/__init__.py", line 1, in <module>
    from .chunk import ChunkManager, TensorInfo, TensorState, search_chunk_configuration
  File "/opt/conda/lib/python3.8/site-packages/colossalai/gemini/chunk/__init__.py", line 1, in <module>
    from .chunk import Chunk, ChunkFullError, TensorInfo, TensorState
  File "/opt/conda/lib/python3.8/site-packages/colossalai/gemini/chunk/chunk.py", line 9, in <module>
    from colossalai.utils import get_current_device
  File "/opt/conda/lib/python3.8/site-packages/colossalai/utils/__init__.py", line 3, in <module>
    from .checkpointing import load_checkpoint, save_checkpoint
  File "/opt/conda/lib/python3.8/site-packages/colossalai/utils/checkpointing.py", line 14, in <module>
    from .common import is_using_pp
  File "/opt/conda/lib/python3.8/site-packages/colossalai/utils/common.py", line 21, in <module>
    from colossalai.kernel import fused_optim
  File "/opt/conda/lib/python3.8/site-packages/colossalai/kernel/__init__.py", line 7, in <module>
    fused_optim = FusedOptimBuilder().load()
  File "/opt/conda/lib/python3.8/site-packages/colossalai/kernel/op_builder/builder.py", line 83, in load
    op_module = load(name=self.name,
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1385, in _jit_compile
    version = JIT_EXTENSION_VERSIONER.bump_version_if_changed(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_cpp_extension_versioner.py", line 45, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_cpp_extension_versioner.py", line 15, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/lib/python3.8/site-packages/colossalai/kernel/colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp'
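
The missing path in the final FileNotFoundError contains the segment colossalai/kernel twice in a row. One plausible reading, not confirmed anywhere in this thread, is that a source path relative to the repository root was joined onto the installed package directory. A minimal sketch for spotting that kind of doubled segment in a path is shown below; the helper name repeated_run is hypothetical and is not part of ColossalAI or PyTorch.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def repeated_run(path: str, min_len: int = 2) -> Optional[Tuple[str, ...]]:
    """Return the longest run of path components that appears twice back to back.

    Hypothetical diagnostic helper: it only inspects the string and never
    touches the filesystem.
    """
    parts = PurePosixPath(path).parts
    n = len(parts)
    # Try the longest candidate run first, then shrink down to min_len.
    for size in range(n // 2, min_len - 1, -1):
        for start in range(0, n - 2 * size + 1):
            first = parts[start:start + size]
            second = parts[start + size:start + 2 * size]
            if first == second:
                return first
    return None


# Path copied from the FileNotFoundError in the log above.
missing = (
    "/opt/conda/lib/python3.8/site-packages/colossalai/kernel/"
    "colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp"
)
print(repeated_run(missing))
```

A non-None result only hints at a path-joining problem; it does not show where in the build code the join happens.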

Environment

No response

Jan 06 '23 07:01 haofanwang

Run python setup.py clean and then python setup.py install to recompile the CPU kernel.

Jan 06 '23 08:01 Fazziekey

It doesn't work. Error messages are the same as above.

Jan 06 '23 08:01 haofanwang

@haofanwang Sorry, this is a bug.

Could you please install ColossalAI using the following command?

CUDA_EXT=1 pip install .
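
The traceback above shows colossalai/kernel/__init__.py first trying from colossalai._C import fused_optim and falling back to a JIT build only when that import fails. Assuming the CUDA_EXT=1 install is meant to build that extension ahead of time (an inference from this thread, not something the maintainers state), a rough way to check whether the pre-built module is importable after reinstalling is sketched below. The function name extension_available is hypothetical; only importlib from the standard library is used.

```python
import importlib


def extension_available(package: str = "colossalai._C", name: str = "fused_optim") -> bool:
    """Roughly mimic `from <package> import <name>` and report success as a bool.

    Hypothetical helper: any exception while importing the package itself
    (including a failed JIT build raised from the parent package) counts
    as "not available".
    """
    try:
        module = importlib.import_module(package)
    except Exception:
        return False
    # `from X import Y` looks for an attribute first, then a submodule.
    if hasattr(module, name):
        return True
    try:
        importlib.import_module(f"{package}.{name}")
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    print("pre-built fused_optim importable:", extension_available())
```

If this prints False after the reinstall, running colossalai check -i again, as in the original report, remains the more direct check.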

Jan 06 '23 08:01 feifeibear

Thanks, @feifeibear. It looks fine now. I will open a new issue if I run into further problems, and I'm happy to submit a PR if I can fix them myself.

Screenshot 2023-01-06 17 26 36

Jan 06 '23 08:01 haofanwang

@haofanwang It is already working. You can ignore those warnings; we will fix them later.

Jan 07 '23 10:01 feifeibear