ColossalAI [BUG]: ChunkManager.__init__() takes from 2 to 3 positional arguments but 5 were given

🐛 Describe the bug

Hi, I'm trying to fine-tune Stable Diffusion using the example script in the repo. ChunkManager.__init__() is being passed the wrong arguments by the PyTorch Lightning ColossalAI strategy file.

Traceback (most recent call last):
  File "main.py", line 808, in <module>
    trainer.fit(model, data)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in fit
    call._call_and_handle_interrupt(
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 90, in launch
    return function(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 621, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1039, in _run
    self.strategy.setup(self)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/colossalai.py", line 333, in setup
    self.setup_precision_plugin()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/colossalai.py", line 275, in setup_precision_plugin
    chunk_manager = ChunkManager(
TypeError: __init__() takes from 2 to 3 positional arguments but 5 were given
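
For readers hitting the same message: Python raises this TypeError when a constructor is called with more positional arguments than its signature accepts, which is consistent with the caller in colossalai.py expecting a different ChunkManager signature from the one installed. Below is a minimal sketch of the mechanism. It uses a hypothetical stand-in class with made-up parameter names (`config`, `device`), not the real ColossalAI ChunkManager, whose actual parameters are not shown in this thread.

```python
class StandInChunkManager:
    """Hypothetical stand-in; NOT the real colossalai ChunkManager."""

    def __init__(self, config, device=None):
        # self + one required + one optional parameter, which Python
        # reports as "from 2 to 3 positional arguments".
        self.config = config
        self.device = device


def try_mismatched_call():
    """Call the constructor with four positional arguments (five with self)."""
    try:
        # The argument values are arbitrary placeholders.
        StandInChunkManager(1024, None, False, "cpu")
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_mismatched_call())
```

If this is indeed the cause, the two ways out are to install a ColossalAI release whose constructor matches what the installed Lightning strategy passes, or to use a Lightning version whose strategy matches the installed ColossalAI.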

Environment

CUDA: 11.8
PyTorch: 1.13.0
ColossalAI: built from source

Nov 14 '22 17:11 salmanshah1d

Hi @salmanshah1d,

I believe this issue is the same as #1872.

Nov 15 '22 02:11 1SAA

Hi @1SAA, thanks so much for your response. #1872 suggests installing via pip install colossalai==0.1.10+torch1.11cu11.3 -f https://release.colossalai.org. Does my environment need to match PyTorch 1.11 and CUDA 11.3?

These versions are fairly old, so I would ideally like to use the latest versions (which I think are PyTorch 1.13 and CUDA 11.8).

Do you have any advice for setting up / resolving those requirements? I currently use the NVIDIA NGC Docker images, but do you have any other suggestions?
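
One way to make the version question concrete is to compare what the wheel tag claims against what is actually installed. The sketch below assumes the local version suffix `torch1.11cu11.3` encodes the PyTorch and CUDA versions the wheel was built against; that reading is an assumption, not something confirmed in this thread. The helper names are hypothetical, and only the standard-library `importlib.metadata` is used to look up installed distributions.

```python
import re
from importlib import metadata


def parse_build_tag(version):
    """Split a version such as '0.1.10+torch1.11cu11.3' into its parts.

    Returns None when the string has no '+torch...cu...' suffix.
    """
    pattern = r"(?P<pkg>[^+]+)\+torch(?P<torch>[\d.]+)cu(?P<cuda>[\d.]+)"
    match = re.fullmatch(pattern, version)
    if match is None:
        return None
    return match.groupdict()


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(parse_build_tag("0.1.10+torch1.11cu11.3"))
    for name in ("torch", "colossalai", "pytorch-lightning"):
        print(name, installed_version(name))
```

Posting the output of a check like this alongside the traceback makes it easier to tell whether the mismatch comes from the ColossalAI build or from the Lightning version.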

Nov 17 '22 18:11 salmanshah1d

I have the same problem, and my CUDA version is 11.6.

Nov 23 '22 08:11 kongfanjing

@1SAA I have the same problem, and my CUDA version is 11.6.

Nov 24 '22 07:11 Johnson-yue

We have updated a lot since then. This issue was closed due to inactivity. Thanks.

Apr 14 '23 08:04 binmakeswell