
[BUG]: Diffusion -- "NotImplementedError: Some torch function is incompatible because of its complcated inputs"

Open qq110146 opened this issue 2 years ago • 6 comments

🐛 Describe the bug

Hi, I tried to train Stable Diffusion v1 with Colossal-AI using 'train_colossalai_teyvat.yaml', but got "NotImplementedError: Some torch function is incompatible because of its complcated inputs". How can I solve this problem?

BTW, if I want to finetune from "stable-diffusion-v1-4", where should I add this configuration?

System: Ubuntu 20.04, GPU: 3060 (12 GB), Python 3.9

/home/lucienfang/anaconda3/envs/ldm4/bin/python3.9 /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/main.py --logdir /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/ -t -b /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/configs/Teyvat/1.yaml Global seed set to 23 {'accelerator': 'gpu', 'devices': 1, 'log_gpu_memory': 'all', 'max_epochs': 2, 'precision': 16, 'auto_select_gpus': False, 'strategy': {'target': 'strategies.ColossalAIStrategy', 'params': {'use_chunk': True, 'enable_distributed_storage': True, 'placement_policy': 'cuda', 'force_outputs_fp32': True}}, 'log_every_n_steps': 2, 'logger': True, 'default_root_dir': '/tmp/diff_log/'} Running on GPU Using FP16 = True A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton.language' LatentDiffusion: Running in v-prediction mode Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. 
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. DiffusionWrapper has 865.91 M params. making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels... Working with z of shape (1, 4, 32, 32) = 4096 dimensions. making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels... /home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loggers/tensorboard.py:123: UserWarning: You set TensorBoardLogger(log_graph=True) but tensorboard is not available. rank_zero_warn("You set TensorBoardLogger(log_graph=True) but tensorboard is not available.") Using strategy: strategies.ColossalAIStrategy Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/fused_optim/build.ninja... Building extension module fused_optim... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_optim... Time to load fused_optim op: 0.48834681510925293 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/cpu_adam/build.ninja... Building extension module cpu_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module cpu_adam... Time to load cpu_adam op: 0.5072433948516846 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... 
Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/multihead_attention/build.ninja... Building extension module multihead_attention... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module multihead_attention... Time to load multihead_attention op: 0.503164529800415 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/scaled_upper_triang_masked_softmax/build.ninja... Building extension module scaled_upper_triang_masked_softmax... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax... Time to load scaled_upper_triang_masked_softmax op: 0.4033949375152588 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/moe/build.ninja... Building extension module moe... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module moe... Time to load moe op: 0.4086589813232422 seconds please install Colossal-AI from https://www.colossalai.org/download or from source Monitoring val/loss_simple_ema as checkpoint metric. 
Merged modelckpt-cfg: {'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': '/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/2023-01-05T11-26-08_1/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3}} /home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:562: LightningDeprecationWarning: The Trainer argument auto_select_gpus has been deprecated in v1.9.0 and will be removed in v1.10.0. Please use the function pytorch_lightning.accelerators.find_usable_cuda_devices instead. rank_zero_deprecation( GPU available: True (cuda), used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 114.79it/s] WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 1061.85it/s] /home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:108: PossibleUserWarning: You defined a validation_step but have no val_dataloader. Skipping val loop. rank_zero_warn(

Data

train, Dataset, 234 accumulate_grad_batches = 1 Setting learning rate to 1.00e-04 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 1 (batchsize) * 1.00e-04 (base_lr) WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 1628.22it/s] Missing logger folder: /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/2023-01-05T11-26-08_1/diff_tb WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 1462.96it/s] [01/05/23 11:28:07] INFO colossalai - ProcessGroup - INFO:
/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/process_group.py:24 get
INFO colossalai - ProcessGroup - INFO: NCCL initialize ProcessGroup on [0]
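The "Setting learning rate to 1.00e-04 = ..." line in the log above is plain multiplicative scaling of the base learning rate. A minimal sketch of that rule (the function name is illustrative, not the repository's actual helper):

```python
def effective_lr(base_lr, accumulate_grad_batches=1, num_gpus=1, batch_size=1):
    """Scale the base LR by the effective-batch factors, matching the
    formula printed in the training log:
    lr = accumulate_grad_batches * num_gpus * batch_size * base_lr
    """
    return accumulate_grad_batches * num_gpus * batch_size * base_lr

# With the log's values: 1 * 1 * 1 * 1.0e-04 = 1.0e-04
print(effective_lr(1.0e-04))
```

With a single 3060 and batch size 1 every factor is 1, so the effective LR equals the base LR; raising `accumulate_grad_batches` or the batch size scales it proportionally.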
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up LambdaLR scheduler... searching chunk configuration is completed in 0.15 s. 
used number: 825.80 MB, wasted number: 0.80 MB
total wasted percentage is 0.10%

Project config

model:
  base_learning_rate: 0.0001
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: v
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: false
    scheduler_config:
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps:
        - 1
        cycle_lengths:
        - 10000000000000
        f_start:
        - 1.0e-06
        f_max:
        - 0.0001
        f_min:
        - 1.0e-10
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: true
        use_fp16: true
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_head_channels: 64
        use_spatial_transformer: true
        use_linear_in_transformer: true
        transformer_depth: 1
        context_dim: 1024
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: true
        layer: penultimate
        use_fp16: true
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1
    num_workers: 4
    train:
      target: ldm.data.teyvat.hf_dataset
      params:
        path: Fazzie/Teyvat
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 512
        - target: torchvision.transforms.RandomCrop
          params:
            size: 512
        - target: torchvision.transforms.RandomHorizontalFlip
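The `scheduler_config` values in the project config drive a per-step LR multiplier. A rough sketch of the shape those values imply (this only mirrors the behavior suggested by `warm_up_steps`/`f_start`/`f_max`/`f_min`; it is not `ldm.lr_scheduler.LambdaLinearScheduler` itself):

```python
def lr_multiplier(step, warm_up_steps=1, cycle_length=10_000_000_000_000,
                  f_start=1.0e-06, f_max=1.0e-04, f_min=1.0e-10):
    """Linear warm-up from f_start to f_max over warm_up_steps, then a
    (very slow) linear decay toward f_min over cycle_length steps.
    Illustrative only; defaults taken from the config dump above."""
    if step < warm_up_steps:
        return f_start + (f_max - f_start) * step / warm_up_steps
    t = (step - warm_up_steps) / cycle_length  # fraction of decay cycle elapsed
    return max(f_min, f_max + (f_min - f_max) * t)
```

With `warm_up_steps: 1` and a cycle length of 1e13, the multiplier jumps to `f_max` after a single step and then stays essentially constant for the whole run.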

Lightning config

trainer:
  accelerator: gpu
  devices: 1
  log_gpu_memory: all
  max_epochs: 2
  precision: 16
  auto_select_gpus: false
  strategy:
    target: strategies.ColossalAIStrategy
    params:
      use_chunk: true
      enable_distributed_storage: true
      placement_policy: cuda
      force_outputs_fp32: true
  log_every_n_steps: 2
  logger: true
  default_root_dir: /tmp/diff_log/
logger_config:
  wandb:
    target: loggers.WandbLogger
    params:
      name: nowname
      save_dir: /tmp/diff_log/
      offline: opt.debug
      id: nowname

Epoch 0: 0%| | 0/234 [00:00<?, ?it/s] Summoning checkpoint.

Traceback (most recent call last):
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/main.py", line 804, in <module>
    trainer.fit(model, data)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 602, in fit
    call._call_and_handle_interrupt(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 88, in launch
    return function(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 644, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1097, in _run
    results = self._run_stage()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1176, in _run_stage
    self._run_train()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run_train
    self.fit_loop.run()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 213, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 202, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 249, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 370, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1341, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1672, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 412, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/colossalai.py", line 74, in optimizer_step
    closure_result = closure()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 149, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 135, in closure
    step_output = self._step_fn()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 419, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1479, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 351, in training_step
    return self.model(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/nn/parallel/data_parallel.py", line 274, in forward
    outputs = self.module(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 98, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 486, in training_step
    loss, loss_dict = self.shared_step(batch)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 922, in shared_step
    x, c = self.get_input(batch, self.first_stage_key)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 859, in get_input
    encoder_posterior = self.encode_first_stage(x)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 919, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/autoencoder.py", line 87, in encode
    h = self.encoder(x)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/modules/diffusionmodules/model.py", line 528, in forward
    hs = [self.conv_in(x)]
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/colo_parameter.py", line 85, in __torch_function__
    new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/param_op_hook.py", line 85, in pre_op
    grad_args, rear_args = _get_grad_args(*args)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/param_op_hook.py", line 153, in _get_grad_args
    raise NotImplementedError("Some torch function is incompatible because of its complcated inputs.")
NotImplementedError: Some torch function is incompatible because of its complcated inputs.

Process finished with exit code 1

Environment

packages in environment at /home/lucienfang/anaconda3/envs/ldm4:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_kmp_llvm conda-forge aiohttp 3.8.3 pypi_0 pypi aiosignal 1.3.1 pypi_0 pypi altair 4.2.0 pypi_0 pypi antlr4-python3-runtime 4.9.3 pypi_0 pypi async-timeout 4.0.2 pypi_0 pypi attrs 22.2.0 pypi_0 pypi bcrypt 4.0.1 pypi_0 pypi beautifulsoup4 4.11.1 pypi_0 pypi blas 2.116 openblas conda-forge blas-devel 3.9.0 16_linux64_openblas conda-forge blinker 1.5 pypi_0 pypi braceexpand 0.1.7 pypi_0 pypi brotlipy 0.7.0 py39h27cfd23_1003 defaults bs4 0.0.1 pypi_0 pypi bzip2 1.0.8 h7b6447c_0 defaults ca-certificates 2022.10.11 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main cachetools 5.2.0 pypi_0 pypi certifi 2022.12.7 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main cffi 1.15.1 py39h74dc2b5_0 defaults cfgv 3.3.1 pypi_0 pypi charset-normalizer 2.0.4 pyhd3eb1b0_0 defaults click 8.1.3 pypi_0 pypi cmake 3.25.0 pypi_0 pypi coloredlogs 15.0.1 pypi_0 pypi colossalai 0.1.12+torch1.12cu11.3 pypi_0 pypi commonmark 0.9.1 pypi_0 pypi contexttimer 0.3.3 pypi_0 pypi cryptography 38.0.1 py39h9ce1e76_0 defaults datasets 2.8.0 pypi_0 pypi decorator 5.1.1 pypi_0 pypi dill 0.3.6 pypi_0 pypi distlib 0.3.6 pypi_0 pypi einops 0.3.0 pypi_0 pypi entrypoints 0.4 pypi_0 pypi fabric 2.7.1 pypi_0 pypi ffmpeg 4.3 hf484d3e_0 pytorch filelock 3.9.0 pyhd8ed1ab_0 conda-forge flatbuffers 22.12.6 pypi_0 pypi flit-core 3.6.0 pyhd3eb1b0_0 defaults freetype 2.12.1 h4a9f257_0 defaults frozenlist 1.3.3 pypi_0 pypi fsspec 2022.11.0 pypi_0 pypi ftfy 6.1.1 pypi_0 pypi giflib 5.2.1 h7b6447c_0 defaults gitdb 4.0.10 pypi_0 pypi gitpython 3.1.30 pypi_0 pypi gmp 6.2.1 h295c915_3 defaults gnutls 3.6.15 he1e5248_0 defaults huggingface-hub 0.11.1 pypi_0 pypi humanfriendly 10.0 pypi_0 pypi identify 2.5.11 pypi_0 pypi idna 3.4 py39h06a4308_0 defaults importlib-metadata 5.2.0 pypi_0 pypi intel-openmp 2021.4.0 h06a4308_3561 defaults invisible-watermark 0.1.5 pypi_0 pypi invoke 1.7.3 pypi_0 pypi jinja2 3.1.2 pypi_0 pypi jpeg 9e 
h7f8727e_0 defaults jsonschema 4.17.3 pypi_0 pypi kornia 0.6.0 pypi_0 pypi lame 3.100 h7b6447c_0 defaults lcms2 2.12 h3be6417_0 defaults ld_impl_linux-64 2.38 h1181459_1 defaults lerc 3.0 h295c915_0 defaults libblas 3.9.0 16_linux64_openblas conda-forge libcblas 3.9.0 16_linux64_openblas conda-forge libdeflate 1.8 h7f8727e_5 defaults libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 12.2.0 h65d4601_19 conda-forge libgfortran-ng 12.2.0 h69a702a_19 conda-forge libgfortran5 12.2.0 h337968e_19 conda-forge libgomp 12.2.0 h65d4601_19 conda-forge libiconv 1.16 h7f8727e_2 defaults libidn2 2.3.2 h7f8727e_0 defaults liblapack 3.9.0 16_linux64_openblas conda-forge liblapacke 3.9.0 16_linux64_openblas conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge libpng 1.6.37 hbc83047_0 defaults libprotobuf 3.21.12 h3eb15da_0 conda-forge libstdcxx-ng 12.2.0 h46fd767_19 conda-forge libtasn1 4.16.0 h27cfd23_0 defaults libtiff 4.4.0 hecacb30_2 defaults libunistring 0.9.10 h27cfd23_0 defaults libuuid 2.32.1 h7f98852_1000 conda-forge libwebp 1.2.4 h11a3e52_0 defaults libwebp-base 1.2.4 h5eee18b_0 defaults libzlib 1.2.13 h166bdaf_4 conda-forge lightning-utilities 0.5.0 pypi_0 pypi lit 15.0.6 pypi_0 pypi llvm-openmp 15.0.6 he0ac6c6_0 conda-forge lz4-c 1.9.4 h6a678d5_0 defaults markupsafe 2.1.1 pypi_0 pypi mkl 2022.2.1 h84fe81f_16997 conda-forge mkl-service 2.4.0 py39hb699420_0 conda-forge mkl_fft 1.3.1 py39h051f8f4_4 conda-forge mkl_random 1.2.2 py39h8b66066_1 conda-forge mpmath 1.2.1 pypi_0 pypi multidict 6.0.4 pypi_0 pypi multiprocess 0.70.14 pypi_0 pypi mypy-extensions 0.4.3 pypi_0 pypi ncurses 6.3 h5eee18b_3 defaults nettle 3.7.3 hbbd107a_1 defaults ninja 1.11.1 pypi_0 pypi nodeenv 1.7.0 pypi_0 pypi numpy 1.23.1 py39hf838250_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main numpy-base 1.23.1 py39h1e6e340_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main omegaconf 2.3.0 pypi_0 pypi onnx 1.13.0 pypi_0 pypi onnxruntime 1.13.1 pypi_0 
pypi open-clip-torch 2.0.2 pypi_0 pypi openblas 0.3.21 pthreads_h320a7e8_3 conda-forge opencv-python 4.7.0.68 pypi_0 pypi openh264 2.1.1 h4ff587b_0 defaults openssl 3.0.7 h0b41bf4_1 conda-forge packaging 22.0 pypi_0 pypi pandas 1.5.2 pypi_0 pypi paramiko 2.12.0 pypi_0 pypi pathlib2 2.3.7.post1 pypi_0 pypi pillow 9.3.0 py39hace64e9_1 defaults pip 20.3.3 py39h06a4308_0 defaults pip-search 0.0.12 pypi_0 pypi platformdirs 2.6.2 pypi_0 pypi pre-commit 2.21.0 pypi_0 pypi prefetch-generator 1.0.3 pypi_0 pypi protobuf 3.20.1 pypi_0 pypi psutil 5.9.4 pypi_0 pypi pyarrow 10.0.1 pypi_0 pypi pycparser 2.21 pyhd3eb1b0_0 defaults pydeck 0.8.0 pypi_0 pypi pydeprecate 0.3.2 pypi_0 pypi pygments 2.13.0 pypi_0 pypi pympler 1.0.1 pypi_0 pypi pynacl 1.5.0 pypi_0 pypi pyopenssl 22.0.0 pyhd3eb1b0_0 defaults pyre-extensions 0.0.23 pypi_0 pypi pyrsistent 0.19.3 pypi_0 pypi pysocks 1.7.1 py39h06a4308_0 defaults python 3.9.12 h2660328_1_cpython conda-forge python-dateutil 2.8.2 pypi_0 pypi python_abi 3.9 3_cp39 conda-forge pytorch-lightning 1.9.0.dev0 pypi_0 pypi pytorch-mutex 1.0 cuda pytorch pytz 2022.7 pypi_0 pypi pytz-deprecation-shim 0.1.0.post0 pypi_0 pypi pywavelets 1.4.1 pypi_0 pypi pyyaml 6.0 pypi_0 pypi readline 8.2 h5eee18b_0 defaults regex 2022.10.31 pypi_0 pypi requests 2.28.1 py39h06a4308_0 defaults responses 0.18.0 pypi_0 pypi rich 13.0.0 pypi_0 pypi semver 2.13.0 pypi_0 pypi setuptools 65.5.0 py39h06a4308_0 defaults six 1.16.0 pyhd3eb1b0_1 defaults sleef 3.5.1 h9b69904_2 conda-forge smmap 5.0.0 pypi_0 pypi soupsieve 2.3.2.post1 pypi_0 pypi sqlite 3.40.0 h5082296_0 defaults streamlit 1.12.1 pypi_0 pypi streamlit-drawable-canvas 0.8.0 pypi_0 pypi sympy 1.11.1 pypi_0 pypi tbb 2021.7.0 h924138e_0 conda-forge tensorboardx 2.5.1 pypi_0 pypi tk 8.6.12 h1ccaba5_0 defaults tokenizers 0.12.1 pypi_0 pypi toml 0.10.2 pypi_0 pypi toolz 0.12.0 pypi_0 pypi torch 1.12.1+cu113 pypi_0 pypi torchmetrics 0.7.0 pypi_0 pypi torchvision 0.13.1+cu113 pypi_0 pypi tornado 6.2 pypi_0 pypi tqdm 4.64.1 
pypi_0 pypi transformers 4.19.2 pypi_0 pypi typing-inspect 0.8.0 pypi_0 pypi typing_extensions 4.4.0 py39h06a4308_0 defaults tzdata 2022.7 pypi_0 pypi tzlocal 4.2 pypi_0 pypi urllib3 1.26.13 py39h06a4308_0 defaults validators 0.20.0 pypi_0 pypi virtualenv 20.17.1 pypi_0 pypi watchdog 2.2.0 pypi_0 pypi wcwidth 0.2.5 pypi_0 pypi webdataset 0.2.5 pypi_0 pypi wheel 0.37.1 pyhd3eb1b0_0 defaults xformers 0.0.15+e163309.d20230101 dev_0 xxhash 3.2.0 pypi_0 pypi xz 5.2.8 h5eee18b_0 defaults yarl 1.8.2 pypi_0 pypi zipp 3.11.0 pypi_0 pypi zlib 1.2.13 h166bdaf_4 conda-forge zstd 1.5.2 ha4553b6_0 defaults

qq110146 avatar Jan 05 '23 04:01 qq110146

I suggest you use our Docker image to avoid package problems.

Fazziekey avatar Jan 05 '23 07:01 Fazziekey

🐛 Describe the bug

Hi, I tried to train Stable Diffusion v1 with Colossal-AI using 'train_colossalai_teyvat.yaml', but got "NotImplementedError: Some torch function is incompatible because of its complcated inputs". How should I solve this problem?

BTW, if I want to fine-tune from "stable-diffusion-v1-4", where should I add this configuration?

System: Ubuntu 20.04; GPU: 3060 (12 GB); Python 3.9

/home/lucienfang/anaconda3/envs/ldm4/bin/python3.9 /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/main.py --logdir /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/ -t -b /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/configs/Teyvat/1.yaml Global seed set to 23 {'accelerator': 'gpu', 'devices': 1, 'log_gpu_memory': 'all', 'max_epochs': 2, 'precision': 16, 'auto_select_gpus': False, 'strategy': {'target': 'strategies.ColossalAIStrategy', 'params': {'use_chunk': True, 'enable_distributed_storage': True, 'placement_policy': 'cuda', 'force_outputs_fp32': True}}, 'log_every_n_steps': 2, 'logger': True, 'default_root_dir': '/tmp/diff_log/'} Running on GPU Using FP16 = True A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton.language' LatentDiffusion: Running in v-prediction mode Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. 
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. DiffusionWrapper has 865.91 M params. making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels... Working with z of shape (1, 4, 32, 32) = 4096 dimensions. making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels... /home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loggers/tensorboard.py:123: UserWarning: You set TensorBoardLogger(log_graph=True) but tensorboard is not available. rank_zero_warn("You set TensorBoardLogger(log_graph=True) but tensorboard is not available.") Using strategy: strategies.ColossalAIStrategy Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/fused_optim/build.ninja... Building extension module fused_optim... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_optim... Time to load fused_optim op: 0.48834681510925293 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/cpu_adam/build.ninja... Building extension module cpu_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module cpu_adam... Time to load cpu_adam op: 0.5072433948516846 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... 
Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/multihead_attention/build.ninja... Building extension module multihead_attention... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module multihead_attention... Time to load multihead_attention op: 0.503164529800415 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/scaled_upper_triang_masked_softmax/build.ninja... Building extension module scaled_upper_triang_masked_softmax... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax... Time to load scaled_upper_triang_masked_softmax op: 0.4033949375152588 seconds Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/moe/build.ninja... Building extension module moe... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module moe... Time to load moe op: 0.4086589813232422 seconds please install Colossal-AI from https://www.colossalai.org/download or from source Monitoring val/loss_simple_ema as checkpoint metric. 
Merged modelckpt-cfg: {'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': '/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/2023-01-05T11-26-08_1/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3}} /home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:562: LightningDeprecationWarning: The Trainer argument auto_select_gpus has been deprecated in v1.9.0 and will be removed in v1.10.0. Please use the function pytorch_lightning.accelerators.find_usable_cuda_devices instead. rank_zero_deprecation( GPU available: True (cuda), used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 114.79it/s] WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 1061.85it/s] /home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:108: PossibleUserWarning: You defined a validation_step but have no val_dataloader. Skipping val loop. rank_zero_warn(

Data

train, Dataset, 234 accumulate_grad_batches = 1 Setting learning rate to 1.00e-04 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 1 (batchsize) * 1.00e-04 (base_lr) WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 1628.22it/s] Missing logger folder: /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/2023-01-05T11-26-08_1/diff_tb WARNING:datasets.builder:Using custom data configuration train WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701) 100%|██████████| 1/1 [00:00<00:00, 1462.96it/s] [01/05/23 11:28:07] INFO colossalai - ProcessGroup - INFO: /home/lucienfang/data1/work/model/diffusion/Colossa lAI/colossalai/tensor/process_group.py:24 get INFO colossalai - ProcessGroup - INFO: NCCL initialize ProcessGroup on [0] Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. 
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up LambdaLR scheduler...
searching chunk configuration is completed in 0.15 s.
used number: 825.80 MB, wasted number: 0.80 MB
total wasted percentage is 0.10%

Project config
model:
  base_learning_rate: 0.0001
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: v
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: false
    scheduler_config:
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps:
        - 1
        cycle_lengths:
        - 10000000000000
        f_start:
        - 1.0e-06
        f_max:
        - 0.0001
        f_min:
        - 1.0e-10
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: true
        use_fp16: true
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_head_channels: 64
        use_spatial_transformer: true
        use_linear_in_transformer: true
        transformer_depth: 1
        context_dim: 1024
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: true
        layer: penultimate
        use_fp16: true
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1
    num_workers: 4
    train:
      target: ldm.data.teyvat.hf_dataset
      params:
        path: Fazzie/Teyvat
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 512
        - target: torchvision.transforms.RandomCrop
          params:
            size: 512
        - target: torchvision.transforms.RandomHorizontalFlip

Lightning config
trainer:
  accelerator: gpu
  devices: 1
  log_gpu_memory: all
  max_epochs: 2
  precision: 16
  auto_select_gpus: false
  strategy:
    target: strategies.ColossalAIStrategy
    params:
      use_chunk: true
      enable_distributed_storage: true
      placement_policy: cuda
      force_outputs_fp32: true
  log_every_n_steps: 2
  logger: true
  default_root_dir: /tmp/diff_log/
logger_config:
  wandb:
    target: loggers.WandbLogger
    params:
      name: nowname
      save_dir: /tmp/diff_log/
      offline: opt.debug
      id: nowname

Epoch 0:   0%|          | 0/234 [00:00<?, ?it/s]
Summoning checkpoint.

Traceback (most recent call last):
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/main.py", line 804, in <module>
    trainer.fit(model, data)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 602, in fit
    call._call_and_handle_interrupt(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 88, in launch
    return function(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 644, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1097, in _run
    results = self._run_stage()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1176, in _run_stage
    self._run_train()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run_train
    self.fit_loop.run()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 213, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 202, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 249, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 370, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1341, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1672, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 412, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/colossalai.py", line 74, in optimizer_step
    closure_result = closure()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 149, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 135, in closure
    step_output = self._step_fn()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 419, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1479, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 351, in training_step
    return self.model(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/nn/parallel/data_parallel.py", line 274, in forward
    outputs = self.module(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 98, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 486, in training_step
    loss, loss_dict = self.shared_step(batch)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 922, in shared_step
    x, c = self.get_input(batch, self.first_stage_key)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 859, in get_input
    encoder_posterior = self.encode_first_stage(x)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 919, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/autoencoder.py", line 87, in encode
    h = self.encoder(x)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/modules/diffusionmodules/model.py", line 528, in forward
    hs = [self.conv_in(x)]
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/colo_parameter.py", line 85, in __torch_function__
    new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/param_op_hook.py", line 85, in pre_op
    grad_args, rear_args = _get_grad_args(*args)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/param_op_hook.py", line 153, in _get_grad_args
    raise NotImplementedError("Some torch function is incompatible because of its complcated inputs.")
NotImplementedError: Some torch function is incompatible because of its complcated inputs.

Process finished with exit code 1

Environment

packages in environment at /home/lucienfang/anaconda3/envs/ldm4:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_kmp_llvm conda-forge aiohttp 3.8.3 pypi_0 pypi aiosignal 1.3.1 pypi_0 pypi altair 4.2.0 pypi_0 pypi antlr4-python3-runtime 4.9.3 pypi_0 pypi async-timeout 4.0.2 pypi_0 pypi attrs 22.2.0 pypi_0 pypi bcrypt 4.0.1 pypi_0 pypi beautifulsoup4 4.11.1 pypi_0 pypi blas 2.116 openblas conda-forge blas-devel 3.9.0 16_linux64_openblas conda-forge blinker 1.5 pypi_0 pypi braceexpand 0.1.7 pypi_0 pypi brotlipy 0.7.0 py39h27cfd23_1003 defaults bs4 0.0.1 pypi_0 pypi bzip2 1.0.8 h7b6447c_0 defaults ca-certificates 2022.10.11 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main cachetools 5.2.0 pypi_0 pypi certifi 2022.12.7 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main cffi 1.15.1 py39h74dc2b5_0 defaults cfgv 3.3.1 pypi_0 pypi charset-normalizer 2.0.4 pyhd3eb1b0_0 defaults click 8.1.3 pypi_0 pypi cmake 3.25.0 pypi_0 pypi coloredlogs 15.0.1 pypi_0 pypi colossalai 0.1.12+torch1.12cu11.3 pypi_0 pypi commonmark 0.9.1 pypi_0 pypi contexttimer 0.3.3 pypi_0 pypi cryptography 38.0.1 py39h9ce1e76_0 defaults datasets 2.8.0 pypi_0 pypi decorator 5.1.1 pypi_0 pypi dill 0.3.6 pypi_0 pypi distlib 0.3.6 pypi_0 pypi einops 0.3.0 pypi_0 pypi entrypoints 0.4 pypi_0 pypi fabric 2.7.1 pypi_0 pypi ffmpeg 4.3 hf484d3e_0 pytorch filelock 3.9.0 pyhd8ed1ab_0 conda-forge flatbuffers 22.12.6 pypi_0 pypi flit-core 3.6.0 pyhd3eb1b0_0 defaults freetype 2.12.1 h4a9f257_0 defaults frozenlist 1.3.3 pypi_0 pypi fsspec 2022.11.0 pypi_0 pypi ftfy 6.1.1 pypi_0 pypi giflib 5.2.1 h7b6447c_0 defaults gitdb 4.0.10 pypi_0 pypi gitpython 3.1.30 pypi_0 pypi gmp 6.2.1 h295c915_3 defaults gnutls 3.6.15 he1e5248_0 defaults huggingface-hub 0.11.1 pypi_0 pypi humanfriendly 10.0 pypi_0 pypi identify 2.5.11 pypi_0 pypi idna 3.4 py39h06a4308_0 defaults importlib-metadata 5.2.0 pypi_0 pypi intel-openmp 2021.4.0 h06a4308_3561 defaults invisible-watermark 0.1.5 pypi_0 pypi invoke 1.7.3 pypi_0 pypi jinja2 3.1.2 pypi_0 pypi jpeg 9e 
h7f8727e_0 defaults jsonschema 4.17.3 pypi_0 pypi kornia 0.6.0 pypi_0 pypi lame 3.100 h7b6447c_0 defaults lcms2 2.12 h3be6417_0 defaults ld_impl_linux-64 2.38 h1181459_1 defaults lerc 3.0 h295c915_0 defaults libblas 3.9.0 16_linux64_openblas conda-forge libcblas 3.9.0 16_linux64_openblas conda-forge libdeflate 1.8 h7f8727e_5 defaults libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 12.2.0 h65d4601_19 conda-forge libgfortran-ng 12.2.0 h69a702a_19 conda-forge libgfortran5 12.2.0 h337968e_19 conda-forge libgomp 12.2.0 h65d4601_19 conda-forge libiconv 1.16 h7f8727e_2 defaults libidn2 2.3.2 h7f8727e_0 defaults liblapack 3.9.0 16_linux64_openblas conda-forge liblapacke 3.9.0 16_linux64_openblas conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge libpng 1.6.37 hbc83047_0 defaults libprotobuf 3.21.12 h3eb15da_0 conda-forge libstdcxx-ng 12.2.0 h46fd767_19 conda-forge libtasn1 4.16.0 h27cfd23_0 defaults libtiff 4.4.0 hecacb30_2 defaults libunistring 0.9.10 h27cfd23_0 defaults libuuid 2.32.1 h7f98852_1000 conda-forge libwebp 1.2.4 h11a3e52_0 defaults libwebp-base 1.2.4 h5eee18b_0 defaults libzlib 1.2.13 h166bdaf_4 conda-forge lightning-utilities 0.5.0 pypi_0 pypi lit 15.0.6 pypi_0 pypi llvm-openmp 15.0.6 he0ac6c6_0 conda-forge lz4-c 1.9.4 h6a678d5_0 defaults markupsafe 2.1.1 pypi_0 pypi mkl 2022.2.1 h84fe81f_16997 conda-forge mkl-service 2.4.0 py39hb699420_0 conda-forge mkl_fft 1.3.1 py39h051f8f4_4 conda-forge mkl_random 1.2.2 py39h8b66066_1 conda-forge mpmath 1.2.1 pypi_0 pypi multidict 6.0.4 pypi_0 pypi multiprocess 0.70.14 pypi_0 pypi mypy-extensions 0.4.3 pypi_0 pypi ncurses 6.3 h5eee18b_3 defaults nettle 3.7.3 hbbd107a_1 defaults ninja 1.11.1 pypi_0 pypi nodeenv 1.7.0 pypi_0 pypi numpy 1.23.1 py39hf838250_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main numpy-base 1.23.1 py39h1e6e340_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main omegaconf 2.3.0 pypi_0 pypi onnx 1.13.0 pypi_0 pypi onnxruntime 1.13.1 pypi_0 
pypi open-clip-torch 2.0.2 pypi_0 pypi openblas 0.3.21 pthreads_h320a7e8_3 conda-forge opencv-python 4.7.0.68 pypi_0 pypi openh264 2.1.1 h4ff587b_0 defaults openssl 3.0.7 h0b41bf4_1 conda-forge packaging 22.0 pypi_0 pypi pandas 1.5.2 pypi_0 pypi paramiko 2.12.0 pypi_0 pypi pathlib2 2.3.7.post1 pypi_0 pypi pillow 9.3.0 py39hace64e9_1 defaults pip 20.3.3 py39h06a4308_0 defaults pip-search 0.0.12 pypi_0 pypi platformdirs 2.6.2 pypi_0 pypi pre-commit 2.21.0 pypi_0 pypi prefetch-generator 1.0.3 pypi_0 pypi protobuf 3.20.1 pypi_0 pypi psutil 5.9.4 pypi_0 pypi pyarrow 10.0.1 pypi_0 pypi pycparser 2.21 pyhd3eb1b0_0 defaults pydeck 0.8.0 pypi_0 pypi pydeprecate 0.3.2 pypi_0 pypi pygments 2.13.0 pypi_0 pypi pympler 1.0.1 pypi_0 pypi pynacl 1.5.0 pypi_0 pypi pyopenssl 22.0.0 pyhd3eb1b0_0 defaults pyre-extensions 0.0.23 pypi_0 pypi pyrsistent 0.19.3 pypi_0 pypi pysocks 1.7.1 py39h06a4308_0 defaults python 3.9.12 h2660328_1_cpython conda-forge python-dateutil 2.8.2 pypi_0 pypi python_abi 3.9 3_cp39 conda-forge pytorch-lightning 1.9.0.dev0 pypi_0 pypi pytorch-mutex 1.0 cuda pytorch pytz 2022.7 pypi_0 pypi pytz-deprecation-shim 0.1.0.post0 pypi_0 pypi pywavelets 1.4.1 pypi_0 pypi pyyaml 6.0 pypi_0 pypi readline 8.2 h5eee18b_0 defaults regex 2022.10.31 pypi_0 pypi requests 2.28.1 py39h06a4308_0 defaults responses 0.18.0 pypi_0 pypi rich 13.0.0 pypi_0 pypi semver 2.13.0 pypi_0 pypi setuptools 65.5.0 py39h06a4308_0 defaults six 1.16.0 pyhd3eb1b0_1 defaults sleef 3.5.1 h9b69904_2 conda-forge smmap 5.0.0 pypi_0 pypi soupsieve 2.3.2.post1 pypi_0 pypi sqlite 3.40.0 h5082296_0 defaults streamlit 1.12.1 pypi_0 pypi streamlit-drawable-canvas 0.8.0 pypi_0 pypi sympy 1.11.1 pypi_0 pypi tbb 2021.7.0 h924138e_0 conda-forge tensorboardx 2.5.1 pypi_0 pypi tk 8.6.12 h1ccaba5_0 defaults tokenizers 0.12.1 pypi_0 pypi toml 0.10.2 pypi_0 pypi toolz 0.12.0 pypi_0 pypi torch 1.12.1+cu113 pypi_0 pypi torchmetrics 0.7.0 pypi_0 pypi torchvision 0.13.1+cu113 pypi_0 pypi tornado 6.2 pypi_0 pypi tqdm 4.64.1 
pypi_0 pypi transformers 4.19.2 pypi_0 pypi typing-inspect 0.8.0 pypi_0 pypi typing_extensions 4.4.0 py39h06a4308_0 defaults tzdata 2022.7 pypi_0 pypi tzlocal 4.2 pypi_0 pypi urllib3 1.26.13 py39h06a4308_0 defaults validators 0.20.0 pypi_0 pypi virtualenv 20.17.1 pypi_0 pypi watchdog 2.2.0 pypi_0 pypi wcwidth 0.2.5 pypi_0 pypi webdataset 0.2.5 pypi_0 pypi wheel 0.37.1 pyhd3eb1b0_0 defaults xformers 0.0.15+e163309.d20230101 dev_0 xxhash 3.2.0 pypi_0 pypi xz 5.2.8 h5eee18b_0 defaults yarl 1.8.2 pypi_0 pypi zipp 3.11.0 pypi_0 pypi zlib 1.2.13 h166bdaf_4 conda-forge zstd 1.5.2 ha4553b6_0 defaults

Have you solved this issue?

FrankieDong avatar Jan 05 '23 08:01 FrankieDong
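A note on what is failing here: the traceback above bottoms out in ColossalAI's _get_grad_args (colossalai/tensor/param_op_hook.py), which is called from ColoParameter's __torch_function__ override and tries to pull the gradient-carrying parameters out of the intercepted call's arguments. The sketch below is a simplified, hypothetical illustration of that idea (the names FakeParam and get_grad_args are mine, and the real logic differs); it only shows how an argument structure the hook cannot traverse produces exactly this kind of NotImplementedError:

```python
class FakeParam:
    """Stand-in for a ColoParameter that the hook must intercept."""


def get_grad_args(*args):
    """Split args into hooked parameters vs. everything else.

    Hypothetical, simplified version of the flattening performed in
    colossalai/tensor/param_op_hook.py: parameters are only recognized
    at the top level or inside a flat list/tuple; anything buried
    deeper counts as "complicated" input and is rejected.
    """
    grad_args, rear_args = [], []
    for a in args:
        if isinstance(a, FakeParam):
            grad_args.append(a)
        elif isinstance(a, (list, tuple)) and any(
            isinstance(x, FakeParam) for x in a
        ):
            grad_args.extend(x for x in a if isinstance(x, FakeParam))
            rear_args.extend(x for x in a if not isinstance(x, FakeParam))
        elif isinstance(a, dict) and any(
            isinstance(x, FakeParam) for x in a.values()
        ):
            # A parameter hidden inside a dict (or deeper nesting) cannot
            # be hooked -- this mirrors the situation the real error reports.
            raise NotImplementedError(
                "Some torch function is incompatible because of its "
                "complicated inputs."
            )
        else:
            rear_args.append(a)
    return grad_args, rear_args
```

In the failing run, F.conv2d was routed through this hook and the hook's flattening could not handle the arguments it saw, hence the suggestions in this thread to use a ColossalAI version without this limitation.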

I encountered the same issue. Is there any solution to this?

flymin avatar Jan 06 '23 02:01 flymin

Downgrade ColossalAI: pip install colossalai==0.1.12+torch1.12cu11.3 -f https://release.colossalai.org

tg-bomze avatar Jan 06 '23 06:01 tg-bomze
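The pin above works because the regression appeared after 0.1.12. If you script your environment setup, a small pre-flight guard can fail fast before a long training run. This is a generic, hypothetical sketch (the helper names are mine, not part of ColossalAI) that compares only the release numbers and ignores the local build tag after the "+":

```python
def parse_release(version: str) -> tuple:
    """Parse "0.1.12+torch1.12cu11.3" into (0, 1, 12).

    The local build tag after "+" (torch/CUDA flavor) is ignored;
    only the release numbers matter for this comparison.
    """
    release = version.split("+", 1)[0]
    return tuple(int(part) for part in release.split("."))


def check_colossalai_version(installed: str, required: str = "0.1.12") -> None:
    """Raise early if the installed release differs from the pinned one."""
    if parse_release(installed) != parse_release(required):
        raise RuntimeError(
            f"colossalai {installed} is installed, but this example is only "
            f"known to work with {required}; see the pip command above."
        )
```

One could call check_colossalai_version(colossalai.__version__) near the top of the training script, assuming the installed package exposes __version__.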

Since SD requires building from source, I finally got it to run by:

  1. Checking out colossalai at v0.1.12 and installing it from source: CUDA_EXT=1 pip install -v --no-cache-dir .
  2. Checking out colossalai back to the main branch and running the SD code.

My environment is set up according to the Dockerfile on the main branch. I do not know whether it is necessary to go through all these steps, but it works for me.

flymin avatar Jan 06 '23 08:01 flymin

Sorry, the latest ColossalAI has some bugs in its tensor module; v0.1.12 is the right version to run it. We are fixing the bug.

Fazziekey avatar Jan 06 '23 08:01 Fazziekey

@qq110146 @flymin Hi all, sorry for the bug. I believe @1SAA has fixed it.

feifeibear avatar Jan 07 '23 00:01 feifeibear