ColossalAI
[BUG]: Diffusion -- "NotImplementedError: Some torch function is incompatible because of its complcated inputs"
🐛 Describe the bug
Hi, I tried to train Stable Diffusion v1 with Colossal-AI using 'train_colossalai_teyvat.yaml', but got "NotImplementedError: Some torch function is incompatible because of its complcated inputs". How should I solve this problem?
BTW, if I want to fine-tune from "stable-diffusion-v1-4", where should I add that configuration?
System: Ubuntu 20.04, GPU: 3060 (12 GB), Python 3.9
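For reference, this is roughly how I launch the run (same flags as in the log below). My guess is that fine-tuning from stable-diffusion-v1-4 means pointing the YAML at the downloaded checkpoint, e.g. something like a ckpt_path entry under model.params as in the upstream latent-diffusion configs, but that is only an assumption and is exactly what I am asking about:

# rough sketch of the launch command (paths shortened)
python main.py \
    --logdir ./train/ \
    -t \
    -b configs/Teyvat/train_colossalai_teyvat.yaml
# assumption (unverified): for fine-tuning, the sd-v1-4 weights would be added to
# the config, e.g. a "ckpt_path:" entry under model.params in the YAML -- please
# correct me if the example expects something else.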
/home/lucienfang/anaconda3/envs/ldm4/bin/python3.9 /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/main.py --logdir /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/ -t -b /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/configs/Teyvat/1.yaml
Global seed set to 23
{'accelerator': 'gpu', 'devices': 1, 'log_gpu_memory': 'all', 'max_epochs': 2, 'precision': 16, 'auto_select_gpus': False, 'strategy': {'target': 'strategies.ColossalAIStrategy', 'params': {'use_chunk': True, 'enable_distributed_storage': True, 'placement_policy': 'cuda', 'force_outputs_fp32': True}}, 'log_every_n_steps': 2, 'logger': True, 'default_root_dir': '/tmp/diff_log/'}
Running on GPU
Using FP16 = True
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton.language'
LatentDiffusion: Running in v-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loggers/tensorboard.py:123: UserWarning: You set TensorBoardLogger(log_graph=True) but tensorboard is not available.
  rank_zero_warn("You set TensorBoardLogger(log_graph=True) but tensorboard is not available.")
Using strategy: strategies.ColossalAIStrategy
Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/fused_optim/build.ninja...
Building extension module fused_optim...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_optim...
Time to load fused_optim op: 0.48834681510925293 seconds
Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.5072433948516846 seconds
Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/multihead_attention/build.ninja...
Building extension module multihead_attention...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module multihead_attention...
Time to load multihead_attention op: 0.503164529800415 seconds
Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/scaled_upper_triang_masked_softmax/build.ninja...
Building extension module scaled_upper_triang_masked_softmax...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax...
Time to load scaled_upper_triang_masked_softmax op: 0.4033949375152588 seconds
Using /home/lucienfang/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/lucienfang/.cache/torch_extensions/py39_cu113/moe/build.ninja...
Building extension module moe...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module moe...
Time to load moe op: 0.4086589813232422 seconds
please install Colossal-AI from https://www.colossalai.org/download or from source
Monitoring val/loss_simple_ema as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': '/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/2023-01-05T11-26-08_1/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3}}
/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:562: LightningDeprecationWarning: The Trainer argument auto_select_gpus has been deprecated in v1.9.0 and will be removed in v1.10.0. Please use the function pytorch_lightning.accelerators.find_usable_cuda_devices instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
WARNING:datasets.builder:Using custom data configuration train
WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701)
100%|██████████| 1/1 [00:00<00:00, 114.79it/s]
WARNING:datasets.builder:Using custom data configuration train
WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701)
100%|██████████| 1/1 [00:00<00:00, 1061.85it/s]
/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:108: PossibleUserWarning: You defined a validation_step but have no val_dataloader. Skipping val loop.
  rank_zero_warn(
Data
train, Dataset, 234
accumulate_grad_batches = 1
Setting learning rate to 1.00e-04 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 1 (batchsize) * 1.00e-04 (base_lr)
WARNING:datasets.builder:Using custom data configuration train
WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701)
100%|██████████| 1/1 [00:00<00:00, 1628.22it/s]
Missing logger folder: /home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/train/2023-01-05T11-26-08_1/diff_tb
WARNING:datasets.builder:Using custom data configuration train
WARNING:datasets.builder:Found cached dataset teyvat (/home/lucienfang/.cache/huggingface/datasets/Fazzie___teyvat/train/0.0.0/62e3cc07a1a94bcb7c0d02f703087023dd935272664b2da5525b893724f24701)
100%|██████████| 1/1 [00:00<00:00, 1462.96it/s]
[01/05/23 11:28:07] INFO colossalai - ProcessGroup - INFO: /home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/process_group.py:24 get
INFO colossalai - ProcessGroup - INFO: NCCL initialize ProcessGroup on [0]
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up LambdaLR scheduler...
searching chunk configuration is completed in 0.15 s.
used number: 825.80 MB, wasted number: 0.80 MB
total wasted percentage is 0.10%
Project config
model:
  base_learning_rate: 0.0001
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: v
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: false
    scheduler_config:
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps:
        - 1
        cycle_lengths:
        - 10000000000000
        f_start:
        - 1.0e-06
        f_max:
        - 0.0001
        f_min:
        - 1.0e-10
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: true
        use_fp16: true
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_head_channels: 64
        use_spatial_transformer: true
        use_linear_in_transformer: true
        transformer_depth: 1
        context_dim: 1024
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: true
        layer: penultimate
        use_fp16: true
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1
    num_workers: 4
    train:
      target: ldm.data.teyvat.hf_dataset
      params:
        path: Fazzie/Teyvat
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 512
        - target: torchvision.transforms.RandomCrop
          params:
            size: 512
        - target: torchvision.transforms.RandomHorizontalFlip
Lightning config
trainer:
  accelerator: gpu
  devices: 1
  log_gpu_memory: all
  max_epochs: 2
  precision: 16
  auto_select_gpus: false
  strategy:
    target: strategies.ColossalAIStrategy
    params:
      use_chunk: true
      enable_distributed_storage: true
      placement_policy: cuda
      force_outputs_fp32: true
  log_every_n_steps: 2
  logger: true
  default_root_dir: /tmp/diff_log/
logger_config:
  wandb:
    target: loggers.WandbLogger
    params:
      name: nowname
      save_dir: /tmp/diff_log/
      offline: opt.debug
      id: nowname
Epoch 0: 0%| | 0/234 [00:00<?, ?it/s] Summoning checkpoint.
Traceback (most recent call last):
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/main.py", line 804, in <module>
    trainer.fit(model, data)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 602, in fit
    call._call_and_handle_interrupt(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 88, in launch
    return function(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 644, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1097, in _run
    results = self._run_stage()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1176, in _run_stage
    self._run_train()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run_train
    self.fit_loop.run()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 213, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 202, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 249, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 370, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1341, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1672, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 412, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/colossalai.py", line 74, in optimizer_step
    closure_result = closure()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 149, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 135, in closure
    step_output = self._step_fn()
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 419, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1479, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 351, in training_step
    return self.model(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/nn/parallel/data_parallel.py", line 274, in forward
    outputs = self.module(*args, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 98, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 486, in training_step
    loss, loss_dict = self.shared_step(batch)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 922, in shared_step
    x, c = self.get_input(batch, self.first_stage_key)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 859, in get_input
    encoder_posterior = self.encode_first_stage(x)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/diffusion/ddpm.py", line 919, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/models/autoencoder.py", line 87, in encode
    h = self.encoder(x)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/examples/images/diffusion/ldm/modules/diffusionmodules/model.py", line 528, in forward
    hs = [self.conv_in(x)]
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/lucienfang/anaconda3/envs/ldm4/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/colo_parameter.py", line 85, in __torch_function__
    new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/param_op_hook.py", line 85, in pre_op
    grad_args, rear_args = _get_grad_args(*args)
  File "/home/lucienfang/data1/work/model/diffusion/ColossalAI/colossalai/tensor/param_op_hook.py", line 153, in _get_grad_args
    raise NotImplementedError("Some torch function is incompatible because of its complcated inputs.")
NotImplementedError: Some torch function is incompatible because of its complcated inputs.
Process finished with exit code 1
Environment
packages in environment at /home/lucienfang/anaconda3/envs/ldm4:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
aiohttp 3.8.3 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
altair 4.2.0 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
async-timeout 4.0.2 pypi_0 pypi
attrs 22.2.0 pypi_0 pypi
bcrypt 4.0.1 pypi_0 pypi
beautifulsoup4 4.11.1 pypi_0 pypi
blas 2.116 openblas conda-forge
blas-devel 3.9.0 16_linux64_openblas conda-forge
blinker 1.5 pypi_0 pypi
braceexpand 0.1.7 pypi_0 pypi
brotlipy 0.7.0 py39h27cfd23_1003 defaults
bs4 0.0.1 pypi_0 pypi
bzip2 1.0.8 h7b6447c_0 defaults
ca-certificates 2022.10.11 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cachetools 5.2.0 pypi_0 pypi
certifi 2022.12.7 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cffi 1.15.1 py39h74dc2b5_0 defaults
cfgv 3.3.1 pypi_0 pypi
charset-normalizer 2.0.4 pyhd3eb1b0_0 defaults
click 8.1.3 pypi_0 pypi
cmake 3.25.0 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
colossalai 0.1.12+torch1.12cu11.3 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
contexttimer 0.3.3 pypi_0 pypi
cryptography 38.0.1 py39h9ce1e76_0 defaults
datasets 2.8.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
dill 0.3.6 pypi_0 pypi
distlib 0.3.6 pypi_0 pypi
einops 0.3.0 pypi_0 pypi
entrypoints 0.4 pypi_0 pypi
fabric 2.7.1 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.9.0 pyhd8ed1ab_0 conda-forge
flatbuffers 22.12.6 pypi_0 pypi
flit-core 3.6.0 pyhd3eb1b0_0 defaults
freetype 2.12.1 h4a9f257_0 defaults
frozenlist 1.3.3 pypi_0 pypi
fsspec 2022.11.0 pypi_0 pypi
ftfy 6.1.1 pypi_0 pypi
giflib 5.2.1 h7b6447c_0 defaults
gitdb 4.0.10 pypi_0 pypi
gitpython 3.1.30 pypi_0 pypi
gmp 6.2.1 h295c915_3 defaults
gnutls 3.6.15 he1e5248_0 defaults
huggingface-hub 0.11.1 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
identify 2.5.11 pypi_0 pypi
idna 3.4 py39h06a4308_0 defaults
importlib-metadata 5.2.0 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561 defaults
invisible-watermark 0.1.5 pypi_0 pypi
invoke 1.7.3 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jpeg 9e h7f8727e_0 defaults
jsonschema 4.17.3 pypi_0 pypi
kornia 0.6.0 pypi_0 pypi
lame 3.100 h7b6447c_0 defaults
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.38 h1181459_1 defaults
lerc 3.0 h295c915_0 defaults
libblas 3.9.0 16_linux64_openblas conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libdeflate 1.8 h7f8727e_5 defaults
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
libiconv 1.16 h7f8727e_2 defaults
libidn2 2.3.2 h7f8727e_0 defaults
liblapack 3.9.0 16_linux64_openblas conda-forge
liblapacke 3.9.0 16_linux64_openblas conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libpng 1.6.37 hbc83047_0 defaults
libprotobuf 3.21.12 h3eb15da_0 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libtasn1 4.16.0 h27cfd23_0 defaults
libtiff 4.4.0 hecacb30_2 defaults
libunistring 0.9.10 h27cfd23_0 defaults
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp 1.2.4 h11a3e52_0 defaults
libwebp-base 1.2.4 h5eee18b_0 defaults
libzlib 1.2.13 h166bdaf_4 conda-forge
lightning-utilities 0.5.0 pypi_0 pypi
lit 15.0.6 pypi_0 pypi
llvm-openmp 15.0.6 he0ac6c6_0 conda-forge
lz4-c 1.9.4 h6a678d5_0 defaults
markupsafe 2.1.1 pypi_0 pypi
mkl 2022.2.1 h84fe81f_16997 conda-forge
mkl-service 2.4.0 py39hb699420_0 conda-forge
mkl_fft 1.3.1 py39h051f8f4_4 conda-forge
mkl_random 1.2.2 py39h8b66066_1 conda-forge
mpmath 1.2.1 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
multiprocess 0.70.14 pypi_0 pypi
mypy-extensions 0.4.3 pypi_0 pypi
ncurses 6.3 h5eee18b_3 defaults
nettle 3.7.3 hbbd107a_1 defaults
ninja 1.11.1 pypi_0 pypi
nodeenv 1.7.0 pypi_0 pypi
numpy 1.23.1 py39hf838250_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base 1.23.1 py39h1e6e340_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
omegaconf 2.3.0 pypi_0 pypi
onnx 1.13.0 pypi_0 pypi
onnxruntime 1.13.1 pypi_0 pypi
open-clip-torch 2.0.2 pypi_0 pypi
openblas 0.3.21 pthreads_h320a7e8_3 conda-forge
opencv-python 4.7.0.68 pypi_0 pypi
openh264 2.1.1 h4ff587b_0 defaults
openssl 3.0.7 h0b41bf4_1 conda-forge
packaging 22.0 pypi_0 pypi
pandas 1.5.2 pypi_0 pypi
paramiko 2.12.0 pypi_0 pypi
pathlib2 2.3.7.post1 pypi_0 pypi
pillow 9.3.0 py39hace64e9_1 defaults
pip 20.3.3 py39h06a4308_0 defaults
pip-search 0.0.12 pypi_0 pypi
platformdirs 2.6.2 pypi_0 pypi
pre-commit 2.21.0 pypi_0 pypi
prefetch-generator 1.0.3 pypi_0 pypi
protobuf 3.20.1 pypi_0 pypi
psutil 5.9.4 pypi_0 pypi
pyarrow 10.0.1 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0 defaults
pydeck 0.8.0 pypi_0 pypi
pydeprecate 0.3.2 pypi_0 pypi
pygments 2.13.0 pypi_0 pypi
pympler 1.0.1 pypi_0 pypi
pynacl 1.5.0 pypi_0 pypi
pyopenssl 22.0.0 pyhd3eb1b0_0 defaults
pyre-extensions 0.0.23 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
pysocks 1.7.1 py39h06a4308_0 defaults
python 3.9.12 h2660328_1_cpython conda-forge
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.9 3_cp39 conda-forge
pytorch-lightning 1.9.0.dev0 pypi_0 pypi
pytorch-mutex 1.0 cuda pytorch
pytz 2022.7 pypi_0 pypi
pytz-deprecation-shim 0.1.0.post0 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
regex 2022.10.31 pypi_0 pypi
requests 2.28.1 py39h06a4308_0 defaults
responses 0.18.0 pypi_0 pypi
rich 13.0.0 pypi_0 pypi
semver 2.13.0 pypi_0 pypi
setuptools 65.5.0 py39h06a4308_0 defaults
six 1.16.0 pyhd3eb1b0_1 defaults
sleef 3.5.1 h9b69904_2 conda-forge
smmap 5.0.0 pypi_0 pypi
soupsieve 2.3.2.post1 pypi_0 pypi
sqlite 3.40.0 h5082296_0 defaults
streamlit 1.12.1 pypi_0 pypi
streamlit-drawable-canvas 0.8.0 pypi_0 pypi
sympy 1.11.1 pypi_0 pypi
tbb 2021.7.0 h924138e_0 conda-forge
tensorboardx 2.5.1 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
tokenizers 0.12.1 pypi_0 pypi
toml 0.10.2 pypi_0 pypi
toolz 0.12.0 pypi_0 pypi
torch 1.12.1+cu113 pypi_0 pypi
torchmetrics 0.7.0 pypi_0 pypi
torchvision 0.13.1+cu113 pypi_0 pypi
tornado 6.2 pypi_0 pypi
tqdm 4.64.1 pypi_0 pypi
transformers 4.19.2 pypi_0 pypi
typing-inspect 0.8.0 pypi_0 pypi
typing_extensions 4.4.0 py39h06a4308_0 defaults
tzdata 2022.7 pypi_0 pypi
tzlocal 4.2 pypi_0 pypi
urllib3 1.26.13 py39h06a4308_0 defaults
validators 0.20.0 pypi_0 pypi
virtualenv 20.17.1 pypi_0 pypi
watchdog 2.2.0 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
webdataset 0.2.5 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0 defaults
xformers 0.0.15+e163309.d20230101 dev_0
xxhash 3.2.0 pypi_0 pypi
xz 5.2.8 h5eee18b_0 defaults
yarl 1.8.2 pypi_0 pypi
zipp 3.11.0 pypi_0 pypi
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 ha4553b6_0 defaults
I suggest you use our Docker image to avoid package problems.
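A rough sketch of what that could look like (assuming the Dockerfile lives under docker/ in the repo; the image name and tag below are placeholders, not verified release tags, so check the repo and its Docker Hub page for the real ones):

# Hypothetical sketch only -- build the image from the Dockerfile shipped in the
# repo (or pull a published one), then run training inside the container with
# GPU access. Names and tags here are placeholders.
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI/docker
docker build -t colossalai:local .
docker run --gpus all -it --rm \
    -v "$HOME/ColossalAI:/workspace" \
    colossalai:local /bin/bash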
Have you solved this issue?
I encountered the same issue. Any solution to this?
Downgrade ColossalAI to v0.1.12:
pip install colossalai==0.1.12+torch1.12cu11.3 -f https://release.colossalai.org
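As a quick sanity check after the downgrade (nothing assumed here beyond colossalai exposing __version__), something like:

# downgrade to the prebuilt 0.1.12 wheel for torch 1.12 / CUDA 11.3
pip install colossalai==0.1.12+torch1.12cu11.3 -f https://release.colossalai.org
# confirm the active environment now resolves to 0.1.12
python -c "import colossalai; print(colossalai.__version__)"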
Since SD requires building from source, I finally got it running by:
- checking out colossalai at v0.1.12 and installing from source with CUDA_EXT=1 pip install -v --no-cache-dir .
- checking out the main branch of colossalai and running the SD code (rough command sketch below).
My environment is set up according to the Dockerfile on the main branch. I don't know whether all of these steps are necessary, but it works for me.
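Roughly, those steps as shell commands (tag and branch names as described above; adjust the checkout path and config file to your own setup):

# 1) build and install Colossal-AI from source at the v0.1.12 tag
cd /path/to/ColossalAI
git checkout v0.1.12
CUDA_EXT=1 pip install -v --no-cache-dir .

# 2) switch the working tree back to main so the Stable Diffusion example code
#    is current, then launch training as usual
git checkout main
cd examples/images/diffusion
python main.py --logdir ./train/ -t -b configs/Teyvat/train_colossalai_teyvat.yaml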
Sorry, the latest ColossalAI has some bugs in its tensor module; v0.1.12 is the right version to run it. We are fixing the bug.
@qq110146 @flymin Hi all, sorry for the bug. I believe @1SAA has fixed it.
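If the fix is on the main branch, picking it up should just mean updating an existing source checkout and reinstalling, something along these lines:

# assumes an existing source checkout; the fix mentioned above is expected on main
cd /path/to/ColossalAI
git checkout main
git pull
CUDA_EXT=1 pip install -v --no-cache-dir .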