
[BUG]: diffusion inference error

Open GxjGit opened this issue 2 years ago • 0 comments

🐛 Describe the bug

python scripts/txt2img.py --prompt "Teyvat, Name:Layla, Element: Cryo, Weapon:Sword, Region:Sumeru, Model type:Medium Female, Description:a woman in a blue outfit holding a sword" --plms --outdir output --config 2022-12-02T02-14-03-project.yaml --ckpt last.ckpt

I got the following error:

The code commit ID is 6e51d296f07c0ad34d7f85cf9a70d4ceee15ede7.

I updated to edf4cd46c5395899c795f43bdc3d4a8b16166531 and tried again:

image

I then tried to train again to get a new checkpoint, but a new error occurred:

`oder.layers.13.self_attn.v_proj.bias', 'vision_model.encoder.layers.19.layer_norm2.weight', 'vision_model.encoder.layers.22.mlp.fc2.weight', 'vision_model.encoder.layers.10.self_attn.q_proj.bias', 'vision_model.encoder.layers.7.layer_norm1.weight', 'vision_model.encoder.layers.22.layer_norm2.weight', 'vision_model.encoder.layers.7.mlp.fc1.bias', 'vision_model.encoder.layers.10.layer_norm2.bias', 'vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_model.encoder.layers.12.self_attn.out_proj.weight', 'vision_model.encoder.layers.0.layer_norm2.weight', 'vision_model.encoder.layers.3.self_attn.out_proj.weight', 'vision_model.encoder.layers.15.mlp.fc1.bias', 'vision_model.encoder.layers.16.mlp.fc2.bias', 'vision_model.encoder.layers.17.self_attn.k_proj.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.weight']

  • This IS expected if you are initializing CLIPTextModelZero from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing CLIPTextModelZero from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Using strategy: pytorch_lightning.strategies.ColossalAIStrategy
Monitoring val/loss_simple_ema as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'lightning.pytorch.callbacks.ModelCheckpoint', 'params': {'dirpath': 'output/2022-12-02T17-05-18_train_colossalaitest/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3}}
Traceback (most recent call last):
  File "/home/notebook//code/ColossalAI/examples/images/diffusion/main.py", line 746, in <module>
    trainer = Trainer.from_argparse_args(trainer_opt, **trainer_kwargs)
  File "/opt/conda/envs/ldm1/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 1917, in from_argparse_args
    return from_argparse_args(cls, args, **kwargs)
  File "/opt/conda/envs/ldm1/lib/python3.9/site-packages/lightning/pytorch/utilities/argparse.py", line 66, in from_argparse_args
    return cls(**trainer_kwargs)
  File "/opt/conda/envs/ldm1/lib/python3.9/site-packages/lightning/pytorch/utilities/argparse.py", line 340, in insert_env_defaults
    return fn(self, **kwargs)
  File "/opt/conda/envs/ldm1/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 408, in __init__
    self._accelerator_connector = AcceleratorConnector(
  File "/opt/conda/envs/ldm1/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 223, in __init__
    self._init_strategy()
  File "/opt/conda/envs/ldm1/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 671, in _init_strategy
    raise RuntimeError(f"{self.strategy} is not valid type: {self.strategy}")
AttributeError: 'AcceleratorConnector' object has no attribute 'strategy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/notebook//code/ColossalAI/examples/images/diffusion/main.py", line 829, in `
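One thing visible in the log above is that the strategy is reported under `pytorch_lightning`, while the checkpoint callback and the traceback frames come from `lightning.pytorch`. Whether this mix is what triggers the error is an assumption; the log does not confirm it. A minimal sketch that flags such a mix from dotted import paths (the helper names `lightning_namespace` and `mixed_namespaces` are hypothetical, not part of any library):

```python
def lightning_namespace(dotted_path: str) -> str:
    """Classify a dotted import path by the Lightning namespace it starts with."""
    if dotted_path.startswith("lightning.pytorch."):
        return "lightning.pytorch"
    if dotted_path.startswith("pytorch_lightning."):
        return "pytorch_lightning"
    return "other"


def mixed_namespaces(paths):
    """Return the set of Lightning namespaces found, ignoring unrelated paths."""
    return {lightning_namespace(p) for p in paths} - {"other"}


# Both strings are copied from the log in this report.
paths = [
    "pytorch_lightning.strategies.ColossalAIStrategy",
    "lightning.pytorch.callbacks.ModelCheckpoint",
]

found = mixed_namespaces(paths)
if len(found) > 1:
    print("mixed Lightning namespaces:", sorted(found))
```

If both namespaces show up, it may be worth checking that the config and the installed packages refer to the same Lightning distribution.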

These errors are driving me crazy! Could you tell me which version is stable, or give me a commit ID that you have verified works?

Environment

No response
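A minimal sketch of how the missing environment details could be collected, using only the standard library. The package list is an assumption based on the names that appear in the traceback paths and log; adjust it as needed.

```python
import platform
from importlib import metadata

# Distribution names guessed from the log and traceback; this list is an assumption.
PACKAGES = ["torch", "lightning", "pytorch-lightning", "colossalai", "transformers"]


def collect_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    for name, version in collect_versions().items():
        print(name, version or "not installed")
```

Pasting the printed lines together with the commit ID would make the report easier to reproduce.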

GxjGit, Dec 02 '22 17:12