ColossalAI

[BUG]: Stable Diffusion training stops with "Killed"

Open ML-GCN opened this issue 2 years ago • 0 comments

🐛 Describe the bug

In the training stage I set the number of epochs to 20, but after epoch 3 "Killed" appeared and training stopped. The GPU was not released afterwards.

Also, is the saved checkpoint last.ckpt supposed to be 10 GB?

Could you tell me the reason for this? Thank you.

Environment

The environment is the same as the one you provide. The config YAML is train_colossalai.yaml.

ML-GCN, Nov 26 '22 10:11