ColossalAI
[BUG]: "Killed" appears during stable-diffusion training
🐛 Describe the bug
In the training stage I set the number of epochs to 20, but after epoch 3 the message "Killed" appeared and training stopped. The GPU was not released afterwards.
Also, the saved checkpoint last.ckpt is 10 GB in size. Is that expected?
Could you tell me the reason for this? Thank you.
Environment
The environment is the same as the one you provide. The YAML config is train_colossalai.yaml.