[BUG]: diffusion train question

Open Chenhuaqi6 opened this issue 2 years ago • 3 comments

🐛 Describe the bug

/home/chenhq/anaconda3/envs/ldm/lib/python3.9/site-packages/lightning-1.9.0-py3.9.egg/lightning/pytorch/loggers/tensorboard.py:188: UserWarning: Could not log computational graph to TensorBoard: The model.example_input_array attribute is not set or input_array was not given.
  rank_zero_warn(
/home/chenhq/anaconda3/envs/ldm/lib/python3.9/site-packages/lightning-1.9.0-py3.9.egg/lightning/pytorch/strategies/ddp.py:437: UserWarning: Error handling mechanism for deadlock detection is uninitialized. Skipping check.
  rank_zero_warn("Error handling mechanism for deadlock detection is uninitialized. Skipping check.")
Summoning checkpoint.

I ran python main.py --logdir ./tmp/ --train --base configs/Teyvat/train_colossalai_teyvat.yaml --ckpt 512-base-ema.ckpt. After the messages above there is no further output, but GPU utilization stays at 100%. Can you help me figure out what the problem is?

Environment

CUDA 11.2, Python 3.9, PyTorch 1.10.0

Chenhuaqi6 avatar Mar 08 '23 04:03 Chenhuaqi6

Hi, what do you mean by no output? Can you show me the full execution log? Thanks

JThh avatar Mar 08 '23 09:03 JThh

logs.txt It stops here. I waited for an hour or so and there was no more output.

Chenhuaqi6 avatar Mar 08 '23 09:03 Chenhuaqi6

@Fazziekey, can you take a look at this issue? Thanks

JThh avatar Mar 08 '23 10:03 JThh

ok

Fazziekey avatar Mar 09 '23 01:03 Fazziekey

@Fazziekey @JThh Sorry, there was an error in my dataset. It is running now. I will close this issue. Thanks

Chenhuaqi6 avatar Mar 09 '23 03:03 Chenhuaqi6

Thanks

Fazziekey avatar Mar 09 '23 03:03 Fazziekey
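
Note for anyone who lands here with the same symptom: the thread does not say what the dataset error actually was, so the following is only a rough sketch of one way to sanity-check a dataset before launching a long training run. It assumes a map-style dataset (anything with __len__ and __getitem__). The names check_dataset and ToyDataset are hypothetical and are not part of ColossalAI or Lightning.

```python
from typing import Any, List, Sequence, Tuple


def check_dataset(dataset: Sequence[Any], max_items: int = 8) -> List[Tuple[int, str]]:
    """Try to load the first few items and collect any failures.

    Returns a list of (index, error message) pairs; an empty list means
    every checked item loaded without raising.
    """
    problems: List[Tuple[int, str]] = []
    n = len(dataset)
    if n == 0:
        # Nothing to load; report this explicitly instead of returning "ok".
        return [(-1, "dataset is empty")]
    for idx in range(min(n, max_items)):
        try:
            item = dataset[idx]
        except Exception as exc:  # record the failure instead of crashing
            problems.append((idx, f"{type(exc).__name__}: {exc}"))
            continue
        if item is None:
            problems.append((idx, "item is None"))
    return problems


class ToyDataset:
    """Hypothetical stand-in for a real dataset; index 2 is deliberately broken."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx: int) -> dict:
        if idx == 2:
            raise FileNotFoundError(f"missing image for sample {idx}")
        return {"image": [0.0] * 4, "caption": f"sample {idx}"}


if __name__ == "__main__":
    # Print one line per sample that failed to load.
    for idx, message in check_dataset(ToyDataset(5)):
        print(f"sample {idx}: {message}")
```

Running a check like this on the real dataset object, before starting main.py, may surface missing files or malformed samples in seconds rather than after a long wait.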