guided-diffusion

load_state_dict stuck!

Open fido20160817 opened this issue 3 years ago • 3 comments

When loading a checkpoint with multiple GPUs, the program gets stuck in load_state_dict() defined in dist_util.py. It works fine with one GPU. Does anybody know what causes this?

fido20160817, Sep 11 '22 14:09

Same question

pmj110119, Nov 16 '22 12:11

I got the solution from https://github.com/openai/guided-diffusion/issues/23. Just delete the "if dist.get_rank() == 0" condition in train_util.py, so that every rank calls load_state_dict() when loading a checkpoint with multiple GPUs.
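A likely reason this works, assuming dist_util.load_state_dict() distributes the checkpoint from rank 0 to the other ranks with a collective broadcast: a collective call only returns once every rank has entered it, so guarding the call with a rank check leaves rank 0 waiting forever. The sketch below is only a simulation of that behaviour, not the repository's code. It uses threads and threading.Barrier as a stand-in for the broadcast, and the names WORLD_SIZE, run and collective_load are hypothetical.

```python
import threading

WORLD_SIZE = 2  # hypothetical number of ranks, simulated here with threads


def run(ranks_that_load, timeout=0.5):
    """Simulate a checkpoint load that contains a collective call.

    ranks_that_load: set of ranks that are allowed to enter the load.
    Returns a dict mapping each rank to "loaded", "skipped" or "stuck".
    """
    # The barrier stands in for a collective broadcast (an assumption of
    # this sketch): it only releases once all WORLD_SIZE ranks reach it.
    barrier = threading.Barrier(WORLD_SIZE)
    results = {}

    def collective_load(rank):
        barrier.wait(timeout=timeout)
        return "loaded"

    def worker(rank):
        try:
            if rank in ranks_that_load:
                results[rank] = collective_load(rank)
            else:
                results[rank] = "skipped"
        except threading.BrokenBarrierError:
            # A real job would hang here; the timeout only exists so that
            # this simulation terminates.
            results[rank] = "stuck"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Guarded by a rank check: only rank 0 enters the collective call.
guarded = run(ranks_that_load={0})
# Guard removed: every rank enters the collective call.
unguarded = run(ranks_that_load={0, 1})
print(guarded)
print(unguarded)
```

With a single GPU there is only one rank, so the collective call has nobody else to wait for, which would match the observation that the single-GPU case works.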

Suimingzhe, Feb 01 '23 02:02

The problem seems to be the PyTorch version in your notebook's configuration. From the looks of it, Colab and Jupyter notebooks use 0.4.0, so I added the strict=False argument to load_state_dict():

model.load_state_dict(checkpoint, strict=False)

Answer from https://stackoverflow.com/a/54058284

randomrushgirl, Feb 06 '23 20:02