T2I-Adapter
torch.nn.parallel.DistributedDataParallel hangs
I encountered a "torch.nn.parallel.DistributedDataParallel hang" problem when I ran train_depth.py. I found that the program gets stuck at the call to "dist._verify_model_across_ranks".
How can I solve this problem?
This is an internal function of torch.
Also, here's the problem I'm having with multiple GPUs
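One way to narrow this down, offered as a sketch rather than a confirmed fix: run a bare collective-communication smoke test on the same two GPUs. If the all_reduce below also hangs, the problem lies in NCCL/GPU communication (topology, driver, or the CUDA_VISIBLE_DEVICES mapping) rather than in DistributedDataParallel itself. The file name test_dist.py is hypothetical, and launching with NCCL_DEBUG=INFO usually prints where communication stalls.

```python
# Hypothetical test_dist.py: a minimal NCCL smoke test, not part of T2I-Adapter.
# Launch it the same way as training, e.g.:
#   NCCL_DEBUG=INFO CUDA_VISIBLE_DEVICES=1,3 \
#   python -m torch.distributed.launch --nproc_per_node=2 --master_port 8888 test_dist.py
import argparse

import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
# torch.distributed.launch passes --local_rank to each worker process.
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")  # MASTER_ADDR/PORT come from the launcher

# If this all_reduce hangs too, DDP is not the culprit: the ranks cannot
# talk to each other at all (an NCCL, topology, or driver issue).
t = torch.ones(1, device=f"cuda:{args.local_rank}")
dist.all_reduce(t)
print(f"rank {dist.get_rank()}: all_reduce ok, got {t.item()} (expected 2.0)")
```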
what's the command you run?
CUDA_VISIBLE_DEVICES=1,3 python -m torch.distributed.launch --nproc_per_node=2 --master_port 8888 test11.py --bsize=8
Currently, model_ad can be wrapped in torch.nn.parallel.DistributedDataParallel without issues, but when the model loaded from sd-v1-4.ckpt is wrapped the same way, the program gets stuck.
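For a large checkpoint like this, a particular wrapping order often avoids the hang. Below is a minimal sketch, not the repo's actual code; the names build_sd_model and wrap_sd_model are hypothetical placeholders. Two common causes of this symptom are every rank torch.load-ing the checkpoint straight onto cuda:0, and frozen submodules (the VAE and text encoder in Stable Diffusion) that never produce gradients, which can stall DDP's reducer unless find_unused_parameters=True is set. Since only the adapter is trained here, it may also be enough to wrap just model_ad in DDP and leave the frozen SD model unwrapped.

```python
# A minimal sketch, assuming build_sd_model() constructs the SD network;
# both names are hypothetical placeholders, not T2I-Adapter's real API.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_sd_model(build_sd_model, ckpt_path: str, local_rank: int):
    # Load the checkpoint to CPU first: with the default map_location, every
    # rank would materialize the weights on cuda:0, and two processes
    # contending for one GPU is a classic source of silent DDP hangs.
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("state_dict", state)  # sd-v1-4.ckpt nests weights here

    model = build_sd_model()
    model.load_state_dict(state, strict=False)

    # Move the model to this rank's own GPU *before* constructing DDP.
    model = model.cuda(local_rank)

    # find_unused_parameters=True keeps the reducer from waiting forever on
    # frozen submodules (VAE, text encoder) that never receive gradients.
    return DDP(model, device_ids=[local_rank], find_unused_parameters=True)
```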