consistency_models
Use one GPU to generate images with a pretrained model, without the NCCL communication backend.
I only have one GPU, and I want to run the pretrained model and generate images. What should I do, and where should the code be changed? Please explain in detail, because I am a newcomer in this area. Thanks.
Make sure your commands don't start with mpiexec -n 8, as some of the scripts suggest. That is a multi-GPU command.
@thorinf
I know, but it still needs NCCL even without mpiexec. Thank you.
Which command are you using? I'll take a look
mpiexec -n 8 python cm_train.py --training_mode consistency_distillation --sigma_max 80 --sigma_min 0.002 --target_ema_mode fixed --start_ema 0.95 --scale_mode fixed --start_scales 40 --total_training_steps 600000 --loss_norm lpips --lr_anneal_steps 0 --teacher_model_path /path/to/edm_bedroom256_ema.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --teacher_dropout 0.1 --ema_rate 0.9999,0.99994,0.9999432189950708 --global_batch_size 256 --image_size 256 --lr 0.00001 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --schedule_sampler uniform --use_fp16 True --weight_decay 0.0 --weight_schedule uniform --data_dir /path/to/bedroom256
And you've tried this without mpiexec -n 8? Just python cm_train.py ... etc.
Yes. So I changed the code, but it has a bug.
What happens if you call dist.get_world_size()? How many GPUs does the machine you are using have? You can also check this with nvidia-smi in a terminal.
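The same check can be done from a script. This is a minimal sketch, assuming nvidia-smi is on the PATH whenever an NVIDIA driver is installed; the helper name list_gpus is hypothetical and not part of the repository.

```python
import shutil
import subprocess


def list_gpus():
    """Return one "index, name" string per GPU reported by nvidia-smi.

    Returns an empty list if nvidia-smi is missing or the query fails.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=index,name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    if result.returncode != 0:
        return []
    # One non-empty line per GPU.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) visible")
    for gpu in gpus:
        print(gpu)
```

Inside the training process itself, th.cuda.device_count() reports what PyTorch sees, and dist.get_world_size() reports the number of processes once the process group has been initialized.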
https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L98
Here in the training loop is where distributed training is selected. It seems to activate whenever CUDA is available, rather than only when CUDA is available and there are multiple GPUs. I am unsure whether DDP works with just a single GPU. You could try changing this line to force it into the else condition.
I would still check whether the machine has multiple GPUs. If this is a personal machine it may be obvious that it has a single GPU, so the check may seem unnecessary. But if it's a server somewhere, it is worth taking a look; you could be surprised to see more than one.
https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L71
The global batch size is set here. On a single-GPU machine it should be the same as the batch size. If you make the change I suggested above, you will also need to make sure the global batch size equals the batch size.
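To make the batch-size relation concrete, here is a small sketch. It assumes the global batch size is the per-process batch size multiplied by the number of processes; per_process_batch_size is a hypothetical helper, not a function from the repository.

```python
def per_process_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch size evenly across world_size processes."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if global_batch_size % world_size != 0:
        raise ValueError("global_batch_size must be divisible by world_size")
    return global_batch_size // world_size


# The command posted above uses --global_batch_size 256 with 8 processes.
print(per_process_batch_size(256, 8))  # 32
# With a single process, the whole global batch lands on one GPU.
print(per_process_batch_size(256, 1))  # 256
```

With one process the two values coincide, so a smaller --global_batch_size may be needed for the batch to fit on a single GPU.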
@thorinf I changed it, but it still needs NCCL.
You need to change the if th.cuda.is_available(): line. Here you have only changed the attribute, and DDP is still used.
You could try changing the backend on DDP to not use NCCL.
Or just install NCCL; even without multiple GPUs it may still work.
@thorinf
Thanks, I will try your suggestion, but I still have two questions. First, how do I specify the location of the datasets? I just want to generate pictures, and I don't want to download the original dataset because it's too big. Second, there is an error here. My approach was to put the cm folder under the scripts folder. Is there any solution?
Image sampling doesn't require a dataset, from what I can see. It would be odd for it to be required.
If you look at the code you sent above, in the else condition ddp_model is set to the original model; I think this is what you want. The DDP wrapper is what uses distributed training. For distributed training, NCCL is the Nvidia protocol for multi-GPU communication. You need to avoid DDP to avoid NCCL.
However, I am surprised that DDP checks for NCCL even if only one GPU is being used. The code could proceed; it doesn't need the RuntimeError. This is not something you can change, though.
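As a rough sketch of the "CUDA and multi-GPU" condition described above (not the repository's actual code): should_use_ddp and maybe_wrap_model are hypothetical names, and the DDP arguments shown are a simplified assumption.

```python
def should_use_ddp(cuda_available: bool, world_size: int) -> bool:
    """Use DDP only when CUDA is available AND more than one process runs."""
    return cuda_available and world_size > 1


def maybe_wrap_model(model, device):
    """Return a DDP-wrapped model for multi-process runs, else the model itself."""
    # Imported lazily so the decision helper above works without PyTorch.
    import torch as th
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    initialized = dist.is_available() and dist.is_initialized()
    world_size = dist.get_world_size() if initialized else 1
    if should_use_ddp(th.cuda.is_available(), world_size):
        return DDP(model, device_ids=[device], output_device=device)
    # Single process: skip the DDP wrapper and use the plain model,
    # which mirrors the else branch discussed above.
    return model
```

Other parts of the code may still call torch.distributed directly, so skipping the wrapper alone may not remove every dependency on a communication backend.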
Hi, I'm also curious about how to run the code on a workstation with multiple GPUs. Simply deleting mpiexec -n 8 will run the code on a single GPU, as I mentioned in issue #20. The setting is in cm/dist_util.py, but I'm not familiar with mpi4py. Do you have any ideas or advice?
I think the issue is that, without NCCL installed, DDP is throwing a RuntimeError.
There's also torch.distributed.launch and torchrun, which may work.
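If one of those launchers is used instead of mpiexec, each process has to learn its rank from somewhere other than MPI. torchrun passes this through the RANK, LOCAL_RANK and WORLD_SIZE environment variables. A minimal sketch of reading them follows; rank_info_from_env is a hypothetical helper, and how it would be wired into cm/dist_util.py is left open.

```python
import os


def rank_info_from_env(env=None):
    """Read the per-process rank information that torchrun exports.

    Falls back to a single-process layout when the variables are absent,
    for example when the script is started with plain `python`.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
        "world_size": int(env.get("WORLD_SIZE", "1")),
    }


if __name__ == "__main__":
    print(rank_info_from_env())
```

The local rank is usually the value used to pick which GPU a process binds to.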
Are there any easier methods? For example, any minor adjustments to the mpi4py code in cm/dist_util.py?
mpiexec or mpirun are pretty simple. I would definitely recommend learning to use them, or trying the other launchers I mentioned above. You basically need something that runs multiple instances of the code and sets up communication between them.
Thanks for your recommendation. I will try it in a while!
You need to run setup.py first in order to configure everything correctly. Or, as the author suggests: pip install -e .
Note that this can take ~30 mins.
If you only want to generate images without downloading the large dataset (the LSUN training set contains more than 1 million images, by the way), you can refer to the 'Multistep sampling on class-conditional ImageNet-64, and LSUN 256' part of the following file: https://github.com/openai/consistency_models/blob/main/scripts/launch.sh
After running setup.py, download the pretrained models provided by the author and specify the path to them when you run the sampling commands.
@thorinf
Thank you. I am running the pretrained model on my laptop, which runs Windows 10. Can I also install NCCL there?
@nekoshadow1 I have already installed it, and there is no problem with cm, but I still need NCCL. Is there any way around that? Thanks.
Hello, thorinf. Did you reproduce the results of the paper? I have seen many people say they got bad results.
You can change line 42 of dist_util.py to:
dist.init_process_group(backend="gloo", init_method="env://")
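For a single-process run, the backend switch above could be sketched as follows. This is an illustration under assumptions, not the repository's dist_util.py: pick_backend and init_single_process are hypothetical names, and the port 29500 is an arbitrary choice.

```python
import os


def pick_backend(nccl_available: bool) -> str:
    """Prefer NCCL when the PyTorch build ships it, otherwise fall back to gloo."""
    return "nccl" if nccl_available else "gloo"


def init_single_process(port: int = 29500) -> str:
    """Initialize torch.distributed for one process and return the backend used."""
    # Imported lazily so pick_backend can be used without PyTorch installed.
    import torch.distributed as dist

    if not dist.is_available():
        raise RuntimeError("this PyTorch build has no torch.distributed support")

    # init_method="env://" reads these four variables; setdefault keeps any
    # values a launcher has already provided.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", str(port))
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    backend = pick_backend(dist.is_nccl_available())
    dist.init_process_group(backend=backend, init_method="env://")
    return backend
```

PyTorch builds for Windows do not include NCCL, so on a Windows laptop this would select gloo.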