Use one GPU to generate images using a pretrained model without the NCCL communication protocol.

Open stonecropa opened this issue 1 year ago • 25 comments

I only have one GPU, and I want to run the pre-trained model and generate images. What should I do, and where should the code be changed? Please explain in detail, because I am a newcomer in this area. Thanks.

stonecropa avatar Apr 21 '23 06:04 stonecropa

Make sure your commands aren't starting with mpiexec -n 8 like some of the scripts suggest. This is a multi-GPU command.

thorinf avatar Apr 21 '23 08:04 thorinf

@thorinf
I know, but it still needs NCCL even without mpiexec. Thank you.

stonecropa avatar Apr 21 '23 10:04 stonecropa

Which command are you using? I'll take a look

thorinf avatar Apr 21 '23 10:04 thorinf

mpiexec -n 8 python cm_train.py --training_mode consistency_distillation --sigma_max 80 --sigma_min 0.002 --target_ema_mode fixed --start_ema 0.95 --scale_mode fixed --start_scales 40 --total_training_steps 600000 --loss_norm lpips --lr_anneal_steps 0 --teacher_model_path /path/to/edm_bedroom256_ema.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --teacher_dropout 0.1 --ema_rate 0.9999,0.99994,0.9999432189950708 --global_batch_size 256 --image_size 256 --lr 0.00001 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --schedule_sampler uniform --use_fp16 True --weight_decay 0.0 --weight_schedule uniform --data_dir /path/to/bedroom256

stonecropa avatar Apr 21 '23 10:04 stonecropa

And you've tried this without mpiexec -n 8? Just python cm_train.py....etc.

thorinf avatar Apr 21 '23 10:04 thorinf

yes

stonecropa avatar Apr 21 '23 10:04 stonecropa

So I changed the code, but it has a bug.

stonecropa avatar Apr 21 '23 11:04 stonecropa

What happens if you do dist.get_world_size()? How many GPUs does the machine you are using have? You can also check this with nvidia-smi in terminal.
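If it's easier, a quick check from Python should show what the machine sees; this is just a generic PyTorch snippet (dist.get_world_size() only reports anything once a process group has been initialized):

```python
import torch
import torch.distributed as dist

print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())  # 1 on a single-GPU machine

# World size is only meaningful after init_process_group() has been called.
if dist.is_available() and dist.is_initialized():
    print("World size:", dist.get_world_size())
```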

thorinf avatar Apr 21 '23 11:04 thorinf

https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L98

Here in the training loop is where distributed training is selected. It seems to activate whenever CUDA is available, rather than requiring CUDA && multi-GPU. I am unsure whether DDP is happy to work with just a single GPU. You could try changing this line to force it into the else condition.

I would still check whether the machine has more than one GPU. If this is a personal machine it may be obvious that it's single-GPU, so the check might seem unnecessary. But if it's a server somewhere, it's worth taking a look; you could be surprised to find more than one.

https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L71

The global batch size is set here; on a single-GPU machine it should be the same as the batch size. If you make the change I suggested above, you will also need to make sure the global batch size matches the batch size.
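To be concrete, the change I have in mind looks roughly like this. It's only a sketch and untested; the attribute names follow cm/train_util.py but may not match exactly, and it relies on the file's existing imports (th for torch, DDP, dist_util):

```python
# Inside TrainLoop, where the DDP wrapping is chosen (sketch, untested).
if th.cuda.is_available() and th.cuda.device_count() > 1:
    # Multi-GPU: keep the DistributedDataParallel wrapper (requires NCCL).
    self.use_ddp = True
    self.ddp_model = DDP(
        self.model,
        device_ids=[dist_util.dev()],
        output_device=dist_util.dev(),
        broadcast_buffers=False,
    )
else:
    # Single GPU or CPU: skip DDP entirely, so NCCL is never needed.
    self.use_ddp = False
    self.ddp_model = self.model
```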

thorinf avatar Apr 21 '23 11:04 thorinf

@thorinf I changed it, but it still needs NCCL. [screenshots attached]

stonecropa avatar Apr 21 '23 14:04 stonecropa

You need to change the if th.cuda.is_available(): check itself; here you have only changed the attribute, and DDP is still used.

You could try changing the distributed backend so that it doesn't use NCCL.

Or just install NCCL; even though you haven't got multiple GPUs, it may still work.

thorinf avatar Apr 21 '23 14:04 thorinf

@thorinf Thanks, I will try it according to your suggestion, but I still have two questions. The first is how to specify the location of the datasets; I just want to generate pictures, and I don't want to download the original dataset because it's too big. The second problem is that there is an error here; my workaround is to put the cm folder under the scripts folder. Is there any solution? [screenshot attached]

stonecropa avatar Apr 21 '23 14:04 stonecropa

Image sampling doesn't require a dataset, from what I can see. It would be odd for it to be required.

thorinf avatar Apr 21 '23 14:04 thorinf

If you look at the code you sent above: in the else condition, ddp_model is set to the original model, which I think is what you want. The DDP wrapper is what enables distributed training, and NCCL is NVIDIA's protocol for multi-GPU communication. You need to avoid DDP to avoid NCCL.

However, I am surprised that DDP checks for NCCL even when only one GPU is being used. The code could proceed; it doesn't need to raise that RuntimeError. That part is not something you can change, though.

thorinf avatar Apr 21 '23 14:04 thorinf

Hi, I'm also curious about how to run the code on a workstation with multiple GPUs. Simply deleting mpiexec -n 8 will run the code on a single GPU, as I mentioned in issue #20. The setting is in cm/dist_util.py, but I'm not familiar with mpi4py. Do you have any ideas or advice?

1999kevin avatar Apr 21 '23 15:04 1999kevin

Simply deleting mpiexec -n 8 will run the code on a single GPU, as I mentioned in issue #20.

I think the issue is that without NCCL installed, DDP throws a RuntimeError.

but I'm not familiar with mpi4py. Do you have any ideas or advice?

There's also torch.distributed.launch and torchrun which may work.

thorinf avatar Apr 21 '23 15:04 thorinf

There's also torch.distributed.launch and torchrun which may work.

Are there any easier methods? For example, any minor adjustments to mpi4py in cm/dist_util.py?

1999kevin avatar Apr 21 '23 15:04 1999kevin

mpiexec and mpirun are pretty simple; I would definitely recommend learning to use them, or trying the other launchers I mentioned above. You basically need something that runs multiple instances of the code and sets up communication between them.

thorinf avatar Apr 21 '23 15:04 thorinf

Thanks for your recommendation. I will try it in a while!

1999kevin avatar Apr 21 '23 15:04 1999kevin

@thorinf Thanks, I will try it according to your suggestion, but I still have two questions. The first is how to specify the location of the datasets; I just want to generate pictures, and I don't want to download the original dataset because it's too big. The second problem is that there is an error here; my workaround is to put the cm folder under the scripts folder. Is there any solution? [screenshot attached]

You need to run setup.py first in order to configure everything correctly. Or as the author suggests: pip install -e .

Note that this can take ~30mins.

nekoshadow1 avatar Apr 21 '23 18:04 nekoshadow1

If you only want to generate images without downloading the large dataset (the LSUN training set contains over 1 million images, by the way), you can refer to the 'Multistep sampling on class-conditional ImageNet-64, and LSUN 256' part of the following file: https://github.com/openai/consistency_models/blob/main/scripts/launch.sh

After running setup.py, download the pretrained models provided by the authors and specify the path to them when you run the sampling commands.

nekoshadow1 avatar Apr 21 '23 21:04 nekoshadow1

@thorinf
Thank you, I am running the pre-trained model on my laptop; the system is Windows 10. Can I also install NCCL?

stonecropa avatar Apr 22 '23 06:04 stonecropa

@nekoshadow1 I have already installed it, and there is no problem with cm, but it still requires NCCL. Is there any way around this? Thanks.

stonecropa avatar Apr 22 '23 06:04 stonecropa

mpiexec and mpirun are pretty simple; I would definitely recommend learning to use them, or trying the other launchers I mentioned above. You basically need something that runs multiple instances of the code and sets up communication between them.

Hello, thorinf. Did you reproduce the results of the paper? I've found that many people say they got bad results.

ShyFoo avatar Apr 26 '23 03:04 ShyFoo

You can change line 42 of cm/dist_util.py to: dist.init_process_group(backend="gloo", init_method="env://")
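For anyone who, like the OP, wants to avoid installing NCCL entirely: gloo only needs the usual env:// variables to be set. Below is a minimal standalone sketch of a single-process gloo init, not the repo's setup_dist() (which, if I remember correctly, also uses MPI to broadcast the master address and port across ranks):

```python
import os
import torch.distributed as dist

# Minimal single-process setup for the gloo backend, so NCCL is not required.
# init_method="env://" reads these environment variables.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(backend="gloo", init_method="env://")
print("world size:", dist.get_world_size())  # -> 1
```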

ChenSiyi1 avatar May 14 '23 13:05 ChenSiyi1