Thorin Farnsworth

30 comments of Thorin Farnsworth

And have you tried this without `mpiexec -n 8`? Just `python cm_train.py ...` etc.?

What happens if you do `dist.get_world_size()`? How many GPUs does the machine you are using have? You can also check this with `nvidia-smi` in a terminal.
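A minimal sketch of the kind of check I mean (assuming it runs under the same launch command as `cm_train.py`, so the process group is already initialised; otherwise `get_world_size()` has nothing to report):

```python
# Quick sanity check of the distributed setup vs. the visible hardware.
import torch
import torch.distributed as dist

print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:  ", torch.cuda.device_count())
if dist.is_available() and dist.is_initialized():
    print("World size:    ", dist.get_world_size())
else:
    print("torch.distributed is not initialised in this process")
```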

https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L98 Here in the training loop is where distributed training is selected. It seems to be activated whenever CUDA is available, rather than when CUDA is available *and* there are multiple GPUs. I am unsure whether [DDP](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html)...
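For reference, this is roughly the guard I have in mind; a sketch rather than the repo's exact code, and the device handling here is my own assumption:

```python
# Sketch: only wrap in DDP when CUDA is available AND there is more than one process.
import torch as th
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def maybe_wrap_ddp(model):
    multi_process = (
        dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1
    )
    if th.cuda.is_available() and multi_process:
        model = model.to("cuda")
        return DDP(model, device_ids=[th.cuda.current_device()])
    # Single-GPU / CPU fallback: just use the plain model.
    return model.to("cuda" if th.cuda.is_available() else "cpu")
```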

You need to change the `if th.cuda.is_available():` check; here you are only changing the attribute, and DDP is still used. You could try changing the [backend on DDP to not use...
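If NCCL is the problem, the usual alternative is the Gloo backend when initialising the process group. A sketch, assuming the standard `torch.distributed` environment variables rather than however this repo's `dist_util` actually sets things up:

```python
# Sketch: initialise the process group with Gloo instead of NCCL.
# Gloo is the common fallback when NCCL is unavailable or misbehaving.
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(
    backend="gloo",  # instead of "nccl"
    rank=int(os.environ.get("RANK", 0)),
    world_size=int(os.environ.get("WORLD_SIZE", 1)),
)
```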

Image sampling doesn't require a dataset, from what I can see. It would be odd for it to be required.

If you look at the code you sent above: in the `else` branch, `ddp_model` is set to the original model, which I think is what you want. The...

> Simply deleting `mpiexec -n 8` will run the code on one single GPU, as I mentioned in issue #20. I think the issue is that without NCCL...

`mpiexec` or `mpirun` are pretty simple, I would definitely recommend learning to use them or trying to use the other launchers I mentioned above. You basically need to have something...
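For what it's worth, this is the kind of thing the launcher gives each process, assuming `mpi4py` is installed (a minimal standalone sketch, not this repo's own setup; the filename is just for illustration):

```python
# check_mpi.py -- each process launched by mpiexec/mpirun gets its own rank.
# Run with e.g. `mpiexec -n 2 python check_mpi.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()} processes")
```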

There are some differences from the paper in general. For example, the rho scheduling is reversed in the paper; the formula used in the code here is more similar to...
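To illustrate what I mean by reversed, here is a sketch with commonly used default values assumed (`sigma_min = 0.002`, `sigma_max = 80`, `rho = 7`), not copied from either the paper or the code:

```python
# Sketch: the EDM-style schedule runs sigma_max -> sigma_min, while the paper's
# t_i runs from epsilon up to T -- the same values, in the opposite order.
import numpy as np

def edm_style_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    i = np.arange(n)
    return (sigma_max ** (1 / rho)
            + i / (n - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho

def paper_style_timesteps(n, eps=0.002, T=80.0, rho=7.0):
    i = np.arange(1, n + 1)
    return (eps ** (1 / rho)
            + (i - 1) / (n - 1) * (T ** (1 / rho) - eps ** (1 / rho))) ** rho

n = 10
assert np.allclose(edm_style_sigmas(n), paper_style_timesteps(n)[::-1])
```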

There are quite a few differences. I've raised an additional ticket and emailed Yang Song to hopefully find out which is better: https://github.com/openai/consistency_models/issues/18 I would need to check, but the scaling...