
Distributed package doesn't have NCCL built in

Open 23Rj20 opened this issue 10 months ago • 0 comments

I am using Windows 11 with a 16 GB A4000 GPU.

Error after running the command "python train_net.py --cfg configs/t2v_train.yaml":

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.2.0+cu121 with CUDA 1201 (you have 2.2.1)
    Python 3.8.10 (you have 3.8.18)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
    from xformers.triton.softmax import softmax as triton_softmax  # noqa
  File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
    import triton
ModuleNotFoundError: No module named 'triton'
Traceback (most recent call last):
  File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry.py", line 67, in build_from_config
    return req_type_entry(**cfg)
  File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\tools\train\train_t2v_enterance.py", line 59, in train_t2v_entrance
    worker(0, cfg)
  File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\tools\train\train_t2v_enterance.py", line 75, in worker
    dist.init_process_group(backend='nccl', world_size=cfg.world_size, rank=cfg.rank)
  File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\torch\distributed\c10d_logger.py", line 86, in wrapper
    func_return = func(*args, **kwargs)
  File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\torch\distributed\distributed_c10d.py", line 1184, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\torch\distributed\distributed_c10d.py", line 1302, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_net.py", line 18, in <module>
    ENGINE.build(dict(type=cfg_update.TASK_TYPE), cfg_update=cfg_update.cfg_dict)
  File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry.py", line 107, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry_class.py", line 7, in build_func
    return build_from_config(cfg, registry, **kwargs)
  File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry.py", line 69, in build_from_config
    raise Exception(f"Failed to invoke function {req_type_entry}, with {e}")
Exception: Failed to invoke function <function train_t2v_entrance at 0x00000285FCD03AF0>, with Distributed package doesn't have NCCL built in

23Rj20, Mar 29 '24 04:03
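
The traceback shows `train_t2v_enterance.py` calling `dist.init_process_group(backend='nccl', ...)`. PyTorch builds for Windows do not include NCCL, while the `gloo` backend is available there. One possible workaround, sketched below, is to choose the backend at runtime instead of hard-coding it. The helper names `pick_backend` and `detect_nccl` are hypothetical and not part of the VGen repository, and whether the rest of the training script behaves correctly with `gloo` has not been verified here.

```python
import sys


def pick_backend(nccl_available: bool, platform: str = sys.platform) -> str:
    """Return 'nccl' only when it is usable, otherwise fall back to 'gloo'.

    Hypothetical helper: NCCL is treated as unusable on Windows regardless
    of what the availability flag says.
    """
    if nccl_available and not platform.startswith("win"):
        return "nccl"
    return "gloo"


def detect_nccl() -> bool:
    """Ask PyTorch whether NCCL was compiled in; False if torch is missing."""
    try:
        import torch.distributed as dist
    except ImportError:
        return False
    # is_nccl_available() is only defined when the distributed package is built,
    # so check is_available() first.
    return dist.is_available() and dist.is_nccl_available()


if __name__ == "__main__":
    backend = pick_backend(detect_nccl())
    print(f"selected backend: {backend}")
    # Assumed change in the worker shown in the traceback (line 75):
    # dist.init_process_group(backend=backend,
    #                         world_size=cfg.world_size, rank=cfg.rank)
```

On the Windows 11 machine described above, this sketch would select `gloo`. The xFormers and Triton warnings at the top of the log are separate from the NCCL error and are not addressed by this change.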