I am using Windows 11 with a 16 GB A4000 GPU.
I get the following error after running the command "python train_net.py --cfg configs/t2v_train.yaml":
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.2.0+cu121 with CUDA 1201 (you have 2.2.1)
Python 3.8.10 (you have 3.8.18)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
from xformers.triton.softmax import softmax as triton_softmax # noqa
File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
import triton
ModuleNotFoundError: No module named 'triton'
Traceback (most recent call last):
File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry.py", line 67, in build_from_config
return req_type_entry(**cfg)
File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\tools\train\train_t2v_enterance.py", line 59, in train_t2v_entrance
worker(0, cfg)
File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\tools\train\train_t2v_enterance.py", line 75, in worker
dist.init_process_group(backend='nccl', world_size=cfg.world_size, rank=cfg.rank)
File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\torch\distributed\c10d_logger.py", line 86, in wrapper
func_return = func(*args, **kwargs)
File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\torch\distributed\distributed_c10d.py", line 1184, in init_process_group
default_pg, _ = _new_process_group_helper(
File "C:\Users\INP_Rohit\.conda\envs\vgen\lib\site-packages\torch\distributed\distributed_c10d.py", line 1302, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_net.py", line 18, in <module>
ENGINE.build(dict(type=cfg_update.TASK_TYPE), cfg_update=cfg_update.cfg_dict)
File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry.py", line 107, in build
return self.build_func(*args, **kwargs, registry=self)
File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry_class.py", line 7, in build_func
return build_from_config(cfg, registry, **kwargs)
File "C:\Users\INP_Rohit\Documents\ImageGeneration\i2vgen-xl\utils\registry.py", line 69, in build_from_config
raise Exception(f"Failed to invoke function {req_type_entry}, with {e}")
Exception: Failed to invoke function <function train_t2v_entrance at 0x00000285FCD03AF0>, with Distributed package doesn't have NCCL built in
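
The final RuntimeError comes from the hard-coded backend='nccl' in the dist.init_process_group call shown in the traceback. PyTorch builds for Windows do not ship NCCL, while the gloo backend is available there. Below is a minimal sketch of a backend selector that could be used in place of the hard-coded string. The helper name pick_backend is hypothetical and not part of i2vgen-xl, and I have not verified that the rest of the training script runs correctly under gloo.

```python
import sys


def pick_backend(platform=sys.platform, nccl_available=None):
    """Return 'nccl' when it is usable, otherwise fall back to 'gloo'.

    pick_backend is a hypothetical helper, not part of i2vgen-xl.
    When nccl_available is None, PyTorch is queried if it is installed.
    """
    if nccl_available is None:
        try:
            import torch.distributed as dist
            # is_nccl_available() only exists when distributed support is built in,
            # so check is_available() first.
            nccl_available = dist.is_available() and dist.is_nccl_available()
        except ImportError:
            nccl_available = False
    # Windows builds of PyTorch do not include NCCL, so always use gloo there.
    if platform.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


if __name__ == "__main__":
    print(pick_backend())
```

Assuming the call in tools\train\train_t2v_enterance.py stays otherwise unchanged, it would become dist.init_process_group(backend=pick_backend(), world_size=cfg.world_size, rank=cfg.rank). The process group may additionally need the MASTER_ADDR and MASTER_PORT environment variables (or an explicit init_method) to be set, depending on how the script configures them.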