denoising-diffusion-gan
ImportError: DLL load failed while importing fused: The specified module could not be found.
I run the following command to train in an Anaconda virtual environment: python train_ddgan.py --dataset celeba_256 --image_size 256 --exp ddgan_celebahq_exp1 --num_channels 3 --num_channels_dae 64 --ch_mult 1 1 2 2 4 4 --num_timesteps 2 --num_res_blocks 2 --batch_size 4 --num_epoch 800 --ngf 64 --embedding_type positional --use_ema --r1_gamma 2. --z_emb_dim 256 --lr_d 1e-4 --lr_g 2e-4 --lazy_reg 10 --num_process_per_node 1 --save_content
It fails with the error shown in the picture below:
(The Chinese text in the last line means "The specified module could not be found.")
I don't know what causes this error or how to fix it. Could anyone give me some suggestions?
I have tried solutions from very similar issues reported in other repositories. Some of them attribute this kind of problem to a version mismatch between torch (PyTorch) and CUDA. Another reported cause is 'ninja', as described in https://github.com/NVlabs/stylegan3/issues/88, but I don't know which version of 'ninja' I need (I currently have the latest version installed). Honestly, I'm not sure whether my problem is related to 'ninja' at all.
The libraries installed in my Anaconda virtual environment, with their versions, are shown below:
By the way, I only use a single GPU, a 12 GB GeForce RTX 4070.
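For anyone comparing environments, the version and toolchain details relevant to this error can also be collected as text rather than a screenshot. The following is a minimal sketch, not something from the repository: the helper name `collect_env_report` is hypothetical, and it assumes that the `fused` module is a C++/CUDA extension compiled at import time, which would need `ninja`, `nvcc`, and (on Windows) the MSVC compiler `cl` on the PATH.

```python
import platform
import shutil
import sys


def collect_env_report():
    """Gather version and toolchain facts relevant to building a CUDA extension.

    Hypothetical helper for this thread; it only reads the local environment.
    """
    report = {
        "python": platform.python_version(),
        "platform": sys.platform,
        # Build tools a JIT-compiled extension typically needs on PATH.
        # shutil.which returns None when the executable is not found.
        "ninja": shutil.which("ninja"),
        "nvcc": shutil.which("nvcc"),
        "cl": shutil.which("cl"),  # MSVC compiler, only relevant on Windows
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or its own DLLs failed to load; leave fields as None.
        return report
    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key:>15}: {value}")
```

A `None` next to `ninja`, `nvcc`, or `cl` would mean that tool is not visible from the active environment, which is worth ruling out before changing PyTorch versions.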
I have tried upgrading PyTorch and cudatoolkit. However, whenever the PyTorch version is newer than 1.10.1, running the same training command gives me a different error, shown in the picture below. This new error is the same as the one in how-to-resolve-the-error-message-return-tcpstore-runtimeerror-unmatched, and I don't know how to solve it either.
In conclusion, to solve the error "DLL load failed while importing fused: The specified module could not be found", it seems I have to upgrade PyTorch. But to solve the error "return TCPStore( ) RuntimeError: unmatched '}' in format string", I have to downgrade PyTorch. The fixes for the two errors conflict with each other. I have tried many combinations of PyTorch and cudatoolkit versions, but none of them avoids both errors at the same time.
I mainly want to solve the first error: ImportError: DLL load failed while importing fused: The specified module could not be found. If anyone knows how to fix it, please give me some suggestions so that I can run training successfully.
Thanks a lot for any help! I can provide more details about my problem if needed.
I get exactly the same error, and I'm using an RTX 4070 too. The RTX 4070 uses the new Ada Lovelace architecture, for which the minimum CUDA version is 11.8, and you must use python>=3.10 to make everything work. After weeks of searching for a solution, I haven't found one yet.
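One way to test the architecture point is to compare the GPU's compute capability with the architectures the installed PyTorch binary was compiled for. The sketch below is an illustration under assumptions, not a confirmed diagnosis: `arch_is_covered` is a hypothetical helper, and its rule (same major version, binary minor version not higher than the device's) is a simplified approximation of CUDA binary compatibility that ignores PTX entries such as `compute_80`. The RTX 4070 reports compute capability 8.9.

```python
def arch_is_covered(capability, arch_list):
    """Return True if some compiled 'sm_XY' entry can run on the given device.

    capability: (major, minor) tuple, e.g. (8, 9) for an RTX 4070.
    arch_list:  strings such as 'sm_86', as returned by torch.cuda.get_arch_list().
    Simplified rule (an assumption of this sketch): the binary must share the
    device's major version and have a minor version <= the device's minor.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as 'compute_80'
        digits = arch[3:]
        if not digits.isdigit():
            continue
        sm = int(digits)
        if sm // 10 == major and sm % 10 <= minor:
            return True
    return False


if __name__ == "__main__":
    import torch

    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("compiled archs:   ", archs)
        print("covered:          ", arch_is_covered(cap, archs))
    else:
        print("CUDA is not available to this PyTorch build.")
```

If this prints `covered: False`, the installed PyTorch build itself lacks kernels for the card, independent of how the `fused` extension is compiled.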