DeepFaceLab

Multi-GPU: repeated cuDNN kernel name error

Open • CodeSmith2000 opened this issue 2 years ago • 1 comment


Expected behavior

Training should work when multiple GPUs are selected.

Actual behavior

After I select both GPUs, the program loads all images and prints the model summary. Just before the training preview window opens, however, it prints the same error several times. Training works perfectly on a single GPU; the error only occurs when both GPUs are used. I tried both large 512-resolution models and Quick96 models with a batch size of 1 to rule out running out of VRAM. I also tested an older DFL build from my old hard drive and got the same error.

==================== Model Summary =====================
Model name: YoungZ_SAEHD
Current iteration: 99189
==------------------ Model Options -------------------==
resolution: 512
face_type: wf
models_opt_on_gpu: False
archi: liae-ud
ae_dims: 256
e_dims: 128
d_dims: 128
d_mask_dims: 22
masked_training: True
eyes_mouth_prio: True
uniform_yaw: True
blur_out_mask: False
adabelief: False
lr_dropout: n
random_warp: True
random_hsv_power: 0.0
true_face_power: 0.0
face_style_power: 0.0
bg_style_power: 0.0
ct_mode: none
clipgrad: False
pretrain: False
autobackup_hour: 12
write_preview_history: False
target_iter: 200000
random_src_flip: False
random_dst_flip: False
batch_size: 2
gan_power: 0.0
gan_patch_size: 64
gan_dims: 16
==-------------------- Running On --------------------==
Device index: 0
Name: NVIDIA GeForce RTX 3060
VRAM: 10.55GB
Device index: 1
Name: NVIDIA GeForce GTX 1060 6GB
VRAM: 4.87GB

Starting. Target iteration: 200000. Press "Enter" to stop training and save model.
Repeated name: [volta_s884cudnn_fp16_256x64_ldg8_relu_exp_small_nhwc_tn_v1]
Repeated name: [volta_s884cudnn_fp16_256x64_ldg8_relu_exp_small_nhwc_tn_v1]
Repeated name: [volta_s884cudnn_fp16_256x64_ldg8_relu_exp_small_nhwc_tn_v1]
Repeated name: [volta_s884cudnn_fp16_256x64_ldg8_relu_exp_small_nhwc_tn_v1]

Steps to reproduce

Start training (SAEHD or Quick96) and select both GPUs when prompted.

Other relevant information

  • Command line used (if not specified in steps to reproduce): main.py ...
  • Operating system and version: Windows 10
  • GPUs: RTX 3060 12GB and GTX 1060 6GB.

CodeSmith2000, Sep 08 '22 20:09

GTX 10-series and RTX 30-series GPUs run different builds of DFL; that is why this happens.

zabique, Jan 21 '23 17:01
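The reply above attributes the error to mixing two GPU generations. The sketch below is a hypothetical, illustrative check and is not DeepFaceLab code. It compares the compute capabilities of the two cards in this report: the GTX 1060 is Pascal (6.1) and the RTX 3060 is Ampere (8.6). It also flags cards below compute capability 7.0, which have no Tensor Cores. The `volta_s884cudnn_fp16` kernel name in the log suggests an fp16 Tensor Core kernel, but that is an inference and not a confirmed diagnosis. The helper name `summarize_gpus` and the hardcoded lookup table are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table: compute capability (major, minor) for the two
# cards in this report. Other GPUs would need their own entries.
KNOWN_COMPUTE_CAPABILITY: Dict[str, Tuple[int, int]] = {
    "NVIDIA GeForce GTX 1060 6GB": (6, 1),  # Pascal, no Tensor Cores
    "NVIDIA GeForce RTX 3060": (8, 6),      # Ampere
}

# Tensor Cores first appeared at compute capability 7.0 (Volta).
TENSOR_CORE_MIN_CC: Tuple[int, int] = (7, 0)


def summarize_gpus(names: List[str]) -> Dict[str, object]:
    """Hypothetical helper (not part of DFL): report whether the selected GPUs
    have different compute capabilities and which of them lack Tensor Cores."""
    caps: Dict[str, Tuple[int, int]] = {}
    for name in names:
        if name not in KNOWN_COMPUTE_CAPABILITY:
            raise KeyError(f"unknown GPU: {name}")
        caps[name] = KNOWN_COMPUTE_CAPABILITY[name]
    # Tuples compare element-wise, so (6, 1) < (7, 0) is True.
    without_tensor_cores = sorted(n for n, cc in caps.items() if cc < TENSOR_CORE_MIN_CC)
    return {
        "compute_capability": caps,
        "mixed_compute_capability": len(set(caps.values())) > 1,
        "without_tensor_cores": without_tensor_cores,
    }


if __name__ == "__main__":
    report = summarize_gpus(["NVIDIA GeForce RTX 3060", "NVIDIA GeForce GTX 1060 6GB"])
    print(report["mixed_compute_capability"])  # True
    print(report["without_tensor_cores"])      # ['NVIDIA GeForce GTX 1060 6GB']
```

On TensorFlow 2.3 or later, `tf.config.experimental.get_device_details()` returns a dict with a `compute_capability` tuple for each physical GPU, which could replace the hardcoded table. This note does not confirm whether a given DFL build's bundled TensorFlow exposes that function.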

Issue solved / already answered (or it appears to be user error); please close it.

joolstorrentecalo, Jun 08 '23 23:06