adetailer
[Bug]: ControlNet Batch in text2img breaks when Use Separate Model is checked
Describe the bug
Using Reliberate 1.0 as my separate model (or any other separate model), everything works beautifully in txt2img when selecting individual images as the source. Using batch ControlNet sources, the first image processes correctly, but the second image always fails (no matter what the second image is). See the error below. Note that I have AnimateDiff installed but not active, and I have Prompt Fusion installed with one example of weight travel active in the negative prompt: "(drawing:0.35,0.65)". Removing this does not fix the problem, however.
Steps to reproduce
Create a prompt and negative prompt. Add a ControlNet unit with a batch source. Set Use Separate Model to a model different from your base model. Start generation: the first batch item will process properly and the second will fail. Note that if you have Use Separate Model checked but pointed at the same model as your base model, batch generation works fine, so the issue seems to be the loading/unloading of the additional model between batch generations.
Screenshots
No response
Console logs, from start to end.
INFO:sd_dynamic_prompts.dynamic_prompting:Prompt matrix will create 2 images in a total of 1 batches.
2024-04-12 10:25:59,246 - ControlNet - INFO - Loading model from cache: control_v1p_sd15_qrcode_monster_v2 [5e5778cb]
2024-04-12 10:25:59,259 - ControlNet - INFO - Loading preprocessor: none
2024-04-12 10:25:59,259 - ControlNet - INFO - preprocessor resolution = 640
2024-04-12 10:25:59,268 - ControlNet - INFO - Loading model from cache: control_v11p_sd15_normalbae [316696f1]
2024-04-12 10:25:59,284 - ControlNet - INFO - Loading preprocessor: normal_map
2024-04-12 10:25:59,285 - ControlNet - INFO - preprocessor resolution = 512
2024-04-12 10:25:59,317 - ControlNet - INFO - Loading model from cache: control_v11p_sd15_scribble [d4ba51ff]
2024-04-12 10:25:59,330 - ControlNet - INFO - Loading preprocessor: pidinet_scribble
2024-04-12 10:25:59,330 - ControlNet - INFO - preprocessor resolution = 640
2024-04-12 10:25:59,366 - ControlNet - INFO - ControlNet Hooked - Time = 0.22551321983337402
INFO:sd_dynamic_prompts.dynamic_prompting:Prompt matrix will create 2 images in a total of 1 batches.
2024-04-12 10:26:00,749 - ControlNet - INFO - Loading model from cache: control_v1p_sd15_qrcode_monster_v2 [5e5778cb]
2024-04-12 10:26:00,779 - ControlNet - INFO - Loading preprocessor: none
2024-04-12 10:26:00,779 - ControlNet - INFO - preprocessor resolution = 640
2024-04-12 10:26:00,787 - ControlNet - INFO - Loading model from cache: control_v11p_sd15_normalbae [316696f1]
2024-04-12 10:26:00,798 - ControlNet - INFO - Loading preprocessor: normal_map
2024-04-12 10:26:00,798 - ControlNet - INFO - preprocessor resolution = 512
2024-04-12 10:26:01,099 - ControlNet - INFO - Loading model from cache: control_v11p_sd15_scribble [d4ba51ff]
2024-04-12 10:26:01,112 - ControlNet - INFO - Loading preprocessor: pidinet_scribble
2024-04-12 10:26:01,113 - ControlNet - INFO - preprocessor resolution = 640
2024-04-12 10:26:01,220 - ControlNet - INFO - ControlNet Hooked - Time = 0.48270153999328613
Using already loaded model 0.4(duchaitenMindbreak_v20) + 0.6(epicphotogasm_v2DodgeAndBurn).ckpt [bb321cc5de]: done in 1.0s (send model to cpu: 0.5s, send model to device: 0.5s)
0%| | 0/30 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: ('task(uo5ww6lswo80u6k)', '{prompt...}', '{negative prompt...}', [], 30, 'Euler a', 1, 2, 4.5, 960, 640, False, 0.75, 1.5, 'Latent', 14, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x00000160C7816920>, 0, False, '', 0.8, 2002106689, True, -1, 0, 624, 624, True, -1.0, True, True, False, {'ad_model': 'face_yolov8n.pt', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.55, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 0.4, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.35, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 640, 'ad_inpaint_height': 960, 'ad_use_steps': False, 'ad_steps': 40, 'ad_use_cfg_scale': True, 'ad_cfg_scale': 5.5, 'ad_use_checkpoint': True, 'ad_checkpoint': 'reliberate_v10.safetensors [980cb713af]', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'Euler a', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 2, 'ad_restore_face': False, 'ad_controlnet_model': 'control_v11p_sd15_openpose [cab727d4]', 'ad_controlnet_module': 'openpose_full', 'ad_controlnet_weight': 0.5, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 0.5, 'is_api': ()}, {'ad_model': 'None', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0.001, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.15, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 640, 'ad_inpaint_height': 960, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 
'ad_use_checkpoint': False, 'ad_checkpoint': 'reliberate_v10.safetensors [980cb713af]', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'Euler a', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': True, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'control_v11f1e_sd15_tile [a371b31b]', 'ad_controlnet_module': 'tile_resample', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 'keyword prompt', 'keyword1, keyword2', 'None', 'textual inversion first', 'None', 'Lanczos', 'None', False, 'x264', 'mci', 10, 0, 0, False, True, True, True, 'intermediate', 'animation', True, False, 1, False, False, True, 0.85, 1.15, 100, 0.7, False, False, False, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', <scripts.animatediff_ui.AnimateDiffProcess object at 0x0000016120D47760>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x0000016120D44CD0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x00000160D436F4C0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x0000016120D47FD0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x00000160D436D5A0>, False, 0, False, False, 0.0, True, '', 0, True, '', 0, True, True, False, False, False, 'Matrix', 'Columns', 'Mask', 'Prompt', '1,1', '0.9,0.5,0.5', True, False, False, 'Attention', False, '0', '0', '0.8', None, '0', '0', False, False, False, 0, None, [], 0, False, [], [], False, 0, 1, False, False, 0, None, [], -2, False, [], False, 0, None, None, False, False, False, False, False, False, False, False, '1:1,1:2,1:2', '0:0,0:0,0:1', '0.2,0.8,0.8', 20, 0.2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, None, None, 
False, 50, [], 30, '', 4, [], 1, '', '', '', '', False, 4.0, '', 10.0, 'Linear', 3, False, 30.0, True, False, False, 0, 0.0, 'Lanczos', 1, True, 0, 0, 0.001, 75, 0.0, False, True, 10.0, 30.0, True, 0.0, 'Lanczos', 1, 0, 0, 75, 0.0001, 0.0, False, True, False, False, 'linear (weight sum)', '10', 'C:\\SDW\\stable-diffusion-webui\\extensions\\stable-diffusion-webui-prompt-travel\\img\\ref_ctrlnet', 'Lanczos', 2, 0, 0, 'mp4', 10.0, 0, '', True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 'linear', 'lerp', 'token', 'random', '30', 'fixed', 1, '8', None, 'Lanczos', 2, 0, 0, 'mp4', 10.0, 0, '', True, False, False, 'Illustration', 'svg', True, True, False, 0.5, False, 16, True, 16) {}
Traceback (most recent call last):
File "C:\SDW\stable-diffusion-webui\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "C:\SDW\stable-diffusion-webui\modules\call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "C:\SDW\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
processed = processing.process_images(p)
File "C:\SDW\stable-diffusion-webui\modules\processing.py", line 732, in process_images
res = process_images_inner(p)
File "C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 49, in processing_process_images_hijack
processed = self.process_images_cn_batch(p, *args, **kwargs)
File "C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 74, in process_images_cn_batch
processed = getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
File "C:\SDW\stable-diffusion-webui\modules\processing.py", line 867, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 451, in process_sample
return process.sample_before_CN_hack(*args, **kwargs)
File "C:\SDW\stable-diffusion-webui\modules\processing.py", line 1140, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "C:\SDW\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "C:\SDW\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
return func()
File "C:\SDW\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "C:\SDW\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\SDW\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "C:\SDW\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\SDW\stable-diffusion-webui\modules\sd_samplers_cfg_denoiser.py", line 201, in forward
devices.test_for_nans(x_out, "unet")
File "C:\SDW\stable-diffusion-webui\modules\devices.py", line 136, in test_for_nans
raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
List of installed extensions
The error you're encountering, where a tensor with all NaNs is produced during the second image processing in a batch with ControlNet and a separate model, suggests an issue related to the precision of tensor computations or the support of half type by your video card. To address this, you might want to try setting the "Upcast cross attention layer to float32" option in the Stable Diffusion settings or using the `--no-half` command-line argument to improve precision.
If these solutions don't resolve the issue, consider investigating further into areas such as ControlNet initialization for batch processing, ensuring the batch processing logic correctly handles state for each image, and verifying that ControlNet arguments are correctly updated for each image in the batch. Additionally, look into memory and resource management to prevent potential leaks or inefficiencies that could affect processing.
Adding detailed logging around the batch processing steps could also help pinpoint where the process fails on the second image. Ensure that any concurrency or parallel execution in the processing does not lead to race conditions or deadlocks.
Reviewing and debugging these areas might help identify the root cause of the failure when processing the second image in a batch with a separate model. If you need further assistance or have more questions, feel free to ask.
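To make the suggested logging concrete, here is a minimal sketch of a NaN check wrapped around each batch item. The names (`check_for_nans`, `process_batch_item`) are illustrative, not the real webui API; webui's actual check lives in `modules/devices.py` as `test_for_nans` and operates on torch tensors rather than plain lists.

```python
import logging
import math

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("batch-debug")

def check_for_nans(values, step_name):
    # Simplified stand-in for webui's devices.test_for_nans: the real check
    # raises NansException when a tensor's elements are all NaN.
    if values and all(math.isnan(v) for v in values):
        raise ValueError(f"All-NaN output produced at step: {step_name}")

def process_batch_item(index, values):
    # Log before and after each batch item so a failure on the second
    # item shows exactly which step produced the bad output.
    log.info("Processing batch item %d", index)
    check_for_nans(values, f"unet (item {index})")
    log.info("Batch item %d finished", index)
    return values
```

Instrumenting the batch loop this way would at least confirm whether the NaNs appear only on items after the first, which is what the model-unloading theory predicts.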
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
As you can see from the error message, this is essentially not an error from adetailer.
It would be nice if there was a bfloat16 option for unet, but unfortunately there isn't, so it's best to find a checkpoint where this error doesn't occur.
Are we sure that the error message is an accurate reflection of what is happening? I believe the error is worded that way because that is a "typical" reason a UNet would produce NaNs, but in fact these NaNs may be produced for another reason. I remember seeing a post somewhere (can't find it right now, unfortunately) explaining that this happens because ADetailer (or possibly ControlNet when used with ADetailer) doesn't unload the separate model properly between batch runs, which is why the batch process fails while "Generate Forever" and standard generation work fine.
Have you ever actually debugged this issue? While I like dosubot, my experience is that AI tools can solve straightforward issues but struggle with more complex challenges, so I would hesitate to take its feedback as a final opinion. Any thoughts?
The issue you're encountering with NaNs during batch processing with ControlNet and a separate model seems to be related to improper state management between batch runs. This could indeed be due to the separate model not being correctly unloaded or reset, as you suspected. To tackle this, consider using context managers or similar mechanisms designed to ensure proper state management and resource cleanup between batch operations:

1. **Proper initialization and cleanup:** Make sure to initialize and clean up the separate model correctly between each batch item. This involves ensuring that any global or static state is reset before each new batch item is processed.
2. **Debugging and logging:** Add detailed logging around the batch processing steps to identify exactly where the process fails on the second image. This can help pinpoint whether the issue is with the model unloading, a memory leak, or something else.
3. **Memory and resource management:** Monitor memory usage and ensure that resources are correctly managed between batch items. This includes explicitly freeing resources that are no longer needed and invoking the garbage collector as necessary.
4. **Concurrency issues:** If your batch processing involves concurrency or parallel execution, ensure that there are no race conditions or deadlocks that could be causing the issue.
5. **Precision and support for half type:** As suggested by the error message, if the issue persists, consider adjusting the precision of tensor computations by setting the "Upcast cross attention layer to float32" option in the Stable Diffusion settings or using the `--no-half` command-line argument.

It's important to approach this issue methodically, starting with proper model management between batches and then moving on to more specific adjustments based on the detailed logging and debugging information you gather. If these steps do not resolve the issue, further investigation into the specific implementation of ControlNet and the separate-model handling in your setup may be necessary.
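As a rough sketch of steps 1 and 3 combined, the cleanup between batch items could look like the following. All names here are hypothetical, not the actual adetailer internals, and the torch call is only attempted if torch is importable.

```python
import gc

def cleanup_between_batch_items(model_holder):
    # Illustrative cleanup between batch items: drop the reference to the
    # separate model, reclaim Python memory, and (if torch is available)
    # release cached VRAM back to the driver.
    model_holder["separate_model"] = None
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass
```

Calling something like this after each batch item would rule out stale references as the source of the corrupted state.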
@dosu I think #1 is the most likely option and where I will start. Are you aware of any code in the Automatic1111 repo or ControlNet repo that would give me a clue as to how the states are reset after a typical execution that could serve as a model for how I might do so after ADetailer is finished with its passes in a batch (and where might be the best place to do that in the code in this repo)?
Thanks
To address the issue of the separate model not being correctly unloaded or reset between batch runs, particularly with ControlNet and ADetailer, you'll want to focus on ensuring proper state management for each batch item. This involves a few key steps:

1. **Proper model initialization and cleanup:** Before processing each batch item, ensure the model is properly initialized. After processing, the model should be reset or unloaded to prevent interference with subsequent batch items. This can be achieved by setting the model or any related objects to `None` and then reinitializing the model before the next batch item is processed.
2. **Explicit garbage collection:** After unloading or resetting the model, explicitly call Python's garbage collection using `gc.collect()` to ensure that all unused memory is reclaimed.
3. **Exception handling:** Make sure to handle any exceptions raised during the model's reset or unload process to prevent the application from crashing. This includes catching exceptions related to attribute errors when accessing uninitialized model properties.
Here's a conceptual example to guide you:
```python
import gc

class ControlNetExt:
    # Assuming existing initialization and methods...

    def reset_controlnet(self):
        # Reset the external ControlNet model reference
        self.external_cn = None
        # Explicitly call garbage collection
        gc.collect()
        # Reinitialize the ControlNet model for the next batch item
        self.init_controlnet()

# Example usage
control_net_ext = ControlNetExt()
control_net_ext.init_controlnet()
# Process batch item...
# After processing, reset the model
control_net_ext.reset_controlnet()
```
This approach ensures that the ControlNet model is freshly loaded for each batch item, potentially preventing the issue of the second image in the batch failing. Remember, this is a conceptual solution, and the actual implementation may require adjustments based on the full context of your project and how the ControlNet model is integrated and used within it.
If you encounter any further issues or have more questions, feel free to ask.
Sources
- controlnet_ext/controlnet_ext.py
- controlnet_ext/controlnet_ext_forge.py
- adetailer/args.py
- adetailer/ui.py
- scripts/!adetailer.py
As you described, I have tried generating images using both a separate model and a ControlNet batch, and I have not encountered a NansException.
The code for loading and unloading the model goes through the override-settings mechanism in stable-diffusion-webui, which can be found here:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/1c0a0c4c26f78c32095ebc7f8af82f5c04fca8c0/modules/processing.py#L847-L858
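For anyone following along, the pattern in that linked code can be sketched roughly as a context manager that swaps settings for one generation and restores them afterwards. The `Options` class below is a stand-in, not the real `modules.shared.opts`, and the checkpoint names are illustrative.

```python
from contextlib import contextmanager

class Options:
    # Minimal stand-in for webui's shared options object.
    def __init__(self):
        self.sd_model_checkpoint = "base_model.safetensors"

opts = Options()

@contextmanager
def override_settings(overrides):
    """Apply overrides for the duration of one generation, then restore
    the previous values -- the same save/apply/restore shape used in the
    linked processing.py code."""
    saved = {k: getattr(opts, k) for k in overrides}
    try:
        for k, v in overrides.items():
            setattr(opts, k, v)
        yield
    finally:
        for k, v in saved.items():
            setattr(opts, k, v)
```

If the restore step in the `finally` block ever failed to run between batch items, the next item would start with a stale checkpoint setting, which is consistent with the symptom described above.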
Same problem. Creating one image works, but upping the batch beyond 1 triggers the NaNs error.
Also, this suggestion:

> Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

... doesn't solve the problem.
I think I have solved this issue. I found it strange that it seemed to be something with models in memory getting corrupted, since after reloading everything was OK.
So I downloaded Realistic Vision again, put it in a new folder, and gave it a read-only attribute. I think that when switching from the main generation to ADetailer, something happens to the model and it gets corrupted.
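One way to test the on-disk corruption theory would be to hash the checkpoint file before and after a failing run; a mismatch would mean the file itself changed, while identical hashes would point back at in-memory state. A minimal sketch:

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    # Hash the checkpoint on disk in 1 MiB chunks so multi-GB model
    # files don't need to fit in memory. Comparing the digest before
    # and after a run confirms or rules out on-disk corruption.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Note that webui already displays a short model hash (e.g. `[980cb713af]`) in the checkpoint dropdown, so checking whether that hash changes after a failing batch would be an even quicker test.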
> I think I have solved this issue. I found it strange that it seemed to be something with models in memory getting corrupted, since after reloading everything was OK.
> So I downloaded Realistic Vision again, put it in a new folder, and gave it a read-only attribute. I think that when switching from the main generation to ADetailer, something happens to the model and it gets corrupted.
Would you mind sharing your findings about "something happens to the model"?