sd-webui-segment-anything

[Bug]: GroundingDINO doesn't respect `--device-id` flag

Open RangerCD opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Have you updated WebUI and this extension to the newest version?

  • [X] I have updated WebUI and this extension to the most up-to-date version

Do you understand that you should go to https://github.com/IDEA-Research/Grounded-Segment-Anything/issues if you cannot install GroundingDINO?

  • [X] My problem is not about installing GroundingDINO

Do you know that you should use the newest ControlNet extension and enable external control if you want SAM extension to control ControlNet?

  • [X] I have updated ControlNet extension and enabled "Allow other script to control this extension"

What happened?

GroundingDINO always accesses GPU 0, even when --device-id is set to a non-zero value, and triggers an illegal memory access CUDA error when you generate bounding boxes a second time.

Steps to reproduce the problem

  1. Start WebUI on a multi-GPU server with a non-zero GPU ID, e.g. ./webui.sh --device-id 1
  2. Check "Enable GroundingDINO"
  3. Select a model and enter some prompts
  4. Check "I want to preview GroundingDINO detection result and select the boxes I want."
  5. Click "Generate bounding box"
  6. Wait until it finishes
  7. Click "Generate bounding box" again
  8. The terminal should now show the error RuntimeError: CUDA error: an illegal memory access was encountered
  9. Run nvidia-smi in another terminal; a process named python3 should be using both GPU 0 and the GPU specified in step 1.
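
The check in step 9 can also be scripted. This is a rough sketch, assuming nvidia-smi's CSV query mode is available on the host (inside a container, process IDs may not be reported). `pids_on_multiple_gpus` and `query_nvidia_smi` are made-up helper names for illustration, not part of WebUI or the extension.

```python
import subprocess
from collections import defaultdict

# Ask nvidia-smi for one "pid, gpu_uuid" row per compute process per GPU.
QUERY = ["nvidia-smi", "--query-compute-apps=pid,gpu_uuid", "--format=csv,noheader"]


def query_nvidia_smi():
    """Run nvidia-smi and return its raw CSV output as text."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def pids_on_multiple_gpus(csv_text):
    """Map each PID that appears on more than one GPU to its sorted GPU UUIDs."""
    gpus_by_pid = defaultdict(set)
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        pid, gpu_uuid = (field.strip() for field in line.split(",", 1))
        gpus_by_pid[int(pid)].add(gpu_uuid)
    return {pid: sorted(gpus) for pid, gpus in gpus_by_pid.items() if len(gpus) > 1}
```

Calling `pids_on_multiple_gpus(query_nvidia_smi())` after step 7 should list the python3 process if it really holds a context on GPU 0 in addition to the selected GPU. An empty result would suggest the process stayed on a single device.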

What should have happened?

GroundingDINO should not access GPU 0 at any moment.
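
One way to read this expectation: every component derives its device string from the same --device-id value instead of using a bare "cuda", which PyTorch resolves to the current device (index 0 unless changed). Below is a minimal sketch under that assumption. `resolve_device` is a hypothetical standalone helper, not the actual code of the extension or of WebUI.

```python
import argparse


def resolve_device(argv, cuda_available=True):
    """Return a torch-style device string that honours --device-id.

    Hypothetical helper for illustration only.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--device-id", type=int, default=None)
    # Ignore every flag this sketch does not know about (-f, --listen, ...).
    args, _unknown = parser.parse_known_args(argv)

    if not cuda_available:
        return "cpu"
    if args.device_id is None:
        # No explicit index: "cuda" means the process's current CUDA device.
        return "cuda"
    return f"cuda:{args.device_id}"


if __name__ == "__main__":
    print(resolve_device(["-f", "--listen", "--device-id", "7"]))  # cuda:7
```

A model loader that passed this string to `.to(device=...)` for GroundingDINO as well as SAM would keep both on the selected GPU. Whether the extension's loaders actually differ in this way is not confirmed in this thread.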

Commit where the problem happens

webui: 22bcc7be428c94e9408f589966c2040187245d81 extension: 724b4db6

What browsers do you use to access the UI?

Google Chrome

Command Line Arguments

cmdline:

./webui.sh -f --listen --device-id 7

modified webui-user.sh:

install_dir="/mnt"

I'm running WebUI inside a docker container with:

docker run --name stable-diffusion -it --runtime nvidia --gpus all --ipc host -v ${HOME}:/mnt -p 7860:7860 pytorch/pytorch:1.13.1-cuda11.6-cudnn8-devel

Console logs

Launching Web UI with arguments: -f --listen --device-id 3
No module 'xformers'. Proceeding without it.
Loading weights [1a189f0be6] from /mnt/stable-diffusion-webui/models/Stable-diffusion/sdv1-5-pruned.safetensors
Creating model from config: /mnt/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 2.1s (load weights from disk: 0.6s, create model: 0.4s, apply weights to model: 0.2s, apply half(): 0.2s, load VAE: 0.2s, move model to device: 0.4s).
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 9.4s (import torch: 1.0s, import gradio: 1.1s, import ldm: 1.4s, other imports: 1.9s, load scripts: 1.1s, load SD checkpoint: 2.2s, create ui: 0.5s, gradio launch: 0.1s).
Start SAM Processing
Running GroundingDINO Inference
Initializing GroundingDINO GroundingDINO_SwinB (938MB)
final text_encoder_type: bert-base-uncased
/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:768: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Initializing SAM
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/mnt/stable-diffusion-webui/extensions/sd-webui-segment-anything/scripts/sam.py", line 161, in sam_predict
    sam = init_sam_model(sam_model_name)
  File "/mnt/stable-diffusion-webui/extensions/sd-webui-segment-anything/scripts/sam.py", line 130, in init_sam_model
    sam_model_cache[sam_model_name] = load_sam_model(sam_model_name)
  File "/mnt/stable-diffusion-webui/extensions/sd-webui-segment-anything/scripts/sam.py", line 56, in load_sam_model
    sam.to(device=device)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Additional information

Generated by neofetch on host machine:

OS: Ubuntu 20.04.5 LTS x86_64
Host: X660 G45 Whitley
Kernel: 5.4.0-147-generic
Uptime: 6 hours, 5 mins
Packages: 1199 (dpkg), 4 (snap)
Shell: zsh 5.8
Resolution: 1024x768
Terminal: /dev/pts/3
CPU: Intel Xeon Platinum 8369C (128) @ 3.500GHz
GPU: NVIDIA 8e:00.0 NVIDIA Corporation Device 20b2
GPU: NVIDIA 56:00.0 NVIDIA Corporation Device 20b2
GPU: NVIDIA e8:00.0 NVIDIA Corporation Device 20b2
GPU: NVIDIA 8a:00.0 NVIDIA Corporation Device 20b2
GPU: NVIDIA eb:00.0 NVIDIA Corporation Device 20b2
GPU: NVIDIA 6b:00.0 NVIDIA Corporation Device 20b2
GPU: NVIDIA 71:00.0 NVIDIA Corporation Device 20b2
GPU: NVIDIA 51:00.0 NVIDIA Corporation Device 20b2
Memory: 26134MiB / 1031335MiB

RangerCD, Apr 21 '23 09:04

This is SAM's error, and unfortunately I cannot help you because I do not have access to multiple GPUs. Please post your question at the SAM repository: https://github.com/facebookresearch/segment-anything

continue-revolution, Apr 21 '23 20:04

Remove “--listen” and try again.

cdmusic2019, Apr 22 '23 05:04

> This is SAM's error, and unfortunately I cannot help you because I do not have access to multiple GPUs. Please post your question at the SAM repository: https://github.com/facebookresearch/segment-anything

@continue-revolution I don't know much about the implementation details of SAM, so I decided to work around this issue by passing only the GPU I want to each container. That makes the process think it is running in a single-GPU environment, so I don't have to specify --device-id.
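
The exact container command for this workaround is not given in the thread. A similar effect for a single process can be sketched with the CUDA_VISIBLE_DEVICES environment variable, which is a different mechanism from per-container GPU assignment but likewise hides all other GPUs from the process. `single_gpu_env` and `launch_webui` are hypothetical helper names, and the command line is taken from the report above.

```python
import os
import subprocess


def single_gpu_env(gpu_index, base_env=None):
    """Copy an environment and expose only one physical GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA renumbers the visible devices, so the chosen GPU becomes device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_webui(gpu_index, command=("./webui.sh", "-f", "--listen")):
    """Start WebUI without --device-id; the process sees a single GPU."""
    return subprocess.Popen(list(command), env=single_gpu_env(gpu_index))
```

For example, `launch_webui(7)` would start WebUI with only physical GPU 7 visible. Any code that falls back to device 0 then still lands on that GPU.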

RangerCD, Apr 23 '23 02:04

> Remove “--listen” and try again.

@cdmusic2019 I don't think --listen has anything to do with this issue; the only purpose of that flag is to accept remote connections. I can also confirm that --device-id is passed to WebUI correctly, and other components do respect it.

RangerCD, Apr 23 '23 02:04