label-studio-ml-backend docker of mmdetection3 not support CUDA

Open xiaoyao9184 opened this issue 11 months ago • 1 comments

The Docker image heartexlabs/label-studio-ml-backend:mmdetection3-master fails with the following error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

This is the same as https://github.com/HumanSignal/label-studio-ml-backend/issues/560 and https://github.com/HumanSignal/label-studio-ml-backend/issues/79

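Overriding the command to remove the --preload parameter lets the backend start normally. With --preload, gunicorn loads the application in the master process before forking the workers, which is presumably where CUDA gets initialized too early.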

services:
  mmdetection3:
    image: heartexlabs/label-studio-ml-backend:mmdetection3-master
    container_name: mmdetection3
    # https://github.com/HumanSignal/label-studio-ml-backend/issues/79
    command: gunicorn --bind :9090 --workers 1 --threads 8 --timeout 0 _wsgi:app
    env_file: external_env-mmdetection3.env
    ports:
      - '9090'
    networks:
      - labelstudio
    volumes:
      - data:/data
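
The same settings can also be kept in a gunicorn configuration file, which is plain Python. This is only a sketch: the file name gunicorn.conf.py is an assumption (it is not something shipped with the image), and it would be passed as `gunicorn -c gunicorn.conf.py _wsgi:app`. The values mirror the command above.

```python
# gunicorn.conf.py (hypothetical file name; not part of the original image)
# Mirrors: gunicorn --bind :9090 --workers 1 --threads 8 --timeout 0 _wsgi:app

bind = ":9090"        # same as --bind :9090
workers = 1           # same as --workers 1
threads = 8           # same as --threads 8
timeout = 0           # same as --timeout 0 (disables the worker timeout)
preload_app = False   # equivalent to omitting --preload
```

Keeping preload_app explicit in a file makes it harder for the flag to slip back in when the command line is edited later.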

After that, the backend starts, but predictions fail with:

[2025-01-08 06:42:00,804] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/label_studio_ml/api.py", line 69, in _predict
    response = model.predict(tasks, context=context, **params)
  File "/app/mmdetection.py", line 160, in predict
    prediction = self.predict_one_task(task)
  File "/app/mmdetection.py", line 167, in predict_one_task
    model_results = inference_detector(model, image_path).pred_instances
  File "/opt/conda/lib/python3.9/site-packages/mmdet/apis/inference.py", line 189, in inference_detector
    results = model.test_step(data_)[0]
  File "/opt/conda/lib/python3.9/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/detectors/single_stage.py", line 110, in predict
    results_list = self.bbox_head.predict(
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 197, in predict
    predictions = self.predict_by_feat(
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/dense_heads/yolo_head.py", line 280, in predict_by_feat
    results = self._bbox_post_process(
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 485, in _bbox_post_process
    det_bboxes, keep_idxs = batched_nms(bboxes, results.scores,
  File "/opt/conda/lib/python3.9/site-packages/mmcv/ops/nms.py", line 303, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
  File "/opt/conda/lib/python3.9/site-packages/mmengine/utils/misc.py", line 395, in new_func
    output = old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/mmcv/ops/nms.py", line 127, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
  File "/opt/conda/lib/python3.9/site-packages/mmcv/ops/nms.py", line 27, in forward
    inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device cuda:0 not found.

Same as https://github.com/open-mmlab/mmdetection/issues/6765

Reinstalling mmcv inside the container and then restarting the container resolves the issue.

root@f0ea7e29bc01:/app# mim uninstall mmcv
Found existing installation: mmcv 2.1.0
Uninstalling mmcv-2.1.0:
  Would remove:
    /opt/conda/lib/python3.9/site-packages/mmcv-2.1.0.dist-info/*
    /opt/conda/lib/python3.9/site-packages/mmcv/*
Proceed (Y/n)? y
  Successfully uninstalled mmcv-2.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
root@f0ea7e29bc01:/app# mim install mmcv==2.1.0 
Looking in links: https://download.openmmlab.com/mmcv/dist/cu116/torch1.13.0/index.html
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x78bf017571f0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /mmcv/dist/cu116/torch1.13.0/index.html
Collecting mmcv==2.1.0
  Downloading https://download.openmmlab.com/mmcv/dist/cu116/torch1.13.0/mmcv-2.1.0-cp39-cp39-manylinux1_x86_64.whl (97.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.6/97.6 MB 427.3 kB/s eta 0:00:00
Requirement already satisfied: addict in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (2.4.0)
Requirement already satisfied: mmengine>=0.3.0 in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (0.10.3)
Requirement already satisfied: numpy in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (1.26.4)
Requirement already satisfied: packaging in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (24.2)
Requirement already satisfied: Pillow in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (10.4.0)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (6.0.2)
Requirement already satisfied: yapf in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (0.43.0)
Requirement already satisfied: opencv-python>=3 in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (4.10.0.84)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.9/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (3.9.4)
Requirement already satisfied: rich in /opt/conda/lib/python3.9/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (13.4.2)
Requirement already satisfied: termcolor in /opt/conda/lib/python3.9/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (2.5.0)
Requirement already satisfied: platformdirs>=3.5.1 in /opt/conda/lib/python3.9/site-packages (from yapf->mmcv==2.1.0) (4.3.6)
Requirement already satisfied: tomli>=2.0.1 in /opt/conda/lib/python3.9/site-packages (from yapf->mmcv==2.1.0) (2.2.1)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.3.0)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (4.55.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.4.7)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (3.2.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (2.9.0.post0)
Requirement already satisfied: importlib-resources>=3.2.0 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (6.4.5)
Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/conda/lib/python3.9/site-packages (from rich->mmengine>=0.3.0->mmcv==2.1.0) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.9/site-packages (from rich->mmengine>=0.3.0->mmcv==2.1.0) (2.18.0)
Requirement already satisfied: zipp>=3.1.0 in /opt/conda/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (3.21.0)
Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.9/site-packages (from markdown-it-py>=2.2.0->rich->mmengine>=0.3.0->mmcv==2.1.0) (0.1.2)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.16.0)
Installing collected packages: mmcv
Successfully installed mmcv-2.1.0
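
Before and after reinstalling, it may help to check whether the installed mmcv was built with CUDA ops at all. The following is a rough diagnostic sketch, not something from the image: the function name cuda_build_report is made up here, and it assumes it is run with the container's Python. It relies on torch.cuda.is_available(), torch.version.cuda and mmcv.ops.get_compiling_cuda_version(), and skips whichever package is not installed.

```python
import importlib.util


def cuda_build_report():
    """Collect CUDA-related build info for torch and mmcv (hypothetical helper)."""
    report = {
        "torch_cuda_available": None,
        "torch_cuda_version": None,
        "mmcv_version": None,
        "mmcv_compiling_cuda_version": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
        # torch.version.cuda is None for CPU-only torch builds.
        report["torch_cuda_version"] = torch.version.cuda
    if importlib.util.find_spec("mmcv") is not None:
        import mmcv

        report["mmcv_version"] = mmcv.__version__
        try:
            from mmcv.ops import get_compiling_cuda_version

            report["mmcv_compiling_cuda_version"] = get_compiling_cuda_version()
        except ImportError as exc:
            # The compiled extension is missing or failed to load.
            report["mmcv_compiling_cuda_version"] = f"unavailable ({exc})"
    return report


if __name__ == "__main__":
    for key, value in cuda_build_report().items():
        print(f"{key}: {value}")
```

If torch reports CUDA as available while mmcv does not report a CUDA version it was compiled with, that would be consistent with the "nms_impl: implementation for device cuda:0 not found" error above.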

Following https://github.com/open-mmlab/mmdetection/issues/6765#issuecomment-2563953554 resolves the issue.

Change the Dockerfile like this:

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=${PORT:-9090} \
    PIP_CACHE_DIR=/.cache \
    WORKERS=1 \
    THREADS=8 \
    CUDA_HOME=/usr/local/cuda
ENV PATH="${CUDA_HOME}/bin:${PATH}"
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0"
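
To see whether a given GPU falls under that TORCH_CUDA_ARCH_LIST value, the list can be compared with the device's compute capability. The helpers below (parse_arch_list, is_covered) are hypothetical names and only a simplified sketch: they handle numeric entries such as "8.6" and "8.6+PTX", not named architectures, and they assume the usual CUDA rule that binary code runs on the same major version with an equal or higher minor version, while PTX can be JIT-compiled for newer GPUs.

```python
def parse_arch_list(value):
    """Parse a TORCH_CUDA_ARCH_LIST string into (major, minor, has_ptx) tuples."""
    entries = []
    # Entries may be separated by semicolons or spaces.
    for item in value.replace(" ", ";").split(";"):
        item = item.strip()
        if not item:
            continue
        has_ptx = item.endswith("+PTX")
        number = item[:-4] if has_ptx else item
        major, minor = number.split(".")
        entries.append((int(major), int(minor), has_ptx))
    return entries


def is_covered(capability, entries):
    """Return True if a (major, minor) capability is served by the arch list."""
    major, minor = capability
    for e_major, e_minor, has_ptx in entries:
        if e_major == major and e_minor <= minor:
            return True  # binary code for the same major, lower-or-equal minor
        if has_ptx and (e_major, e_minor) <= (major, minor):
            return True  # PTX fallback for newer architectures
    return False


if __name__ == "__main__":
    arch_list = "6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0"
    entries = parse_arch_list(arch_list)
    for cap in [(5, 2), (7, 5), (8, 6)]:
        print(cap, is_covered(cap, entries))
```

On a machine with a GPU, the capability tuple can be obtained from torch.cuda.get_device_capability().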

Jan 08 '25 12:01 xiaoyao9184

Thanks for reporting this issue, @xiaoyao9184! Are you able to submit a PR to implement the dockerfile change you call out above?

Jan 29 '25 14:01 nehalecky
