
eval stage2 error

Open ElfenSterben opened this issue 7 months ago • 9 comments

version: V2.0

./tools/uniad_dist_eval.sh ./projects/configs/stage2_e2e/base_e2e.py ./ckpts/uniad_base_e2e.pth  8

Traceback (most recent call last):
  File "./tools/test.py", line 262, in <module>
    main()
  File "./tools/test.py", line 231, in main
    outputs = custom_multi_gpu_test(model, data_loader, args.tmpdir,
  File "/app/projects/mmdet3d_plugin/uniad/apis/test.py", line 68, in custom_multi_gpu_test
    iou_metrics[key] = IntersectionOverUnion(n_classes).cuda()
  File "/app/projects/mmdet3d_plugin/uniad/dense_heads/occ_head_plugin/metrics.py", line 24, in __init__
    super().__init__(compute_on_step=compute_on_step)
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/pytorch_lightning/metrics/metric.py", line 41, in __init__
    super().__init__(
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torchmetrics/metric.py", line 150, in __init__
    raise ValueError(f"Unexpected keyword arguments: {', '.join(kwargs_)}")
ValueError: Unexpected keyword arguments: `compute_on_step`
(The same traceback was printed once by each of the 8 ranks.)
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 16428 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 16422) of binary: /root/.pyenv/versions/3.8.20/bin/python
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/distributed/run.py", line 798, in <module>
    main()
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

pip list

absl-py                  2.2.2
addict                   2.4.0
aiohappyeyeballs         2.4.4
aiohttp                  3.10.11
aiosignal                1.3.1
asttokens                3.0.0
async-timeout            5.0.1
attrs                    25.3.0
backcall                 0.2.0
black                    24.8.0
cachetools               5.5.2
casadi                   3.6.7
certifi                  2022.12.7
charset-normalizer       2.1.1
click                    8.1.8
cmake                    3.25.0
cycler                   0.12.1
decorator                5.2.1
descartes                1.1.0
einops                   0.8.1
exceptiongroup           1.3.0
executing                2.2.0
filelock                 3.13.1
fire                     0.7.0
flake8                   7.1.2
fonttools                4.57.0
frozenlist               1.5.0
fsspec                   2025.3.0
future                   1.0.0
google-api-core          2.25.0rc1
google-auth              2.40.2
google-auth-oauthlib     1.0.0
google-cloud-bigquery    3.30.0
google-cloud-core        2.4.3
google-crc32c            1.6.0.dev2
google-resumable-media   2.7.2
googleapis-common-protos 1.70.0
grpcio                   1.70.0
grpcio-status            1.70.0
idna                     3.4
imageio                  2.35.1
importlib_metadata       8.5.0
iniconfig                2.1.0
ipython                  8.12.3
jedi                     0.19.2
Jinja2                   3.1.3
joblib                   1.4.2
kiwisolver               1.4.7
lightning-utilities      0.11.9
lit                      15.0.7
llvmlite                 0.36.0
lyft-dataset-sdk         0.0.8
Markdown                 3.7
MarkupSafe               2.1.5
matplotlib               3.5.3
matplotlib-inline        0.1.7
mccabe                   0.7.0
mmcls                    0.25.0
mmcv-full                1.6.0
mmdet                    2.26.0
mmdet3d                  1.0.0rc6
mmsegmentation           0.29.1
motmetrics               1.1.3
mpmath                   1.3.0
multidict                6.1.0
mypy_extensions          1.1.0
narwhals                 1.41.0
networkx                 2.2
numba                    0.53.0
numpy                    1.22.4
nuscenes-devkit          1.1.11
oauthlib                 3.2.2
opencv-python            4.11.0.86
packaging                25.0
pandas                   1.2.2
parso                    0.8.4
pathspec                 0.12.1
pexpect                  4.9.0
pickleshare              0.7.5
pillow                   10.2.0
pip                      25.0.1
platformdirs             4.3.6
plotly                   6.1.1
pluggy                   1.5.0
plyfile                  1.0.3
prettytable              3.11.0
prompt_toolkit           3.0.51
propcache                0.2.0
proto-plus               1.26.1
protobuf                 5.29.4
ptyprocess               0.7.0
pure_eval                0.2.3
pyasn1                   0.6.1
pyasn1_modules           0.4.2
pycocotools              2.0.7
pycodestyle              2.12.1
pyDeprecate              0.3.2
pyflakes                 3.2.0
Pygments                 2.19.1
pyparsing                3.1.4
pyquaternion             0.9.9
pytest                   8.3.5
python-dateutil          2.9.0.post0
pytorch-lightning        1.2.5
pytz                     2025.2
PyWavelets               1.4.1
PyYAML                   6.0.2
requests                 2.28.1
requests-oauthlib        2.0.0
rsa                      4.9.1
scikit-image             0.19.3
scikit-learn             1.3.2
scipy                    1.10.1
setuptools               75.3.2
Shapely                  1.8.5.post1
six                      1.17.0
stack-data               0.6.3
sympy                    1.13.3
tensorboard              2.14.0
tensorboard-data-server  0.7.2
termcolor                2.4.0
terminaltables           3.1.10
threadpoolctl            3.5.0
tifffile                 2023.7.10
tomli                    2.2.1
torch                    2.0.1+cu118
torchaudio               2.0.2+cu118
torchmetrics             1.5.2
torchvision              0.15.2+cu118
tqdm                     4.67.1
traitlets                5.14.3
trimesh                  2.35.39
triton                   2.0.0
typing_extensions        4.9.0
tzdata                   2025.2
urllib3                  1.26.13
wcwidth                  0.2.13
Werkzeug                 3.0.6
wheel                    0.45.1
yapf                     0.40.1
yarl                     1.15.2
zipp                     3.20.2

ElfenSterben, May 27 '25

I encountered the same problem. May I ask if you have solved it?

baiyeha, Jun 16 '25

Check out tag v1.0.1 and install it.

ElfenSterben, Jun 16 '25

Thank you very much for your answer. This issue still occurs after downgrading to version 1.0.1. There is no such error with version 0.11.4, but a different error appears during stage2 inference. Have you encountered this problem before?

Traceback (most recent call last):
  File "/opt/data/private/ADPlan/UniAD-2.0/./tools/test.py", line 262, in <module>
    main()
  File "/opt/data/private/ADPlan/UniAD-2.0/./tools/test.py", line 231, in main
    outputs = custom_multi_gpu_test(model, data_loader, args.tmpdir,
  File "/opt/data/private/ADPlan/UniAD-2.0/projects/mmdet3d_plugin/uniad/apis/test.py", line 102, in custom_multi_gpu_test
    planning_metrics(pred_sdc_traj[:, :6, :2], sdc_planning[0][0,:, :6, :2], sdc_planning_mask[0][0,:, :6, :2], segmentation[0][:, [1,2,3,4,5,6]])
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torchmetrics/metric.py", line 234, in forward
    self._forward_cache = self._forward_full_state_update(*args, **kwargs)
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torchmetrics/metric.py", line 247, in _forward_full_state_update
    self.update(*args, **kwargs)
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torchmetrics/metric.py", line 400, in wrapped_func
    raise err
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torchmetrics/metric.py", line 390, in wrapped_func
    update(*args, **kwargs)
  File "/opt/data/private/ADPlan/UniAD-2.0/projects/mmdet3d_plugin/uniad/dense_heads/planning_head_plugin/planning_metrics.py", line 136, in update
    obj_coll_sum, obj_box_coll_sum = self.evaluate_coll(trajs[:,:,:2], gt_trajs[:,:,:2], segmentation)
  File "/opt/data/private/ADPlan/UniAD-2.0/projects/mmdet3d_plugin/uniad/dense_heads/planning_head_plugin/planning_metrics.py", line 111, in evaluate_coll
    obj_coll_sum[ti[m1]] += segmentation[i, ti[m1], yi[m1], xi[m1]].long()
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 30807) of binary: /root/anaconda3/envs/uniad2.0/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torch/distributed/run.py", line 798, in <module>
    main()
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/uniad2.0/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

./tools/test.py FAILED

Failures:
  <NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
  time      : 2025-06-16_15:42:47
  host      : fire
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 30807)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

baiyeha, Jun 16 '25

Sorry, I got that wrong: I run it with the main branch. You must strictly follow the versions specified in install.md. Here's the Dockerfile for your reference:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04


ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES},compute,display

SHELL [ "/bin/bash", "--login", "-c" ]

# To fix GPG key error when running apt-get update
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub

#install libs first
RUN apt-get update -q && \
    apt-get install -q -y \
    wget \
    git \
    ninja-build \
    ffmpeg libsm6 libxext6 libglib2.0-0 libxrender-dev

RUN apt-get update && apt-get install -y --no-install-recommends vim netbase tzdata dpkg-dev gcc \
    gnupg libbluetooth-dev libbz2-dev libc6-dev libdb-dev libexpat1-dev \
    libffi-dev libgdbm-dev liblzma-dev libncursesw5-dev libreadline-dev \
    libsqlite3-dev libssl-dev make tk-dev uuid-dev wget xz-utils zlib1g-dev git sudo g++ \
    autoconf automake libtool make cmake unzip python-is-python3

RUN apt-get clean && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv
RUN cd ~/.pyenv && src/configure && make -C src
RUN echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
RUN echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
RUN echo 'eval "$(pyenv init - bash)"' >> ~/.bashrc

ENV PYENV_ROOT="/root/.pyenv"
ENV PATH="$PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH"
ENV PYTHON_VERSION=3.8.20

RUN mkdir -p $PYENV_ROOT/cache
RUN wget -O $PYENV_ROOT/cache/Python-$PYTHON_VERSION.tar.xz "https://mirror.nju.edu.cn/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz"
RUN echo $PATH
RUN pyenv install $PYTHON_VERSION
RUN pyenv global $PYTHON_VERSION

# Install PyTorch
RUN pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install IPython wheel

# Install MMCV-series
ENV FORCE_CUDA="1"
ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0+PTX"
ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
RUN pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html
RUN pip install --upgrade pip setuptools
RUN pip install mmdet==2.14.0 mmsegmentation==0.14.1
RUN pip install scipy==1.7.3
RUN pip install scikit-image==0.20.0
# Install UniAD from source
WORKDIR /app
COPY requirements.txt /app
RUN pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
COPY mmdetection3d-0.17.1 /app/mmdetection3d-0.17.1
RUN ls /app
RUN cd /app/mmdetection3d-0.17.1 && pip install -v .
RUN rm -rf /app/mmdetection3d-0.17.1
RUN pip install numpy==1.22.4

ElfenSterben, Jun 16 '25
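Since the reply above hinges on matching install.md exactly, one quick way to confirm that a built environment really has the pinned versions is to query the installed distributions with the standard library's importlib.metadata. This is a minimal sketch, not part of UniAD: the EXPECTED mapping simply copies the pins visible in the Dockerfile above (packages installed from requirements.txt or from the local mmdetection3d source tree are left out because their versions are not shown), and install.md should be treated as the authoritative list.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Dockerfile above. Assumption: install.md is the
# authoritative source, so adjust this mapping to match it.
EXPECTED = {
    "torch": "1.9.1+cu111",
    "torchvision": "0.10.1+cu111",
    "torchaudio": "0.9.1",
    "mmcv-full": "1.4.0",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.14.1",
    "scipy": "1.7.3",
    "scikit-image": "0.20.0",
    "numpy": "1.22.4",
}


def find_mismatches(expected):
    """Return {package: (expected, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # One line per mismatch; no output means every listed pin matches.
    for package, (wanted, installed) in sorted(find_mismatches(EXPECTED).items()):
        print(f"{package}: expected {wanted}, found {installed}")
```

Run inside the built image, the script prints nothing when everything matches. A later `pip install` can silently replace an earlier pin when it needs a different version of a dependency, which is exactly the kind of drift this check surfaces.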

Thank you very much for your answer

The error occurs because, in the evaluate_coll method, the index tensors (ti[m1], yi[m1], xi[m1]) and the segmentation tensor are not on the same device. xi, yi and ti also need to be moved from the CPU to the GPU.

baiyeha, Jun 16 '25

So how do we fix this? Do we just need to change some code?

DK-DARKmatter, Jun 19 '25

I encountered the same problem. I moved ti to the GPU and it worked:

ti = torch.arange(n_future, device=segmentation.device)

rookie0109, Jun 22 '25
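The fix above can be seen in isolation with a simplified, hypothetical version of the collision count. It is not the actual UniAD evaluate_coll, which also loops over the batch and tracks box collisions. The reported message says the indexed tensor is on the CPU, which is consistent with ti = torch.arange(n_future) being created on the CPU while the in-grid mask m1 is derived from GPU trajectories, so ti[m1] indexes a CPU tensor with a CUDA mask. Creating ti on segmentation.device keeps every tensor in that expression on one device. The function name count_collisions and the assumption that yi and xi are already integer grid indices are illustrative only.

```python
import torch


def count_collisions(yi, xi, segmentation):
    """Hypothetical, simplified per-timestep collision count for one sample.

    yi, xi:       (n_future,) long tensors of BEV grid indices (row, column)
    segmentation: (n_future, H, W) occupancy tensor, nonzero = occupied
    """
    n_future = segmentation.shape[0]
    height, width = segmentation.shape[-2:]

    # Mask of waypoints inside the grid; it lives on the same device as yi/xi.
    m1 = (yi >= 0) & (yi < height) & (xi >= 0) & (xi < width)

    # The change from this thread: create ti on segmentation's device.
    # A plain torch.arange(n_future) would live on the CPU, and ti[m1]
    # fails when m1 is a CUDA tensor.
    ti = torch.arange(n_future, device=segmentation.device)

    counts = torch.zeros(n_future, dtype=torch.long, device=segmentation.device)
    counts[ti[m1]] += segmentation[ti[m1], yi[m1], xi[m1]].long()
    return counts


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    seg = torch.zeros(3, 4, 4, dtype=torch.long, device=device)
    seg[0, 1, 2] = 1  # occupied cell that the waypoint at t=0 hits
    yi = torch.tensor([1, 0, 3], device=device)
    xi = torch.tensor([2, 0, 5], device=device)  # t=2 lies outside the 4x4 grid
    print(count_collisions(yi, xi, seg).tolist())  # [1, 0, 0]
```

The demo puts every tensor on the same device, so it runs on a CPU-only machine as well as on a GPU.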

Good solution! This fixed the same problem for me.

KJsouth, Jul 17 '25

compute_on_step parameter error: the cause is that pytorch_lightning==1.2.5 passes the compute_on_step parameter by default, which is incompatible with the installed torchmetrics version.

NOTE: Downgrading torchmetrics carries less risk than upgrading pytorch_lightning, so for now we have chosen to downgrade torchmetrics to version 0.6.2.

zzh-yun, Oct 13 '25
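The mechanism behind the compute_on_step error can be illustrated without either library. The sketch below uses hypothetical stand-ins, not the real torchmetrics or pytorch_lightning classes: StrictMetricBase rejects any keyword it does not recognize, mirroring the torchmetrics/metric.py frame in the original traceback, while LenientMetricBase plays the role of an older release that still accepts compute_on_step. The sets of accepted keywords are assumptions chosen for the example.

```python
class StrictMetricBase:
    """Hypothetical stand-in for a newer metric base class that rejects unknown kwargs."""

    # Assumption: the real set of accepted keywords differs between releases.
    ACCEPTED = {"dist_sync_on_step", "process_group"}

    def __init__(self, **kwargs):
        # Keep the keywords this base understands and drop them from kwargs.
        self.options = {key: kwargs.pop(key) for key in list(kwargs) if key in self.ACCEPTED}
        if kwargs:
            names = [f"`{name}`" for name in sorted(kwargs)]
            raise ValueError(f"Unexpected keyword arguments: {', '.join(names)}")


class LenientMetricBase(StrictMetricBase):
    """Hypothetical stand-in for an older base class that still accepts compute_on_step."""

    ACCEPTED = StrictMetricBase.ACCEPTED | {"compute_on_step"}


def make_legacy_metric(base):
    """Build a subclass that, like the older plugin code, forwards compute_on_step."""

    class LegacyMetric(base):
        def __init__(self, compute_on_step=True):
            super().__init__(compute_on_step=compute_on_step)

    return LegacyMetric


if __name__ == "__main__":
    make_legacy_metric(LenientMetricBase)()  # constructs without error
    try:
        make_legacy_metric(StrictMetricBase)()
    except ValueError as err:
        print(err)  # Unexpected keyword arguments: `compute_on_step`
```

Under this reading, pinning torchmetrics to an older release (0.6.2, as suggested above) corresponds to swapping in the lenient base class. The sketch does not establish which torchmetrics releases accept the argument.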