
clean install of conda env create -f conda-extras.yaml on Ubuntu fails to install cuml-cu11, any ideas?

Open lovettchris opened this issue 1 year ago • 12 comments

It fails with this error:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: - Ran pip subprocess with arguments:
['/home/smartreplayuser/miniconda3/envs/dinov2-extras/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/smartreplayuser/git/Facebook/dinov2/condaenv.slco21xq.requirements.txt', '--exists-action=b']
Pip subprocess output:
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting git+https://github.com/facebookincubator/submitit (from -r /home/smartreplayuser/git/Facebook/dinov2/condaenv.slco21xq.requirements.txt (line 1))
  Cloning https://github.com/facebookincubator/submitit to /tmp/pip-req-build-u5ykxk6q
  Resolved https://github.com/facebookincubator/submitit to commit 07f21fa1234e34151874c00d80c345e215af4967
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting cuml-cu11 (from -r /home/smartreplayuser/git/Facebook/dinov2/condaenv.slco21xq.requirements.txt (line 3))
  Downloading cuml-cu11-23.12.0.tar.gz (6.8 kB)
  Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'

Pip subprocess error:
  Running command git clone --filter=blob:none --quiet https://github.com/facebookincubator/submitit /tmp/pip-req-build-u5ykxk6q
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
╰─> [16 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-j61__ee5/cuml-cu11_45092e6783154e1f9c70d31f80db8581/setup.py", line 137, in <module>
raise RuntimeError(open("ERROR.txt", "r").read())
RuntimeError:
###########################################################################################
The package you are trying to install is only a placeholder project on PyPI.org repository.
This package is hosted on NVIDIA Python Package Index.

This package can be installed as:

$ pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com cuml-cu11

###########################################################################################

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
failed

CondaEnvException: Pip failed

And the recommended fix pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com cuml-cu11 fails with the same error.

lovettchris avatar Dec 09 '23 03:12 lovettchris

same error, used to work well but fails since today.

clemsgrs avatar Dec 09 '23 23:12 clemsgrs

Same error here. FWIW: Ubuntu 22.04 with CUDA 12.2 and NVIDIA driver 535.129.03. Replacing "cuml-cu11" with "cuml-cu12" did not work.

Florian2Richter avatar Dec 10 '23 10:12 Florian2Richter

You may try to install an older version like this: $ pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com cuml-cu11==23.10.0
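A minimal sketch of that workaround as a reusable command, assuming you drive pip from Python. The helper name `pinned_install_cmd` is hypothetical; the flags, index URL, and version are the ones quoted in this thread.

```python
import sys

# Index URL quoted in the pip error message earlier in this thread.
NVIDIA_INDEX = "https://pypi.nvidia.com"


def pinned_install_cmd(package, version, index_url=NVIDIA_INDEX):
    """Build (but do not run) a pip command that pins `package` to `version`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir",
        "--extra-index-url", index_url,
        f"{package}=={version}",
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(pinned_install_cmd("cuml-cu11", "23.10.0")))
```

Using `sys.executable -m pip` keeps the install inside whichever environment is running the script, which matters when several conda or venv environments are present.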

kuma94506 avatar Dec 10 '23 11:12 kuma94506

that works, thanks.

lovettchris avatar Dec 11 '23 16:12 lovettchris

Interesting. For me, pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com/ cuml-cu11==23.10.0 didn't work either (same error; I tried older versions too).

clemsgrs avatar Dec 11 '23 16:12 clemsgrs

Seems to be fixed by now... usual pip install -r requirements.txt worked fine.

Florian2Richter avatar Dec 12 '23 16:12 Florian2Richter

Interesting, Florian, can you post the version of CUDA, pytorch and Python that you are using?

lovettchris avatar Dec 12 '23 18:12 lovettchris

Sure. My virtual environment (venv, not conda) has Python 3.10.12 and PyTorch 2.0.0+cu117, with NVIDIA driver 535.129.03 and CUDA 12.2.
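For anyone comparing setups, here is a small sketch that collects the same details programmatically. The function name `environment_report` is made up for illustration; `torch.version.cuda` is the CUDA version PyTorch was built against (the `cu117` part above), which is not necessarily the CUDA version the driver reports.

```python
import platform


def environment_report():
    """Collect Python, PyTorch and CUDA build versions; missing pieces stay None."""
    report = {"python": platform.python_version(), "torch": None, "torch_cuda": None}
    try:
        import torch  # optional: only available if PyTorch is installed
    except ImportError:
        return report
    report["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against; None for CPU-only builds.
    report["torch_cuda"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```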

Florian2Richter avatar Dec 13 '23 16:12 Florian2Richter

The issue seems fixed for me. FYI, I'm using the conda installation.

clemsgrs avatar Dec 13 '23 18:12 clemsgrs

@Florian2Richter interestingly, conda.yaml and conda-extras.yaml both specify Python 3.9.

In order to get the dinov2 segmentation head working on Ubuntu, I had to build mmcv from source using MMCV_WITH_OPS=1 pip install -e . (which required a newer version of GCC that supports C++17). After that, the segmentation head worked on CUDA, and I measured about 4 seconds per inference on a Tesla T4 GPU using the small backbone dinov2_vits14. I also had to install the ftfy and regex pip packages. For me, "pip install mmcv-full==1.5.0" results in the error:

ModuleNotFoundError: No module named 'mmcv._ext'
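A quick way to check whether a given mmcv install exposes its compiled ops, sketched under the assumption that `mmcv._ext` and `mmcv.ops` are the modules that go missing (they are the ones named in the errors in this thread). The helper `can_import` is hypothetical.

```python
import importlib


def can_import(module_name):
    """Return True if `module_name` imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


if __name__ == "__main__":
    for name in ("mmcv", "mmcv._ext", "mmcv.ops"):
        print(f"{name}: {'ok' if can_import(name) else 'missing'}")
```

Running this before and after a source build shows whether the build actually changed what Python can import.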

lovettchris avatar Dec 13 '23 18:12 lovettchris

I can confirm that installing mmcv from source fixed the issue when running the segmentation scripts. You need to clone the specific version (not just the main branch) with git clone https://github.com/open-mmlab/mmcv.git --branch v1.5.3 --single-branch and, if needed, install nvcc v11.7 before building mmcv (conda install -c conda-forge cudatoolkit-dev=11.7). With the regular pip install I get the same error as @lovettchris.

@lovettchris, have you tried to reproduce the segmentation results? I am a little lost about the patch size: all dinov2 backbones have a patch size of 14, but the segmentation evaluation assumes 16 (quote: "It is used to produce a low-resolution logit map (eg 32x32 for a model with patch size 16)"), and the input image size is 512, which is divisible by 16 but not by 14. After modifying the config for a patch size of 14, I can partly reproduce the results for ADE20k, but not for Pascal VOC.
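The arithmetic behind that mismatch, as a small sketch. Both helper names are hypothetical, and nothing here says which input size the dinov2 configs should actually use; it only shows why 512 fits a patch size of 16 and not 14.

```python
def patch_grid(image_size, patch_size):
    """Return the number of patches per side, or None if the size does not divide evenly."""
    if image_size % patch_size != 0:
        return None
    return image_size // patch_size


def nearest_multiples(image_size, patch_size):
    """Return the closest multiples of patch_size at or below and at or above image_size."""
    lower = (image_size // patch_size) * patch_size
    upper = lower if lower == image_size else lower + patch_size
    return lower, upper


if __name__ == "__main__":
    print(patch_grid(512, 16))         # 32, the 32x32 logit map from the quote
    print(patch_grid(512, 14))         # None, 512 is not a multiple of 14
    print(nearest_multiples(512, 14))  # (504, 518)
```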

EDIT (22.02.24): I've managed to reproduce the results for both ADE20k and Pascal VOC. Don't forget to override the init_weights() method for your backbone: loading the checkpoint weights in the constructor is not enough. Otherwise, during segmentation training the loaded weights can be overwritten by the default weight initialization (source).
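To illustrate the ordering problem, here is a toy, framework-free sketch. It is not the dinov2 or mmsegmentation code: the classes and `build_and_init` are invented stand-ins, and it assumes only that the training framework calls `init_weights()` after constructing the model.

```python
class NaiveBackbone:
    """Loads the checkpoint in the constructor and keeps a default init_weights()."""

    def __init__(self, checkpoint):
        self.checkpoint = dict(checkpoint)
        self.weights = dict(checkpoint)  # weights look correct right after construction

    def init_weights(self):
        # Stand-in for a default initialization: it replaces whatever was loaded.
        self.weights = {name: 0.0 for name in self.weights}


class FixedBackbone(NaiveBackbone):
    """Overrides init_weights() so the checkpoint survives the framework's call."""

    def init_weights(self):
        self.weights = dict(self.checkpoint)


def build_and_init(backbone_cls, checkpoint):
    """Mimic a trainer that constructs the model and then calls init_weights()."""
    model = backbone_cls(checkpoint)
    model.init_weights()
    return model


if __name__ == "__main__":
    ckpt = {"w": 1.5}
    print(build_and_init(NaiveBackbone, ckpt).weights)  # {'w': 0.0}, checkpoint lost
    print(build_and_init(FixedBackbone, ckpt).weights)  # {'w': 1.5}, checkpoint kept
```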

Your backbone (dinov2/eval/segmentation/models/backbones/vision_transformer.py) should look similar to this (linked example).

Feel free to ping me if you have some issues.

bruce-willis avatar Feb 15 '24 14:02 bruce-willis

I had several issues setting up the environment for segmentation properly.

I did the following:

  • I have CUDA 11.7 and Python 3.9.12
  • I edited the requirements.txt to specify the version of cuml-cu11: cuml-cu11==23.10.0 as explained by @kuma94506
  • I installed the requirements.txt: pip3 install -r requirements.txt. I didn't install the requirements-extra.txt; instead, I followed the next steps.
  • As explained by @lovettchris and @bruce-willis, I compiled mmcv-full from scratch:
git clone https://github.com/open-mmlab/mmcv.git --branch v1.5.0 --single-branch
cd mmcv/
MMCV_WITH_OPS=1 pip install -e .
  • I installed mmsegmentation using: pip3 install mmsegmentation==0.27.0
  • I installed opencv: pip3 install opencv-python

I still get the error: ModuleNotFoundError: No module named 'mmcv.ops'. @lovettchris @bruce-willis how did you fix this? Thanks
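When comparing setups like the one above, it may help to list which versions of these packages the active environment actually reports. A sketch using only the standard library; the function name `installed_versions` is hypothetical and the package list is taken from the steps above.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["mmcv", "mmcv-full", "mmsegmentation", "opencv-python", "cuml-cu11"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```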

mfoglio avatar Jun 28 '24 20:06 mfoglio