vortx
Training does not detect GPUs
Hi, I've been stuck on this for hours, and every attempt is painful because each conda build and installation takes so long.
My main issue is that I followed all the steps to install the dependencies this project needs, but when I try to start training with
python scripts/train.py --config config.yml
I just get this:
home/darkayserleo/anaconda3/envs/vortx2/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/darkayserleo/anaconda3/envs/vortx2/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet1_0_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet1_0_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Traceback (most recent call last):
File "/media/darkayserleo/Data/vortx/scripts/train.py", line 60, in <module>
trainer = pl.Trainer(
File "/home/darkayserleo/anaconda3/envs/vortx2/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 38, in insert_env_defaults
return fn(self, **kwargs)
File "/home/darkayserleo/anaconda3/envs/vortx2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 426, in __init__
gpu_ids, tpu_cores = self._parse_devices(gpus, auto_select_gpus, tpu_cores)
File "/home/darkayserleo/anaconda3/envs/vortx2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1525, in _parse_devices
gpu_ids = device_parser.parse_gpu_ids(gpus)
File "/home/darkayserleo/anaconda3/envs/vortx2/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 89, in parse_gpu_ids
return _sanitize_gpu_ids(gpus)
File "/home/darkayserleo/anaconda3/envs/vortx2/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 151, in _sanitize_gpu_ids
raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0]
But your machine only has: []
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run glorious-valley-7 at: https://wandb.ai/leonelos/vortx/runs/yd5za7au
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230705_192250-yd5za7au/logs
I tried PyTorch 1.4, and it throws a lot of incompatibility errors.
The README instructions install a newer version of PyTorch (2.0); perhaps that is why pytorch_lightning is not working.
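For context on the error itself: in pytorch-lightning 1.5 the `gpus` argument is validated against the GPU ids that PyTorch reports, roughly `list(range(torch.cuda.device_count()))`. An empty list in the message therefore means PyTorch itself sees no CUDA device; Lightning is only reporting it. The sketch below is a simplified re-implementation of that check for illustration only; the function and exception names are hypothetical and are not Lightning's API.

```python
from typing import List


class MisconfigurationError(Exception):
    """Hypothetical stand-in for pytorch_lightning's MisconfigurationException."""


def available_gpu_ids(device_count: int) -> List[int]:
    # Lightning 1.5 derives this list from torch.cuda.device_count();
    # a PyTorch build that sees no CUDA device reports 0, giving [].
    return list(range(device_count))


def sanitize_gpu_ids(requested: List[int], device_count: int) -> List[int]:
    """Simplified version of the check that raised the error in the traceback."""
    available = available_gpu_ids(device_count)
    for gpu in requested:
        if gpu not in available:
            raise MisconfigurationError(
                f"You requested GPUs: {requested}\n But your machine only has: {available}"
            )
    return requested


if __name__ == "__main__":
    try:
        import torch
        count = torch.cuda.device_count()
    except ImportError:
        count = 0  # torch not installed: treat as no visible GPUs
    print("torch sees", count, "GPU(s)")
    try:
        print("OK:", sanitize_gpu_ids([0], count))
    except MisconfigurationError as err:
        print("Would fail like the traceback:", err)
```

In other words, the question to answer first is why `torch.cuda.device_count()` returns 0 in this environment, not which Lightning version to use.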
Can you help me, please? A few months ago I had to stop working on this because I didn't have the hardware to run the code. Now I have a better hardware setup, and I managed to run the ScanNet-to-vortx formatting step and the TSDF build, but I get stuck when I try to train.
Also, while trying to fix this issue, I deleted my old vortx conda env (which I had built months ago and in which everything worked). Now, when I try to rerun the previous steps, such as formatting ScanNet for vortx and building the TSDF, they no longer work either. It's a complete mess.
Now my vortx env is broken and I can't do anything. Please, I really need a hand with this.
My assumption is that some of the dependencies are being installed at their newest versions, while the ones you pin stay at the specified versions:
pytorch-lightning==1.5
scikit-image==0.18
pip install git+https://github.com/mit-han-lab/[email protected]
so perhaps there is some kind of incompatibility between them. Please help!
EDIT: I reinstalled everything in a new conda env called vortx2, then added the following lines to .bashrc:
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export PATH=$CUDA_HOME/bin:$PATH
With that, generate_gt works as expected, but
python scripts/train.py --config config.yml
still does not work.
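The three export lines only change the shell environment: they put the compiler (nvcc) and libraries under /usr/local/cuda on the search paths, which tools that compile CUDA code at build time or run time rely on (pycuda, which appears in this env, is one example). They do not change which PyTorch build is installed, which would be consistent with generate_gt starting to work while training still fails. A minimal Python sketch of what the exports do, assuming the default /usr/local/cuda location; `apply_cuda_exports` and `find_nvcc` are hypothetical helper names.

```python
import os
import shutil
from typing import Dict, Optional


def apply_cuda_exports(env: Dict[str, str], cuda_home: str = "/usr/local/cuda") -> Dict[str, str]:
    """Hypothetical helper: return a copy of env updated like the three .bashrc exports."""
    new_env = dict(env)
    new_env["CUDA_HOME"] = cuda_home
    # Prepend lib64 and bin. Unlike bash, skip the trailing separator
    # when the previous value is empty.
    lib_dir = os.path.join(cuda_home, "lib64")
    old_ld = env.get("LD_LIBRARY_PATH", "")
    new_env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + old_ld if old_ld else "")
    bin_dir = os.path.join(cuda_home, "bin")
    old_path = env.get("PATH", "")
    new_env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return new_env


def find_nvcc(env: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: locate nvcc using the PATH from env, not os.environ."""
    return shutil.which("nvcc", path=env.get("PATH", ""))


if __name__ == "__main__":
    env = apply_cuda_exports(dict(os.environ))
    print("CUDA_HOME =", env["CUDA_HOME"])
    print("nvcc resolves to:", find_nvcc(env))
```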
In addition, here are the conda requirements of the newest env I installed:
name: vortx2
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_kmp_llvm
- blas=2.116=mkl
- blas-devel=3.9.0=16_linux64_mkl
- brotli-python=1.0.9=py39h5a03fae_9
- bzip2=1.0.8=h7f98852_4
- ca-certificates=2023.5.7=hbcca054_0
- certifi=2023.5.7=pyhd8ed1ab_0
- charset-normalizer=3.1.0=pyhd8ed1ab_0
- cudatoolkit=11.3.1=h9edb442_11
- ffmpeg=4.3=hf484d3e_0
- filelock=3.12.2=pyhd8ed1ab_0
- freetype=2.12.1=hca18f0e_1
- gmp=6.2.1=h58526e2_0
- gmpy2=2.1.2=py39h376b7d2_1
- gnutls=3.6.13=h85f3911_1
- icu=72.1=hcb278e6_0
- idna=3.4=pyhd8ed1ab_0
- jinja2=3.1.2=pyhd8ed1ab_1
- jpeg=9e=h0b41bf4_3
- lame=3.100=h166bdaf_1003
- lcms2=2.15=hfd0df8a_0
- ld_impl_linux-64=2.40=h41732ed_0
- lerc=4.0.0=h27087fc_0
- libblas=3.9.0=16_linux64_mkl
- libcblas=3.9.0=16_linux64_mkl
- libdeflate=1.17=h0b41bf4_0
- libffi=3.4.2=h7f98852_5
- libgcc-ng=13.1.0=he5830b7_0
- libgfortran-ng=13.1.0=h69a702a_0
- libgfortran5=13.1.0=h15d22d2_0
- libgomp=13.1.0=he5830b7_0
- libhwloc=2.9.1=nocuda_h7313eea_6
- libiconv=1.17=h166bdaf_0
- liblapack=3.9.0=16_linux64_mkl
- liblapacke=3.9.0=16_linux64_mkl
- libnsl=2.0.0=h7f98852_0
- libpng=1.6.39=h753d276_0
- libsqlite=3.42.0=h2797004_0
- libstdcxx-ng=13.1.0=hfd8a6a1_0
- libtiff=4.5.0=h6adf6a1_2
- libuuid=2.38.1=h0b41bf4_0
- libwebp-base=1.3.1=hd590300_0
- libxcb=1.13=h7f98852_1004
- libxml2=2.11.4=h0d562d8_0
- libzlib=1.2.13=hd590300_5
- llvm-openmp=16.0.6=h4dfa4b3_0
- markupsafe=2.1.3=py39hd1e30aa_0
- mkl=2022.1.0=h84fe81f_915
- mkl-devel=2022.1.0=ha770c72_916
- mkl-include=2022.1.0=h84fe81f_915
- mpc=1.3.1=hfe3b2da_0
- mpfr=4.2.0=hb012696_0
- mpmath=1.3.0=pyhd8ed1ab_0
- ncurses=6.4=hcb278e6_0
- nettle=3.6=he412f7d_0
- networkx=3.1=pyhd8ed1ab_0
- openh264=2.1.1=h780b84a_0
- openjpeg=2.5.0=hfec8fc6_2
- openssl=3.1.1=hd590300_1
- pillow=9.4.0=py39h2320bf1_1
- pip=23.1.2=pyhd8ed1ab_0
- pthread-stubs=0.4=h36c2ea0_1001
- pysocks=1.7.1=pyha2e5f31_6
- python=3.9.16=h2782a2a_0_cpython
- python_abi=3.9=3_cp39
- pytorch=2.0.1=py3.9_cpu_0
- pytorch-mutex=1.0=cpu
- readline=8.2=h8228510_1
- requests=2.31.0=pyhd8ed1ab_0
- setuptools=68.0.0=pyhd8ed1ab_0
- sympy=1.12=pypyh9d50eac_103
- tbb=2021.9.0=hf52228f_0
- tk=8.6.12=h27826a3_0
- torchvision=0.15.2=py39_cpu
- typing_extensions=4.7.1=pyha770c72_0
- wheel=0.40.0=pyhd8ed1ab_0
- xorg-libxau=1.0.11=hd590300_0
- xorg-libxdmcp=1.1.3=h7f98852_0
- xz=5.2.6=h166bdaf_0
- zlib=1.2.13=hd590300_5
- zstd=1.5.2=h3eb15da_6
- pip:
- absl-py==1.4.0
- addict==2.4.0
- aiohttp==3.8.4
- aiosignal==1.3.1
- ansi2html==1.8.0
- appdirs==1.4.4
- asttokens==2.2.1
- async-timeout==4.0.2
- attrs==23.1.0
- backcall==0.2.0
- black==23.3.0
- cachetools==5.3.1
- click==8.1.3
- comm==0.1.3
- configargparse==1.5.5
- contourpy==1.1.0
- cycler==0.11.0
- dash==2.11.1
- dash-core-components==2.0.0
- dash-html-components==2.0.0
- dash-table==5.0.0
- debugpy==1.6.7
- decorator==5.1.1
- docker-pycreds==0.4.0
- executing==1.2.0
- fastjsonschema==2.17.1
- flask==2.2.5
- fonttools==4.40.0
- freetype-py==2.4.0
- frozenlist==1.3.3
- fsspec==2023.6.0
- future==0.18.3
- gitdb==4.0.10
- gitpython==3.1.31
- google-auth==2.21.0
- google-auth-oauthlib==1.0.0
- grpcio==1.51.3
- imageio==2.31.1
- importlib-metadata==6.7.0
- importlib-resources==5.12.0
- ipykernel==6.24.0
- ipython==8.14.0
- ipywidgets==8.0.7
- itsdangerous==2.1.2
- jedi==0.18.2
- joblib==1.3.1
- jsonschema==4.17.3
- jupyter-client==8.3.0
- jupyter-core==5.3.1
- jupyterlab-widgets==3.0.8
- kiwisolver==1.4.4
- lightning-utilities==0.9.0
- llvmlite==0.40.1
- mako==1.2.4
- markdown==3.4.3
- matplotlib==3.7.2
- matplotlib-inline==0.1.6
- msgpack==1.0.5
- multidict==6.0.4
- mypy-extensions==1.0.0
- nbformat==5.7.0
- nest-asyncio==1.5.6
- numba==0.57.1
- numpy==1.24.4
- oauthlib==3.2.2
- open3d==0.17.0
- opencv-python==4.8.0.74
- packaging==23.1
- pandas==2.0.3
- parso==0.8.3
- pathspec==0.11.1
- pathtools==0.1.2
- pexpect==4.8.0
- pickleshare==0.7.5
- platformdirs==3.8.0
- plotly==5.15.0
- prompt-toolkit==3.0.39
- protobuf==4.23.3
- psutil==5.9.5
- ptyprocess==0.7.0
- pure-eval==0.2.2
- pyasn1==0.5.0
- pyasn1-modules==0.3.0
- pycuda==2022.2.2
- pydeprecate==0.3.1
- pyglet==2.0.8
- pygments==2.15.1
- pyopengl==3.1.0
- pyparsing==3.0.9
- pyquaternion==0.9.9
- pyrender==0.1.45
- pyrsistent==0.19.3
- python-dateutil==2.8.2
- pytools==2023.1
- pytorch-lightning==1.5.0
- pytz==2023.3
- pywavelets==1.4.1
- pyyaml==6.0
- pyzmq==25.1.0
- ray==2.5.1
- requests-oauthlib==1.3.1
- retrying==1.3.4
- rsa==4.9
- scikit-image==0.18.0
- scikit-learn==1.3.0
- scipy==1.11.1
- sentry-sdk==1.27.0
- setproctitle==1.3.2
- six==1.16.0
- smmap==5.0.0
- stack-data==0.6.2
- tenacity==8.2.2
- tensorboard==2.13.0
- tensorboard-data-server==0.7.1
- threadpoolctl==3.1.0
- tifffile==2023.7.4
- tomli==2.0.1
- torchmetrics==1.0.0
- torchsparse==1.4.0
- tornado==6.3.2
- tqdm==4.65.0
- traitlets==5.9.0
- trimesh==3.22.3
- tzdata==2023.3
- urllib3==1.26.16
- wandb==0.15.5
- wcwidth==0.2.6
- werkzeug==2.2.3
- widgetsnbextension==4.0.8
- yarl==1.9.2
- zipp==3.15.0
prefix: /home/darkayserleo/anaconda3/envs/vortx2
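One detail in this export stands out: the build strings `pytorch=2.0.1=py3.9_cpu_0`, `pytorch-mutex=1.0=cpu` and `torchvision=0.15.2=py39_cpu` indicate that conda resolved CPU-only builds, even though `cudatoolkit=11.3.1` is also installed. A CPU-only PyTorch reports zero CUDA devices, which would match the empty list in the traceback. A small stdlib-only sketch for scanning `conda env export` output for such builds; the helper name and the "cpu in the build string" heuristic are assumptions.

```python
from typing import List, Tuple


def cpu_only_builds(env_yaml: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: (name, version, build) for conda deps whose build mentions 'cpu'.

    Parses lines of the form '- name=version=build' from `conda env export`
    output; pip entries ('- name==version') have no build string and are skipped.
    """
    hits = []
    for raw in env_yaml.splitlines():
        line = raw.strip()
        if not line.startswith("- ") or "==" in line:
            continue
        parts = line[2:].split("=")
        if len(parts) != 3:
            continue
        name, version, build = parts
        if "cpu" in build.lower():
            hits.append((name, version, build))
    return hits


if __name__ == "__main__":
    sample = """dependencies:
  - cudatoolkit=11.3.1=h9edb442_11
  - pytorch=2.0.1=py3.9_cpu_0
  - pytorch-mutex=1.0=cpu
  - pip:
    - pytorch-lightning==1.5.0
"""
    for name, version, build in cpu_only_builds(sample):
        print(f"{name} {version}: CPU-only build ({build})")
```

To check a real env, pass the text of `conda env export` for that env instead of the sample string.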
Can you share your requirements.yml with me?
Perhaps using the same one as yours would help me solve the issue.
No luck. I broke my Linux system trying to remove and reinstall drivers, so I installed Ubuntu 22 from scratch.
Then I did the following steps:
- sudo apt-get install nvidia-driver-535
- sudo apt-get install git
- git clone https://github.com/noahstier/vortx.git
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
- bash ~/miniconda.sh -b
- rm ~/miniconda.sh
- conda create -n vortx python=3.9 -y
- conda activate vortx
- conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
- sudo apt-get install nvidia-cuda-toolkit
- pip install pytorch-lightning==1.5 scikit-image==0.18 numba pillow wandb tqdm open3d pyrender ray trimesh pyyaml matplotlib black pycuda opencv-python imageio
- sudo apt install libsparsehash-dev
- pip install git+https://github.com/mit-han-lab/[email protected]
- pip install -e .
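Since the `conda install pytorch torchvision cudatoolkit=11.3 -c pytorch` step is where the PyTorch build gets chosen, a quick check right after it can save a full install cycle. A minimal sketch using standard torch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`); the function name is hypothetical. On a CUDA build `torch.version.cuda` is a version string, while on a CPU-only build it is None.

```python
from typing import Any, Dict


def torch_cuda_report() -> Dict[str, Any]:
    """Hypothetical helper: summarize whether the installed PyTorch can see CUDA devices."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only builds; a string such as "11.3" for CUDA builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If this prints `built_with_cuda: None` and `device_count: 0`, the training script cannot see the GPUs no matter which pytorch-lightning version is installed.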
I installed the latest NVIDIA driver, and when I run:
python scripts/train.py --config config.yml
I get:
Global seed set to 0
wandb: Currently logged in as: ********* Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/darkayserleo/vortx/wandb/run-20230706_145359-5w0pfpzf
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run rare-gorge-24
wandb: ⭐️ View project at https://wandb.ai/leonelos/vortx
wandb: 🚀 View run at https://wandb.ai/leonelos/vortx/runs/5w0pfpzf
/home/darkayserleo/miniconda3/envs/vortx/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/darkayserleo/miniconda3/envs/vortx/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet1_0_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet1_0_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Traceback (most recent call last):
File "/home/darkayserleo/vortx/scripts/train.py", line 58, in <module>
trainer = pl.Trainer(
File "/home/darkayserleo/miniconda3/envs/vortx/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 38, in insert_env_defaults
return fn(self, **kwargs)
File "/home/darkayserleo/miniconda3/envs/vortx/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 426, in __init__
gpu_ids, tpu_cores = self._parse_devices(gpus, auto_select_gpus, tpu_cores)
File "/home/darkayserleo/miniconda3/envs/vortx/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1525, in _parse_devices
gpu_ids = device_parser.parse_gpu_ids(gpus)
File "/home/darkayserleo/miniconda3/envs/vortx/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 89, in parse_gpu_ids
return _sanitize_gpu_ids(gpus)
File "/home/darkayserleo/miniconda3/envs/vortx/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 151, in _sanitize_gpu_ids
raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0]
But your machine only has: []
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run rare-gorge-24 at: https://wandb.ai/leonelos/vortx/runs/5w0pfpzf
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230706_145359-5w0pfpzf/logs
It still says that no GPUs are detected:
pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0]
But your machine only has: []
nvidia-smi
Thu Jul 6 14:54:27 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:26:00.0 Off | N/A |
| 0% 53C P8 18W / 170W | 9MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3060 Off | 00000000:27:00.0 On | N/A |
| 54% 53C P8 21W / 170W | 502MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1903 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 1903 G /usr/lib/xorg/Xorg 170MiB |
| 1 N/A N/A 2247 G /usr/bin/gnome-shell 138MiB |
| 1 N/A N/A 5316 G ...irefox/2356/usr/lib/firefox/firefox 180MiB |
+---------------------------------------------------------------------------------------+
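The driver clearly sees both RTX 3060 cards. To compare that with what PyTorch reports, the GPU list can be pulled in machine-readable form with `nvidia-smi --query-gpu=index,name --format=csv,noheader`. The parser below is a small sketch with hypothetical helper names; if it lists two GPUs while `torch.cuda.device_count()` returns 0, the problem most likely lies in the PyTorch install rather than the driver.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_csv(csv_text: str) -> List[Tuple[int, str]]:
    """Hypothetical helper: parse 'index, name' rows from nvidia-smi CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name = (field.strip() for field in line.split(",", 1))
        gpus.append((int(index), name))
    return gpus


def query_gpus() -> List[Tuple[int, str]]:
    # Requires the NVIDIA driver's nvidia-smi binary on PATH.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    try:
        for index, name in query_gpus():
            print(index, name)
    except (OSError, subprocess.CalledProcessError) as err:
        print("nvidia-smi not usable:", err)
```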
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
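Three different CUDA versions appear in this thread: nvidia-smi reports CUDA 12.2 (the newest runtime the driver supports), the nvcc on PATH reports 11.5, and the conda env pins cudatoolkit 11.3. NVIDIA drivers are backward compatible with older CUDA runtimes, so a 12.2-capable driver should be able to run 11.x builds, and the mismatch alone would not be expected to hide the GPUs. A small sketch that extracts and compares the versions from the two outputs above; the helper names are hypothetical, and the regexes assume the output formats shown here.

```python
import re
from typing import Tuple

Version = Tuple[int, int]


def nvcc_release(nvcc_output: str) -> Version:
    """Hypothetical helper: (major, minor) from 'Cuda compilation tools, release 11.5, ...'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(smi_output: str) -> Version:
    """Hypothetical helper: (major, minor) from the 'CUDA Version: 12.2' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version: X.Y' found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def driver_supports(driver: Version, toolkit: Version) -> bool:
    # Simplified rule: a driver can run code built with a CUDA version
    # less than or equal to the version it reports.
    return toolkit <= driver


if __name__ == "__main__":
    smi = "| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |"
    nvcc = "Cuda compilation tools, release 11.5, V11.5.119"
    drv, tk = driver_cuda_version(smi), nvcc_release(nvcc)
    print(f"driver CUDA {drv}, nvcc {tk}, compatible: {driver_supports(drv, tk)}")
```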
What do I need to do to run your code, please?
I'm feeling frustrated. I don't know what else to do.