How to fix "Cuda failure src/bvh_cuda_op.cu:877"?

Open ZiliPeng opened this issue 6 years ago • 27 comments

This is really great work! Has anybody else hit this error when running the script, and does anyone know how to fix it? (CUDA version: V9.0.176; cuDNN version: 7.0.5.) The error is as follows:

Error inside sort: radix_sort: failed on 2nd step: invalid argument████████████████████▌                                                  | 3/5 [00:37<00:25, 12.84s/it]
Cuda failure src/bvh_cuda_op.cu:877: 'invalid argument'

ZiliPeng avatar Jun 17 '19 08:06 ZiliPeng

Similar error!

Arthur151 avatar Jun 17 '19 09:06 Arthur151

Similar error!

guolinjie007 avatar Jun 20 '19 03:06 guolinjie007

similar error!

torpor29 avatar Jun 20 '19 11:06 torpor29

similar problem CUDA 9.0

wangsen1312 avatar Jun 21 '19 09:06 wangsen1312

Anyone solved this problem?

gsygsy96 avatar Jun 22 '19 08:06 gsygsy96

it works fine with cuda10

tszhang97 avatar Jun 22 '19 08:06 tszhang97

Are you using CUDA 10.1 or 10.0?

CosmicWebCreator avatar Jun 25 '19 15:06 CosmicWebCreator

Same problem. I tried CUDA 10.0 and 10.1 and still get the error on step 3. Could it be the size of the graphics card (I am using a 1080)? The issue seems to happen in the torch-mesh-isect project.

line 844 : thrust::sort_by_key(morton_codes.begin(), morton_codes.end(), triangle_ids->begin());

Error inside sort: radix_sort: failed on 2nd step: invalid argument
Cuda failure src/bvh_cuda_op.cu:877: 'an illegal memory access was encountered'

CosmicWebCreator avatar Jun 25 '19 16:06 CosmicWebCreator

Hi everyone, sorry for the delay. Could you please provide some sample inputs (images + keypoints) that cause the problem (e-mail, comment, whatever suits you best)? I will try to reproduce it on my machine and figure out a solution.

vchoutas avatar Jun 27 '19 15:06 vchoutas

Here is the requested input :)

D.zip

CosmicWebCreator avatar Jun 27 '19 15:06 CosmicWebCreator

Use the system Python, not conda:

/usr/bin/python3 -m venv venv

I hit the same issue. It went away after I created the Python environment with the system Python, NOT with Anaconda.
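
To tell whether the active interpreter comes from a conda installation, one rough check is to look for a `conda-meta` directory under the interpreter's prefix. This is only a heuristic sketch; the helper names `is_conda_prefix` and `interpreter_uses_conda` are made up for illustration and are not part of smplify-x.

```python
import os
import sys


def is_conda_prefix(prefix):
    """Heuristic: conda environments keep package metadata in <prefix>/conda-meta."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def interpreter_uses_conda():
    # sys.prefix is the active environment; sys.base_prefix is the interpreter
    # a venv was created from, so a venv built on top of conda is caught too.
    return is_conda_prefix(sys.prefix) or is_conda_prefix(sys.base_prefix)
```

Calling `interpreter_uses_conda()` inside the activated environment should return `False` if the venv was really created from `/usr/bin/python3`.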

kazukiotsuka avatar Jun 30 '19 05:06 kazukiotsuka

Thank you very much!! I solved this problem using your method! Thank you!

torpor29 avatar Jul 02 '19 03:07 torpor29

@mikeburon The given data runs without any issue for me. Could you try @torpor29's solution?

vchoutas avatar Jul 02 '19 08:07 vchoutas

I will give it a try today and report back afterwards. Thanks for the time you've invested :)

CosmicWebCreator avatar Jul 02 '19 13:07 CosmicWebCreator

Use virtualenv, not conda, and start from a clean environment. Creating a new virtual environment is best.

/usr/bin/python3 -m venv venv

This method works in my environment (Ubuntu 18.04, CUDA 10.0, cuDNN 7.5.0.56, PyTorch 1.1.0). Before that, I tried changing the cuDNN version from 7.3 to 7.5 and the PyTorch version from 1.0 to 1.1 inside a conda environment, but that did not work.

Here is my command history for installing smplx (you had better install OpenPose in a separate environment):

venv_dir=~/venvs/smplify-x
python3 -m venv $venv_dir
source ~/venvs/smplify-x/bin/activate
pip install -U pip
pip install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp36-cp36m-linux_x86_64.whl
pip install smplx
pip install human_body_prior
rm -fr homogenus
git clone https://github.com/nghorbani/homogenus.git
cd homogenus
python setup.py install
cd ..
rm -fr torch-mesh-isect
git clone https://github.com/vchoutas/torch-mesh-isect
cd torch-mesh-isect/
export CUDA_SAMPLES_INC=~/NVIDIA_CUDA-10.0_Samples/common/inc/
python setup.py install
pip install trimesh
pip install pyrender
pip install pyaml
pip install tqdm
pip install configargparse
pip install shapely

Note: delete old build files before running python setup.py install. I simply deleted the repository directory and cloned it again. My CUDA and cuDNN directory hierarchy is:

├── cuda -> /usr/local/cuda-10.0
├── cuda-10.0
├── cudnn -> /usr/local/cudnn-7.5.0.56
├── cudnn-7.3.1
├── cudnn-7.5.0.56

PS: Running smplifyx/main.py does not depend on cuDNN. It still worked after I deleted the cudnn soft link.
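
If re-cloning the repository is inconvenient, the stale build output can also be removed in place. The following is a minimal sketch that assumes the usual setuptools output directories (`build/`, `dist/`, `*.egg-info`); the helper name `remove_build_artifacts` is hypothetical.

```python
import glob
import os
import shutil


def remove_build_artifacts(repo_dir):
    """Delete setuptools output (build/, dist/, *.egg-info) under repo_dir.

    Returns the sorted list of directories that were removed.
    """
    patterns = ["build", "dist", "*.egg-info"]
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(repo_dir, pattern)):
            if os.path.isdir(path):
                shutil.rmtree(path)
                removed.append(path)
    return sorted(removed)
```

For example, `remove_build_artifacts("torch-mesh-isect")` would be run before `python setup.py install`. It does not touch anything already installed into the virtual environment's site-packages.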

huhai463127310 avatar Jul 02 '19 13:07 huhai463127310

So I still get the same error. The only difference I can see is that I am using Ubuntu 16 and not 18. I will try with 18 later.

CosmicWebCreator avatar Jul 02 '19 15:07 CosmicWebCreator

Try deleting the old build files, creating a new virtual environment, and then installing all packages in the new virtual environment. Maybe that will work.

huhai463127310 avatar Jul 03 '19 16:07 huhai463127310

amazinnngg!

To all: make sure to use the same installs (Ubuntu 16 didn't work for me, but 18 did!)

CosmicWebCreator avatar Jul 04 '19 03:07 CosmicWebCreator

Did anyone manage to make this work with Cuda 9?

Tetsujinfr avatar Aug 03 '19 23:08 Tetsujinfr

@geopavlakos @ZiliPeng @vchoutas @mikeburon Hi, I am facing a similar problem, but with a slightly different error message. I followed the linked instructions and created a virtual environment with /usr/bin/python3, but it didn't fix my problem. Could you help me check it out?

Error Message

(smplifyx) vcl@vcl-dl-2:/backup1/lingboyang/SMPLeXpressive_package/smplify-x$ bash fit.sh
Processing: test_images/images/01_img.jpg
Found Trained Model: /backup1/lingboyang/SMPLeXpressive_package/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 1.4820
Camera initialization final loss 3699.7092
Stage 000 done after 4.3845 seconds
Stage 001 done after 2.9126 seconds
Stage 002 done after 8.0329 seconds
Orientation:   0%|                                                                                                                                          | 0/1 [00:15<?, ?it/sError inside sort: radix_sort: failed on 2nd step: invalid argument███████████████████████████▌                                                      | 3/5 [00:15<00:10,  5.17s/it]
Cuda failure src/bvh_cuda_op.cu:877: 'an illegal memory access was encountered'

Fitting script

export CUDA_VISIBLE_DEVICES=0
python smplifyx/main.py --config cfg_files/fit_smplx.yaml --data_folder test_images --output_folder results --visualize="False" --model_folder /backup1/lingboyang/SMPLeXpressive_package/models_smplx_v1_0/models/smplx/SMPLX_NEUTRAL.npz --vposer_ckpt /backup1/lingboyang/SMPLeXpressive_package/vposer_v1_0 --part_segm_fn smplx_parts_segm.pkl

CONFIG

PyTorch 1.1.0
CUDA 9.0.176
cuDNN 7.2.1
Python 3.6.8
GCC 5.5.0

Testing images

test_images.zip
Note: these are frames 1, 11, and 21 from the EPH dataset.

Any progress on this issue? Thanks!

Lotayou avatar Aug 06 '19 14:08 Lotayou

@Lotayou this is running without any issue for me. Might it be because you are using CUDA 9.0 instead of CUDA 10?
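
When switching CUDA versions, one thing worth ruling out is that the toolkit used to compile the extension differs from the CUDA version PyTorch was built for. That is an assumption about a possible cause, not something confirmed in this thread. The sketch below compares the two from plain strings; the function names are hypothetical.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def cuda_versions_match(nvcc_output, torch_cuda):
    """Compare the toolkit release with the CUDA version PyTorch was built for.

    torch_cuda is a string such as "10.0" (the value of torch.version.cuda).
    """
    major, minor = (int(part) for part in torch_cuda.split(".")[:2])
    return parse_nvcc_release(nvcc_output) == (major, minor)
```

In practice the first argument would come from running `nvcc --version` and the second from `torch.version.cuda`, which is `None` on CPU-only builds and would need to be handled separately.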

vchoutas avatar Aug 09 '19 10:08 vchoutas

@vchoutas Cheers. I will switch to CUDA 10 and try it again. Will keep you notified in case of any new issues.

Lotayou avatar Aug 10 '19 14:08 Lotayou

@vchoutas I've installed CUDA 10.1 on my machine and pointed the soft link /usr/local/cuda at the 10.1 folder. Then I followed exactly the same steps as before, but the error still persists. Can you think of anything else that may cause an issue like this?

BTW, is there a quick way to disable the mesh intersection part and still get the fitting script working? (I read in the README that the mesh-intersection package is optional.) And if I do that, will the estimation results degrade dramatically?

Also @mikeburon, did you finally manage to get things working? What do you think was the key factor in avoiding the illegal memory access error? Thanks!

Lotayou avatar Aug 12 '19 22:08 Lotayou

@Lotayou you can set interpenetration=False to disable the self intersection term. The results should be more or less the same.
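
One way to apply this is to flip the setting in the configuration text before running the fit. The sketch below assumes the YAML config has a top-level `interpenetration` key, which this thread does not show; `set_config_flag` is a hypothetical helper that edits plain text and does not parse YAML.

```python
import re


def set_config_flag(config_text, key, value):
    """Return config_text with the top-level `key: ...` line set to value.

    If the key is absent, a new `key: value` line is appended.
    """
    pattern = re.compile(r"^(%s[ \t]*:[ \t]*).*$" % re.escape(key), re.MULTILINE)
    if pattern.search(config_text):
        # Keep the original "key: " prefix and swap only the value.
        return pattern.sub(lambda m: m.group(1) + str(value), config_text, count=1)
    suffix = "" if config_text.endswith("\n") or not config_text else "\n"
    return config_text + suffix + "%s: %s\n" % (key, value)
```

For example, `set_config_flag(text, "interpenetration", False)` rewrites an existing `interpenetration: True` line and leaves every other line untouched.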

vchoutas avatar Aug 13 '19 10:08 vchoutas

The crash with

Error inside sort: radix_sort: failed on 2nd step: invalid argument
Cuda failure src/bvh_cuda_op.cu:877: 'an illegal memory access was encountered'

may also in some cases be fixed by specifying the architecture in the compile args, see https://github.com/vchoutas/torch-mesh-isect/issues/6#issuecomment-518206179
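
As a rough illustration of what "specifying the architecture" means, the sketch below builds an nvcc `-gencode` flag from a GPU's compute capability. How the flag is passed to the extension build is described in the linked comment and is not reproduced here; `gencode_flag` is a hypothetical helper.

```python
def gencode_flag(major, minor):
    """Build an nvcc -gencode flag for one compute capability, e.g. (6, 1)."""
    arch = "%d%d" % (major, minor)
    return "-gencode=arch=compute_%s,code=sm_%s" % (arch, arch)
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair for the current device, so `gencode_flag(*torch.cuda.get_device_capability())` yields the flag for the GPU in use. A GTX 1080, for instance, has compute capability 6.1.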

w-m avatar Sep 20 '19 17:09 w-m

Updated version of the set of instructions (initially shared here) to install the required packages:

  • install python 3.7
sudo apt install python3.7
sudo apt install python3.7-venv
sudo apt install python3.7-dev
  • install virtual env
venv_dir=~/venvs/smplify-x
/usr/bin/python3.7 -m venv $venv_dir
source ~/venvs/smplify-x/bin/activate
  • install required packages
pip install -U pip
pip install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
pip install smplx
pip install git+https://github.com/MPI-IS/configer.git
wget https://github.com/nghorbani/human_body_prior/archive/refs/heads/cvpr19.zip
pip install cvpr19.zip
pip install tensorflow==1.15.2
rm -fr homogenus
git clone https://github.com/nghorbani/homogenus.git
cd homogenus
python setup.py install
cd ..
rm -fr torch-mesh-isect
git clone https://github.com/vchoutas/torch-mesh-isect
cd torch-mesh-isect/
  • Modify line 59 of setup.py from:
  bvh_include_dirs = torch.utils.cpp_extension.include_paths() + [
    'include',
    osp.expandvars('$CUDA_SAMPLES_INC')]

to:

  bvh_include_dirs = torch.utils.cpp_extension.include_paths() + [
    'include',
    '/usr/local/cuda-10.0/samples/common/inc']
python setup.py install
  • install remaining packages
pip install trimesh
pip install pyrender==0.1.25
pip install pyaml
pip install tqdm
pip install configargparse
pip install shapely
pip install loguru
  • make sure to download models (Download SMPL-X v1.1 (830 MB)) and vposer (Download VPoser v1.0 -CVPR'19 (2.5 MB)) data from the project page, and smplx_parts_segm.pkl from here.

  • now the following command should work without any problems: python smplifyx/main.py --config cfg_files/fit_smplx.yaml --data_folder 'YOUR DATA FOLDER CONTAINING IMAGES AND KEYPOINTS' --output_folder './output_folder' --visualize=True --model_folder 'models' --vposer_ckpt 'vposer_v1_0' --part_segm_fn 'smplx_parts_segm.pkl'
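
Instead of hardcoding one samples path, the include directory could be picked from a list of candidates at build time. This is a minimal sketch, not the project's actual setup.py logic: `find_samples_inc` is a hypothetical helper, and the candidate paths in the usage note are simply the two locations mentioned in this thread.

```python
import os


def find_samples_inc(candidates, environ=None):
    """Return the first existing CUDA samples include directory, or None.

    The CUDA_SAMPLES_INC environment variable, if set, is tried first.
    """
    environ = os.environ if environ is None else environ
    paths = []
    if environ.get("CUDA_SAMPLES_INC"):
        paths.append(environ["CUDA_SAMPLES_INC"])
    paths.extend(candidates)
    for path in paths:
        # Expand "~" and "$VAR" so shell-style paths work as well.
        expanded = os.path.expanduser(os.path.expandvars(path))
        if os.path.isdir(expanded):
            return expanded
    return None
```

A call such as `find_samples_inc(["/usr/local/cuda-10.0/samples/common/inc", "~/NVIDIA_CUDA-10.0_Samples/common/inc"])` returns whichever location exists on the machine, or `None` if neither does, in which case the build should stop with a clear message rather than fail later in compilation.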

UmarSpa avatar May 24 '21 11:05 UmarSpa

What is the equivalent path for the CUDA samples on Windows? Does the current version of SMPL-X support CUDA 11.1?

korenleven avatar Oct 12 '21 08:10 korenleven