AlphaPose
Inference stuck while loading the model (image inference)
Hi team, thank you for your great pose estimation project!
When I ran inference on Amazon AWS EC2 (Ubuntu 16), everything went well. When I ran it on Google Colab, everything was fine too.
But yesterday, when I tried to run inference on CentOS with CUDA 11.1, it got stuck at this point:
Loading YOLO model..
Loading pose model from pretrained_models/fast_res50_256x192.pth...
0%| | 0/695 [00:00<?, ?it/s]/
My test folder contains only 3 pics, no other file types.
Thank you very much.
Hi, can you try adding --sp? Some error information may appear in this mode.
Hi Fang, thank you for your comment. Do you mean adding --sp to the end of my command? I tried that, and it still does not work. Thank you.
Hi @hsauod, did you try adding --vis_fast? It sometimes seems to get stuck at rendering.
Has this problem been solved? I have run into the same problem.
Same problem. FYI, it also gets stuck with the "--gpu -1" flag.
I tried building with the pytorch1.4 cuda10.1 devel image, following the suggestion in another issue, #677, with no luck.
Then I finally figured it out by testing it in JupyterLab. It has something to do with tqdm: tqdm gets stuck when run from a shell, but not when I run the shell command in JupyterLab. Try removing the tqdm part of demo_inference.py, or rewriting it.
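For anyone who wants to try the "rewrite the tqdm part" route, here is a minimal sketch of one way to make the progress bar optional. The helper name `progress` and its `enabled` parameter are hypothetical and are not part of AlphaPose; the sketch only assumes that the loop in demo_inference.py iterates over something wrapped in tqdm.

```python
def progress(iterable, enabled=False, **tqdm_kwargs):
    """Wrap `iterable` in a tqdm progress bar only when `enabled` is True.

    Hypothetical helper, not part of AlphaPose. With enabled=False the
    iterable is returned untouched, so tqdm is never imported or drawn.
    """
    if not enabled:
        return iterable
    # Imported lazily so the no-progress-bar path does not need tqdm at all.
    from tqdm import tqdm
    return tqdm(iterable, **tqdm_kwargs)


if __name__ == "__main__":
    # Stand-in for the inference loop: runs without any progress bar.
    for index in progress(range(3), enabled=False):
        print("processing item", index)
```

If the hang goes away with the bar disabled, that at least confirms tqdm is involved on your machine.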
@zjj-2015 @Fang-Haoshu @hsauod
Update: pip install tqdm==4.60 fixes this as well.
Same problem. I tried pip install tqdm==4.60, still no luck.
There is actually another bug, caused by the .ipynb_checkpoints folder that JupyterLab creates, which may cause a similar hang. So check whether there is a .ipynb_checkpoints folder in your examples/demo directory.
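To see what the input directory really contains, something like the sketch below may help. It is not AlphaPose's own file listing code; `list_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the set of extensions is an assumption. It skips hidden entries such as the JupyterLab checkpoint folder, as well as anything that is not a regular image file.

```python
import os
import tempfile

# Assumed set of image extensions; adjust to whatever you actually feed in.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(input_dir):
    """Return sorted image file names in `input_dir`.

    Hidden entries (names starting with "."), directories, and files with
    other extensions are skipped.
    """
    names = []
    for name in sorted(os.listdir(input_dir)):
        if name.startswith("."):
            continue  # hidden entry, e.g. a notebook checkpoint folder
        if not os.path.isfile(os.path.join(input_dir, name)):
            continue  # sub-directory or other non-file entry
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
            names.append(name)
    return names


if __name__ == "__main__":
    # Small self-contained demo on a temporary directory.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "1.jpg"), "w").close()
        os.mkdir(os.path.join(demo_dir, ".ipynb_checkpoints"))
        print(list_images(demo_dir))
```

Comparing this list with a plain os.listdir() of the same directory shows right away whether a hidden folder is sitting next to your pictures.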
I'll sum up what I did to make it work:
1. I started with an Ubuntu 16.04 distribution with the CUDA 10.1 driver and a V100 GPU.
2. apt update && apt install cuda-toolkit-10-1 (do this and step 3 if nvcc/cublas_v2.h/cublas_api.h related problems occur when you build)
3. ln -s /usr/local/cuda-10.2/targets/x86_64-linux/include/cublas_v2.h /usr/local/cuda-10.1/targets/x86_64-linux/include/cublas_v2.h && ln -s /usr/local/cuda-10.2/targets/x86_64-linux/include/cublas_api.h /usr/local/cuda-10.1/targets/x86_64-linux/include/cublas_api.h
4. Install gcc-7.
5. Install torch and torchvision (check version compatibility first).
6. python setup.py build develop --user
7. Update tqdm with pip install tqdm==4.60
8. Check whether there are unwanted hidden folders in your examples/demo folder and delete them if found (I may open a merge request for this later).
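After going through the steps above, it can be useful to confirm which versions actually ended up installed. The sketch below is only an illustration: `installed_versions` is a hypothetical helper, and it relies on the standard library's importlib.metadata, which needs Python 3.8 or newer (so it will not run on the Python 3.6 setups mentioned in this thread).

```python
from importlib import metadata


def installed_versions(packages=("tqdm", "torch", "torchvision")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```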
@korin-lf
Thank you. I followed steps 4-7 and got it working on Ubuntu 20.04. Now I am on Ubuntu 18.04 with no NVIDIA card. I followed steps 2-8 exactly with Python 3.6: I removed all CUDA versions, then installed CUDA 10.1 and pytorch==1.7.1 torchvision==0.8.2 cpuonly -c pytorch.
I removed ext_modules=get_ext_modules(), from setup.py, as I am on a CPU-only machine.
Now I am getting:
File "/home/user/gesture_detect/gesture_detect/models/layers/dcn/deform_conv.py", line 11, in
What is a possible way to resolve this? I tried symlinks, but no luck. Thank you.
I found that torch.cuda.is_available() returned False, and I fixed the problem by reinstalling CUDA.
Here is my environment: Ubuntu 16, Python==3.6, PyTorch==1.1.0, CUDA==10.0.130
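The check mentioned above can be run before starting inference. This is a minimal sketch; `cuda_status` is a hypothetical helper, and it only uses torch.__version__, torch.version.cuda and torch.cuda.is_available(). If torch is not installed at all, it reports that instead of failing.

```python
def cuda_status():
    """Report whether torch is installed and whether it can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version torch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_status().items():
        print(key, value)
```

If cuda_available is False while built_with_cuda is not None, the driver or CUDA installation is the likely place to look, which matches the reinstall that fixed it here.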