
Rebuilds every time while importing

CorcovadoMing opened this issue 4 years ago • 5 comments

Hi,

I see the lib folder re-compiling every time I execute import encoding. What can I do to get a one-time installation without re-compiling all the CUDA code?

CorcovadoMing • Apr 19 '20 14:04

They are cached on my machine; importing is fast the second time.

I don't have this issue. Are you modifying the .so, .o, or .ninja files by any chance? For example, rsync or scp from another machine would overwrite the cached files.

zhanghang1989 • Apr 19 '20 15:04

No. I've pulled a clean copy of the repo again, and it still doesn't cache for me.

I use the following steps to install:

git clone https://github.com/zhanghang1989/PyTorch-Encoding.git
cd PyTorch-Encoding
python setup.py install

I've also tried skipping python setup.py install and directly importing encoding from the pulled repo; still no cache.
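
For context, importing encoding without installing it goes through PyTorch's JIT extension loader, torch.utils.cpp_extension.load, which builds the CUDA sources in place. Below is a minimal sketch of that pattern, assuming the sources live under encoding/lib/gpu; the file list here is illustrative, not the repo's exact __init__.py:

# Hypothetical sketch of an in-source JIT build; paths and file names are assumptions.
import os
from torch.utils.cpp_extension import load

lib_dir = os.path.dirname(os.path.realpath(__file__))
gpu_path = os.path.join(lib_dir, 'gpu')

# load() emits build.ninja into build_directory and reuses the cached
# .o/.so objects on later imports, unless ninja decides they are stale.
enclib_gpu = load(
    name='enclib_gpu',
    sources=[os.path.join(gpu_path, 'operator.cpp'),
             os.path.join(gpu_path, 'activation_kernel.cu')],
    build_directory=gpu_path,
    verbose=False,
)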

I did see two directories generated in the repo after importing, dist and torch_encoding.egg-info, but they seem unrelated to the compilation.

I run those steps in Docker with:

  • Ubuntu 18.04
  • CUDA 10.2
  • Python 3.6.8

CorcovadoMing • Apr 19 '20 16:04

The .so, .o, and .ninja files should be generated the first time you import encoding. If not, maybe the Python path needs administrator access. I use Anaconda, which does not require admin access.
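
A quick way to test the permissions hypothesis, sketched in Python (the lib/gpu path is where the build outputs land in this setup):

import os
import encoding

# The JIT build writes its .o/.so/.ninja files next to the sources here.
build_dir = os.path.join(os.path.dirname(encoding.__file__), 'lib', 'gpu')
print(build_dir, 'writable:', os.access(build_dir, os.W_OK))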

Could you try using python setup.py develop? This will generate a link to the source folder.
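
For reference, the pip equivalent of setup.py develop is an editable install, which also links back to the source folder:

pip install -e .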

zhanghang1989 • Apr 19 '20 17:04

I found it did generate the caches, but it still compiles repeatedly.

I've enabled verbose=True in __init__.py under lib to see what's going on:

>>> import encoding
Detected CUDA files, patching ldflags
Emitting ninja build file /root/PyTorch-Encoding/encoding/lib/gpu/build.ninja...
Building extension module enclib_gpu...
[1/7] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.6/site-packages/torch/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' --expt-extended-lambda -std=c++14 -c /root/PyTorch-Encoding/encoding/lib/gpu/roi_align_kernel.cu -o roi_align_kernel.cuda.o
...
[7/7] c++ operator.o activation_kernel.cuda.o encoding_kernel.cuda.o syncbn_kernel.cuda.o roi_align_kernel.cuda.o nms_kernel.cuda.o rectify_cuda.cuda.o -shared -L/opt/conda/lib/python3.6/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o enclib_gpu.so
Loading extension module enclib_gpu...

It ends up compiling successfully, and the compilation outputs are generated:

root@1132724:~/PyTorch-Encoding# ls encoding/lib/gpu/
activation_kernel.cu      common.h         encoding_kernel.cu      nms_kernel.cuda.o  operator.o           roi_align_kernel.cu      syncbn_kernel.cu
activation_kernel.cuda.o  device_tensor.h  encoding_kernel.cuda.o  operator.cpp       rectify_cuda.cu      roi_align_kernel.cuda.o  syncbn_kernel.cuda.o
build.ninja               enclib_gpu.so    nms_kernel.cu           operator.h         rectify_cuda.cuda.o  setup.py
root@1132724:~/PyTorch-Encoding# 

However, when I re-enter the Python interpreter and import encoding, it starts to re-compile again!

Do you have any idea why this happens?
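
One way to diagnose a rebuild loop like this is to ask ninja directly why it considers the targets dirty; -d explain and -n (dry run) are standard ninja flags:

cd encoding/lib/gpu
ninja -d explain -n

If ninja reports each output as older than its inputs, the file timestamps are being rewritten between imports; if it reports nothing to do, the recompilation is being triggered above ninja, in PyTorch's extension loader.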

CorcovadoMing • Apr 19 '20 17:04

Sorry, I don't have that issue on my machine. It looks like a PyTorch issue. Maybe you can try PyTorch 1.4.0, which is what I am using.
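
For example, with pip (the exact wheel to use depends on your CUDA setup):

pip install torch==1.4.0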

zhanghang1989 • Apr 19 '20 18:04