
shift_kernel & shift_cuda_kernel compiled but cannot be imported

Grant-Tao opened this issue 5 years ago • 8 comments

Successfully set up everything and compiled shift_kernel, but when importing shift_kernel, this error message appeared:

ImportError: /home/grant/venv/lib/python3.6/site-packages/shift_kernel-0.0.0-py3.6-linux-x86_64.egg/shift_kernel.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_4E

For shift_cuda_kernel, the error message is: Segmentation fault (core dumped)

I am working on Ubuntu 18.04; everything else is as required.

Grant-Tao avatar Feb 01 '20 14:02 Grant-Tao

Thanks @Grant-Tao for trying out our repo. To help me reproduce the problem, can you please provide the commands you ran to get each of the two errors?

Thanks

mostafaelhoushi avatar Feb 01 '20 17:02 mostafaelhoushi

Big thanks for the quick answer. I used "python setup.py install" to compile and install both the cpu_kernel and the cuda_kernel. Apart from some minor warnings, both succeeded (the cpu_kernel built with no warnings at all).

My gcc is version 7.4; everything else matches your requirements.

The errors appeared when I ran "import shift_kernel" and "import shift_cuda_kernel".

By the way, you did not mention anything about the batchnorm operation in your paper. How do you handle batchnorm in your shift network? Maybe I missed it.

Grant-Tao avatar Feb 02 '20 13:02 Grant-Tao

Hello @Grant-Tao. Sorry for the delay in my response; I am just a bit overwhelmed with some other deadlines. I will try to work on solving this problem next week. Also, we're in the middle of refactoring the code, which includes updates to the CUDA kernels, so the refactored code will hopefully fix this bug as well.

Regarding batchnorm, we have left it intact and simply used PyTorch's batchnorm op as is. We may look in the future into implementing batchnorm with bitwise shifts. In the meantime, if you're interested, more than one paper or GitHub repo has implemented batchnorm using bitwise shifts:

  • Check Algorithm 2 and Algorithm 3 in this paper: https://arxiv.org/pdf/1602.02830.pdf
  • (I looked for a GitHub repo that implements this paper, but I can't find the batchnorm part in this one: https://github.com/itayhubara/BinaryNet.pytorch)
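
For a concrete picture, here is a minimal sketch of the shift-based batch norm idea from that paper, assuming its AP2 (approximate power-of-two) rounding; `ap2` and `shift_batch_norm` are illustrative names, not part of DeepShift:

```python
import torch

def ap2(x):
    # AP2 from the BinaryNet paper: round |x| to the nearest power of two,
    # keeping the sign, so multiplying by ap2(x) amounts to a bitwise shift
    # in a fixed-point hardware implementation.
    return torch.sign(x) * 2.0 ** torch.round(torch.log2(torch.abs(x)))

def shift_batch_norm(x, eps=1e-5):
    # Shift-based batch norm (cf. Algorithms 2 and 3): every multiplication
    # in standard batch norm is replaced by a multiplication with a power of
    # two, i.e. something a shift can realize.
    mean = x.mean(dim=0)
    centered = x - mean
    var = (centered * ap2(centered)).mean(dim=0)   # shift-based variance
    return centered * ap2(torch.rsqrt(var + eps))  # shift-based scaling

x = torch.randn(32, 16)
print(shift_batch_norm(x).shape)  # torch.Size([32, 16])
```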

mostafaelhoushi avatar Feb 06 '20 21:02 mostafaelhoushi

Sorry for the delay. We have done a big refactoring of the code. Can you check out the master branch and try again?

mostafaelhoushi avatar Mar 03 '20 14:03 mostafaelhoushi

I forgot to say: you will need to run sh install_kernels.sh to install the CUDA and CPU kernels.
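
As a quick sanity check after running the script, something like this should succeed with no ImportError and no segfault (assuming the kernel module names from earlier in this thread are unchanged by the refactor):

```python
# Run inside the same virtualenv that install_kernels.sh installed into.
import shift_kernel        # CPU kernel
import shift_cuda_kernel   # CUDA kernel
print("both kernels imported OK")
```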

mostafaelhoushi avatar Mar 04 '20 07:03 mostafaelhoushi

Hi, I have a related question about the cuda/cpp version of the shift kernel. Since you mentioned that the cuda kernel is only for testing, I wonder what the difference is between the cuda implementation and a common conv implementation with suitable rounding, as in your code. I don't know if I made this clear. It seems the conv implementation of Shift in your code <modules.py> and <modules_q.py> is exactly the same as the Shift function. I wonder what the point of implementing it with a CUDA kernel is?

BTW, why is the CUDA kernel implementation not available for the training stage? Thanks.

msxiaojin avatar Jun 15 '21 02:06 msxiaojin

Hi @msxiaojin

  • The objective of the cuda/cpp version is to implement a convolution that actually uses bitwise shifts rather than multiplications. The convolution with rounding, on the other hand, invokes the regular multiplication-based convolution; it is a proof of concept that shows us what the accuracy would be if we used bitwise shifts (see the sketch after this list).
  • To see where we use the shift kernels, follow the use_kernel boolean in modules.py and modules_q.py.
  • The CUDA kernel implementation currently supports only the forward pass. To support training, we would need to implement the backward pass as well.
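
As a rough illustration of the first point (a sketch only, not the actual DeepShift kernels; the function names here are hypothetical): the emulated path rounds each weight to a signed power of two and then runs the ordinary multiply-based convolution, whereas a dedicated kernel replaces each multiply with an integer bitwise shift.

```python
import torch
import torch.nn.functional as F

def round_to_power_of_two(w):
    # Snap each weight to sign * 2^p; a conv with these weights produces the
    # same numbers a true shift-based conv would, but still multiplies.
    p = torch.round(torch.log2(torch.abs(w).clamp(min=1e-12)))
    return torch.sign(w) * 2.0 ** p

def emulated_shift_conv2d(x, w, **kwargs):
    # Proof-of-concept path: regular F.conv2d on power-of-two weights.
    return F.conv2d(x, round_to_power_of_two(w), **kwargs)

def shift_multiply(value: int, p: int, sign: int) -> int:
    # What a dedicated kernel does per element on integer data:
    # one bitwise shift plus a possible sign flip, no multiplier at all.
    shifted = value << p if p >= 0 else value >> -p
    return -shifted if sign < 0 else shifted
```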

Please don't hesitate to ask if any of the points I made were not clear.

mostafaelhoushi avatar Jun 15 '21 02:06 mostafaelhoushi

Thanks for the reply. You've made this very clear to me. And thanks for the brilliant work!

msxiaojin avatar Jun 16 '21 16:06 msxiaojin