DAU-ConvNet
Using DAU on Res50
Hello Author @skokec ,
May I check with you if it's possible to use this implementation on a Windows platform, as I see the readme only covers setups tested on Ubuntu. Could you please let us know your suggestions, as I am trying to use this on a ResNet-50 but running it on Windows.
Unfortunately I have never used deep learning on Windows so I cannot give any input on how to make it work there. It might run out-of-the-box or it might not.
I would probably suggest looking into using Windows Subsystem for Linux, which should allow you to run Linux programs on Windows, including deep learning frameworks.
Thanks for your prompt reply. I also see options to use the docker image in the repository readme; is it also intended for use only on Linux platforms with NVIDIA GPUs? @skokec
Docker images are built on top of Linux as well, using the NVIDIA docker image with Ubuntu 16.04 by default. If you run Docker on WSL 2 (Windows Subsystem for Linux) you should be able to get access to GPUs from within a Linux docker image running on Windows, so this could be an option.
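For reference, a run command under WSL 2 might look something like the sketch below. The image tag is an assumption based on the Ubuntu 16.04 default mentioned above, not a confirmed published image name, and `--gpus all` requires the NVIDIA Container Toolkit and WSL 2 GPU support:

```shell
# Sketch only: image name/tag is assumed, check the repo for the real one.
# --gpus all exposes the host GPUs inside the container.
docker run --gpus all -it --rm nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 bash
```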
Thank you very much @skokec !
Hello author @skokec ,
So if it's a WSL or Ubuntu OS based setup, are the necessary .so files obtained by following the build-and-install-from-scratch approach? If it's based on pre-compiled binaries or docker instead, how does this linking to the code happen?
Thank you in advance!
@akshu281, in WSL2 it should be possible to just install the pip packages with pre-compiled binaries. I think WSL2 uses virtualization in the backend, therefore any binary for Linux will work in WSL2. The provided whl file already contains the .so files, and they are copied to the appropriate place in the python dist-packages/site-packages folders when installed with pip. Therefore the .so will be automatically found by python and loaded when the python class for DAU is used (this is something that python handles internally as long as the .so files are in the correct paths).
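As a small illustration of how Python locates compiled extensions on its own, the sketch below uses a stdlib compiled module as a stand-in, since the exact dau_conv module layout may vary:

```python
import importlib.util
import sysconfig

# The platform-specific suffix compiled extension modules use,
# e.g. '.cpython-38-x86_64-linux-gnu.so' on Linux.
print(sysconfig.get_config_var("EXT_SUFFIX"))

# find_spec shows where Python would load a module from; once the whl's
# .so files sit in site-packages, the same mechanism finds them.
# '_ctypes' is just a stand-in compiled module from the stdlib.
spec = importlib.util.find_spec("_ctypes")
print(spec.origin)  # path to the shared object Python will load
```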
Note that when building manually with make, it also creates whl files for pip in the background, and therefore installs the .so files into the python folders in the same way as when installing pre-compiled whl files manually.
Hello, thank you for the quick reply. I managed to get Ubuntu OS right now, so I shall just create a virtualenv with Python 3.5 (installed by default) and use the commands below to use it directly, is that right?
Yes, in Ubuntu just install openblas library and then you can install the precompiled whl with pip.
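Those two steps might look like the sketch below. The wheel filename is a placeholder (`X.Y` is not a real version); substitute the actual whl downloaded from the repo:

```shell
# Install the OpenBLAS dependency (Ubuntu package name).
sudo apt-get install -y libopenblas-dev
# Then install the pre-compiled wheel; placeholder filename below,
# replace it with the actual file you downloaded.
pip install dau_conv-X.Y-cp35-cp35m-manylinux1_x86_64.whl
```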
Before that, should we install all the dependencies? Also, when we run the unit test, what's the expected output to verify that the setup is done correctly?
Before installing DAUs with the whl file in pip you need to have at least TensorFlow installed, otherwise pip will automatically download and install it for you. Other dependencies are needed only when running the main program, and need to be installed manually. You can install them before or after installing DAUs with the whl file in pip; the order is irrelevant.
Unit tests will automatically let you know if there are any issues. It internally computes forward and backward passes for CPU and GPU implementation and compares them so that they are consistent. If there are any deviations, the unit test will return errors.
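Conceptually the check works like the toy sketch below: run the same computation through two code paths and verify they agree within a tolerance. The functions here are simple stand-ins, not the actual DAU implementations:

```python
import math
import random

def reference_impl(xs, w):   # stand-in for the CPU path
    return [x * w for x in xs]

def optimized_impl(xs, w):   # stand-in for the GPU path
    return [w * x for x in xs]

# Compare forward-pass outputs element-wise, like the unit tests do.
xs = [random.random() for _ in range(16)]
a, b = reference_impl(xs, 0.5), optimized_impl(xs, 0.5)
ok = all(math.isclose(u, v, abs_tol=1e-6) for u, v in zip(a, b))
print("consistent" if ok else "deviation detected")  # → consistent
```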
I got the above error when I tried to run the test using the whl. I have installed the other dependencies for Python 3.5 on Ubuntu 16.04 (had to downgrade setuptools to 50.0.0 and numpy to 1.18.0).
Here's my system info:
Please let me know what the possible error could be here. Also one quick question: is it possible to use a higher version like Python 3.7, for instance, with the compiled whls, or do we have to rebuild from scratch since the tested ones are on 3.5?
Appreciated your quick responses.
It seems the .so files installed from the whl are not appropriate for your system. I am not sure where the problem is, but you will need to compile it manually. Try following the install script from the Dockerfile.
Unfortunately, I only pre-built files for python 3.5, as this was the default python for Ubuntu 16.04 when I was developing this plugin. I will try updating my build script for 3.7 and newer versions, and will try building it for a newer TF version, but cannot promise that this will work.
I will re-check the build steps using the instructions outlined. May I check if docker is an option (using nvidia-docker 2) if I have to use this implementation on other versions of Ubuntu, Python and CUDA? The same method is also used in this repo: https://github.com/zhiqic/Rethinking-Counting, but that training framework is not using Python 3.5. Does that mean the .so files were re-compiled on their end, rather than reusing the same pre-compiled .so files built for Python 3.5 & CUDA 10.0?
Thank you for the response in advance.
You can use docker images that are already pre-built, but then you need to run everything inside docker, which will require you to use python3.5.
I am not sure how the authors of rethinking-counting used our code, but if they have a newer python version then they must have re-compiled all the .so files for that python version and cuda version (and TF version).
I am currently in the process of trying to upgrade the build scripts for python3.7 and TF1.15, but some things have changed in newer cuda and ubuntu versions so it does not work straight out of the box. Will let you know if I manage to build it.
@akshu281, I was able to build new whls for python3.7 and python3.8. You can find them at https://box.vicos.si/skokec/dau-convnet/. Binaries for python3.7 were built for TF 1.14 and TF 1.15.5, while for python3.8 they were built for TF 2.2, based on this compatibility matrix. I did not run the unit tests, so let me know if there are any issues.
I built those whl files using ubuntu 18.04, numpy 1.19.5, protobuf 3.2 and cmake 3.21. You need to use the new TF1.15.5 branch, which has patches for newer TensorFlow versions, if you build it manually. The build process was done using plugins/tensorflow/docker/Dockerfile.ubuntu18.04 from the TF1.15.5 branch.
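Based on those details, a manual build might start like the sketch below. The branch name and Dockerfile path are from this thread; the docker build flags and image tag are assumptions:

```shell
# Clone the branch that carries the newer-TensorFlow patches.
git clone -b TF1.15.5 https://github.com/skokec/DAU-ConvNet.git
cd DAU-ConvNet
# Build inside the provided docker environment (Dockerfile path from the thread;
# the -t tag name here is arbitrary).
docker build -t dau-whl-build -f plugins/tensorflow/docker/Dockerfile.ubuntu18.04 .
```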
I am not sure if DAUs will work in TensorFlow v2, since this version has been significantly changed from v1. I had to disable the dau_conv2d() calls since they are not supported in TF v2, so you will need to use the DAUConv2d class directly.
@skokec -- Thank you so much! I shall try with the settings you have mentioned. Really appreciate your quick turnaround; I will let you know if there's any issue regarding the same. One quick question: while using Python 3.7 & 3.8, the CUDA versions used have to be higher as well, right? May I confirm what version of CUDA & cuDNN you have used? Is it CUDA 10 and cuDNN 7 as mentioned in the Dockerfile:
nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
@akshu281, the DAU plugin was built for compatibility with TensorFlow installed from pip, therefore I used the CUDA, cuDNN and Python versions required by each TensorFlow build, as defined in this table of TensorFlow build versions. For TF >=1.13.0 and <=2.0.0 you need to use CUDA 10 and cuDNN 7 (nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04) as you specified, but different versions of TensorFlow will require different CUDA and cuDNN versions.
I now also prepared binaries for newer versions of TensorFlow (up to version 2.12) using Python 3.8, based on the code in the new TF2 branch. You can find the WHL files in the assets on the release page. Note that I only updated the code to make it build for TF v2, but did not thoroughly test how this plugin works under the new API in TensorFlow v2.
For TensorFlow >=2.1.0 you will then need to use different CUDA and cuDNN versions; use this table of TensorFlow build versions as a reference. You can also see which base docker image I used for building the WHL files in the build-whl.sh script at line 41.
Understood the settings for the different versions of TF and their corresponding CUDA versions. Let's say I just use the .so files from the pre-compiled whls generated for Py37 or Py38, instead of building from scratch. Do I still have to make sure my base system CUDA and everything matches, or is it okay to just install cudatoolkit 10.x/11.x inside the virtualenv/conda env and use the .so within that environment directly, instead of a docker? Kindly let me know your thoughts. Thank you for your timely responses @skokec!
It should be sufficient to just install cudatoolkit 10.x/11.x inside the virtualenv/conda env and TensorFlow should then use this toolkit. At least it works in some of my other projects in conda (but using pytorch, not tensorflow), so I would expect it to work for TensorFlow and DAUs, but I have not tested it.
It is only important that Python has the paths to the virtualenv/conda-installed cudatoolkit before the system-installed one, so that it finds that version first when loading the cudatoolkit .so files. When using virtualenv/conda it would be even better not to have CUDA installed on the system at all and have it installed ONLY in the virtualenv/conda env, to avoid any potential issues with finding incorrect CUDA .so files.
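One way to sketch that path ordering in a conda env (assuming `$CONDA_PREFIX` points at the active env; untested with TensorFlow, as noted above):

```shell
# Put the env-local CUDA libraries first so the dynamic loader finds
# them before any system-wide CUDA install.
export LD_LIBRARY_PATH="${CONDA_PREFIX}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "${LD_LIBRARY_PATH%%:*}"   # first search entry: the env's lib dir
```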
Thank you @skokec -- I shall try this way first, as mentioned, using a virtualenv or conda env without any base CUDA.
Hello author @skokec, I just got an opportunity to circle back and try this again, using the newly generated wheels.
As a sidenote, I managed to fix it: this one here is due to a missing instruction set (like AVX2) for TF on old versions, so I just had to find and enable the flags. However, if I use the old versions I face other blockers, as many items are deprecated and have reached EOL, and some new features are not supported by previous versions of Python for TF, so I had to discard this approach since the integration with the other app becomes a challenge.
So I tried to use one of the newly generated wheels from the release section, to be exact this one (dau_conv-1.0_TF2.5.0-cp38-cp38m-manylinux1_x86_64.whl), without installing TF first, as the dau_conv package does it for me on install, but I ran into this weird mismatch about the platform again. Any thoughts on these newly built ones?
And here are the other details:
OS: Ubuntu 18.04.6 CUDA: 11.2 Python (VirtualEnv): 3.8.0
Kindly advise on this so I can test the newly compiled wheels further. Thanks much as always!
Hey akshu281, sorry for the delay!
I think the issue lies in the filename itself. It seems that the cp38m naming isn't used anymore in newer Python versions. You can try renaming the file to dau_conv-1.0_TF2.5.0-cp38-cp38-manylinux1_x86_64.whl, and it should install without any issues.
I encountered the same error on my setup initially, but after renaming the file, I was able to install it successfully. I've already updated the filenames in the release links.
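The rename itself is a one-liner. For background, the trailing "m" was the pymalloc ABI flag, which Python 3.8 dropped, so pip expects a plain cp38 tag:

```shell
# Rename the wheel so its ABI tag matches what pip expects for Python 3.8.
mv dau_conv-1.0_TF2.5.0-cp38-cp38m-manylinux1_x86_64.whl \
   dau_conv-1.0_TF2.5.0-cp38-cp38-manylinux1_x86_64.whl
pip install dau_conv-1.0_TF2.5.0-cp38-cp38-manylinux1_x86_64.whl
```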
Hello @skokec - thank you so much, I will give this a try! I was experimenting earlier with the old versions using docker.