AutoDock-GPU

invalid device symbol

Jianzhong2020 opened this issue 3 years ago · 10 comments

Hello,

I just installed AutoDock-GPU on Ubuntu 20.04 (two RTX 3080 cards, CUDA 11.5) with the "make DEVICE=GPU NUMWI=128" command. "autodock_gpu_128wi" did appear in the bin directory. But when I ran "ADU --ffile input/1stp/derived/1stp_protein.maps.fld -lfile input/1stp/derived/1stp_ligand.pdbqt" (I set an alias for autodock_gpu_128wi), the following error kept popping up:

AutoDock-GPU version: v1.5-release

Running 1 docking calculation

Cuda device: NVIDIA GeForce RTX 3080 (#1 / 2)
Available memory on device: 9772 MB (total: 10014 MB)

CUDA Setup time 0.119027s

Running Job #1
Using heuristics: (capped) number of evaluations set to 1132076
Local-search chosen method is: ADADELTA (ad)
SetKernelsGpuData copy to cData failed invalid device symbol
autodock_gpu_128wi: ./cuda/kernels.cu:130: void SetKernelsGpuData(GpuData*): Assertion `0' failed.

I'm wondering if this is because I have two cards, and whether I should compile with extra flags? Any guidance would be appreciated.

Jianzhong2020 · Jan 11 '22 08:01

@Jianzhong2020 AD-GPU by default only compiles for compute capabilities 52, 60, 61, and 70. To compile only for an RTX 3080 (compute capability 86) you could compile with `make DEVICE=GPU NUMWI=128 TARGETS="86"`. Please make sure to add more compute capabilities if you are planning to run on other GPUs; for example, if you wanted a binary for both an RTX 3080 and a Quadro RTX 8000 (compute capability 75) you would use `TARGETS="75 86"`.

Wikipedia has a very good list of GPU compute capabilities and their CUDA versions here: https://en.wikipedia.org/wiki/CUDA#GPUs_supported

atillack · Jan 11 '22 18:01
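For readers hitting the same error, here is a minimal sketch of that rebuild. The `compute_cap` query is an assumption that a reasonably recent NVIDIA driver is installed (older drivers do not expose that field), and the clean target is assumed to exist in the Makefile; the TARGETS values are the ones discussed above.

$ # Check the compute capability of each installed GPU (recent drivers only)
$ nvidia-smi --query-gpu=name,compute_cap --format=csv
$ # Rebuild AutoDock-GPU for the architectures actually present,
$ # e.g. an RTX 3080 (86) plus a Quadro RTX 8000 (75):
$ make clean
$ make DEVICE=GPU NUMWI=128 TARGETS="75 86"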

@Jianzhong2020 One more thing - if you have more than one docking job I'd recommend using the --filelist feature and enabling multithreading by compiling with OVERLAP=ON. Then you could let AD-GPU run on both GPUs automatically using the command line options --filelist <your.lst> -D all ;-)

atillack · Jan 11 '22 19:01
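A sketch of that multi-job setup, assuming a binary built with OVERLAP=ON. The file-list layout shown (the grid .maps.fld on one line, then each ligand .pdbqt optionally followed by a result name) reflects one reading of the README, and the file names are purely illustrative; check the documentation of your AutoDock-GPU version for the exact format.

$ cat jobs.lst
input/1stp/derived/1stp_protein.maps.fld
input/1stp/derived/1stp_ligand.pdbqt
ligand_1
input/1stp/derived/1stp_ligand2.pdbqt
ligand_2
$ # Run all jobs, letting AD-GPU distribute them over every visible GPU:
$ autodock_gpu_128wi --filelist jobs.lst -D all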

Problem solved with TARGETS="86" and OVERLAP=ON. Many thanks @atillack

Jianzhong2020 · Jan 12 '22 00:01

I am also getting this in the CUDA image nvidia/cuda:10.1-devel-ubuntu18.04 with CUDA version 10.1, running with two Tesla V100 GPUs. Sorry, I only get the hanging issue from #186.

$ /opt/AutoDock-GPU/bin/autodock_gpu_64wi --ffile 1hsg.maps.fld --lfile indinavir.pdbqt
AutoDock-GPU version: v1.5.3-22-gf8a00853dd3fddd82d13866d3ba88c9137ebd5c0

Running 1 docking calculation

Cuda device:                              Tesla V100-SXM2-32GB (#1 / 2)
Available memory on device:               32162 MB (total: 32480 MB)

CUDA Setup time 0.191685s

BJWiley233 · Apr 20 '22 06:04

Here is my Dockerfile. I am running on LSF as well.

FROM nvidia/cuda:10.1-devel-ubuntu18.04

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y wget check libssl-dev git build-essential \
        devscripts debhelper fakeroot pkg-config dkms libsubunit0 libsubunit-dev cuda-toolkit-10-1 && \
    apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends hwloc openssh-client && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends software-properties-common && \
    apt-add-repository ppa:ubuntu-toolchain-r/test -y && \
    apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends libgomp1 && \
    rm -rf /var/lib/apt/lists/*
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/libnvidia-compute-418_418.87.01-0ubuntu1_amd64.deb
RUN apt-get update && dpkg -i libnvidia-compute-418_418.87.01-0ubuntu1_amd64.deb

ENV GPU_INCLUDE_PATH=/usr/local/cuda-10.1/include
ENV GPU_LIBRARY_PATH=/usr/local/cuda-10.1/lib64
ENV CPU_INCLUDE_PATH=/usr/local/cuda-10.1/include
ENV CPU_LIBRARY_PATH=/usr/local/cuda-10.1/lib64

RUN cd /opt && git clone https://github.com/ccsb-scripps/AutoDock-GPU.git && cd AutoDock-GPU && make DEVICE=GPU NUMWI=128

# clean up
RUN apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    apt-get autoclean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/{apt,dpkg,cache,log}/

CMD ["/bin/bash"]

BJWiley233 · Apr 20 '22 06:04
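For reference, a possible adjustment to the build step in that Dockerfile for V100s (compute capability 70) plus the multithreaded file-list mode discussed above, written as the shell command the RUN instruction would execute; whether it resolves the hang on this particular image is not verified here.

$ cd /opt && git clone https://github.com/ccsb-scripps/AutoDock-GPU.git && \
      cd AutoDock-GPU && make DEVICE=GPU NUMWI=128 TARGETS="70" OVERLAP=ON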

So I tried making with make DEVICE=GPU NUMWI=128 TARGETS=70 for the V100s but it still hangs. Is there by any chance a docker image already created that I could test?

BJWiley233 · Apr 20 '22 06:04

Well, I tried with nvcr.io/hpc/autodock:2020.06 and nvcr.io/hpc/autodock:2020.06-x86_64, even with 296 GB RAM, and get a seg fault:

$ /opt/AutoDock-GPU/bin/autodock_gpu_128wi -ffile 1hsg.maps.fld -lfile indinavir.pdbqt
AutoDock-GPU version: 09773678fc7e39677061d765b767f4bae8930fb7-dirty

CUDA Setup time 0.261890s
(Thread 0 is setting up Job 0)
Segmentation fault (core dumped)

Go NVIDIA :(

BJWiley233 · Apr 20 '22 06:04

@BJWiley233 Thank you for reporting.

> Sorry, I only get the hanging issue in https://github.com/ccsb-scripps/AutoDock-GPU/issues/186

Is what you are observing that the code hangs indefinitely, or does it eventually terminate (with or without an error message)?

#186 shows this error output (and the program exit it subsequently triggers), which occurs when the correct target isn't set: `SetKernelsGpuData copy to cData failed invalid device symbol`

70 is one of the default targets so this doesn't apply to what's happening on your system (confirmed by your next post...).

atillack · Apr 20 '22 15:04
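For anyone trying to rule out a target mismatch like the one described above, one way to inspect which GPU architectures a binary actually embeds is the CUDA toolkit's cuobjdump tool; a sketch, assuming the toolkit is on the PATH and using the binary path from this thread:

$ # List embedded device code (cubins) and their sm_XX architectures
$ cuobjdump --list-elf /opt/AutoDock-GPU/bin/autodock_gpu_64wi
$ # Also check for embedded PTX, which can be JIT-compiled for newer GPUs
$ cuobjdump --list-ptx /opt/AutoDock-GPU/bin/autodock_gpu_64wi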

Yes, on my Docker image the code hangs indefinitely.

BJWiley233 · Apr 20 '22 17:04

I just checked again with Nvidia's image nvcr.io/hpc/autodock:2020.06-x86_64 and realized my map files got corrupted while transferring from my personal computer to LSF storage, so that image actually works. My own image still doesn't seem to work and hangs.

BJWiley233 · Apr 21 '22 03:04
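Since the root cause here turned out to be map files damaged in transfer, a quick sanity check is to checksum the inputs on both machines and compare. The file names below are the ones used earlier in this thread, and the `*.map` glob assumes the usual AutoGrid naming scheme:

$ # Run on the source machine and again on the LSF storage side, then compare:
$ sha256sum 1hsg.maps.fld 1hsg.*.map indinavir.pdbqt
$ # Mismatching hashes indicate the files were corrupted during transfer.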