DJL 0.23 + TensorFlow cu113 cannot find CUDA capabilities
https://github.com/deepjavalibrary/djl/issues/2573
I have a similar problem. My environment is as follows:
OS: Linux
GPU: T4
CUDA: 113
ARCH: 75
DJL version: 0.23.0
ai.djl.util.Platform - Found placeholder platform from: cu113-linux-x86_64:2.10.1
Default Engine: TensorFlow:2.10.1, capabilities: [MKL,]
TensorFlow Library: /usr/local/app/.djl.ai/tensorflow/2.10.1-cu113-linux-x86_64/libjnitensorflow.so
engine.hasCapability(StandardCapabilities.CUDA) always returns false, while CudaUtils.getGpuCount() returns 1.
I checked the code that composes the download URL, and FLAVOR is already cu113:
Downloading https://publish.djl.ai/tensorflow-2.10.1/linux/cu113/THIRD_PARTY_TF_JNI_LICENSES.gz
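To make the URL composition concrete, here is a minimal sketch that rebuilds the download URL from its parts. It only assumes the pattern visible in the download line above (`https://publish.djl.ai/<engine>-<version>/<os>/<flavor>/<file>`); the class and method names are hypothetical and this is not DJL's actual code.

```java
public final class NativeUrlSketch {
    // Base URL taken from the download line in the log.
    static final String BASE = "https://publish.djl.ai/";

    // Hypothetical helper: joins the parts in the order seen in the log.
    static String compose(String engine, String version, String os, String flavor, String file) {
        return BASE + engine + "-" + version + "/" + os + "/" + flavor + "/" + file;
    }

    public static void main(String[] args) {
        // With flavor "cu113" the result matches the URL reported above.
        System.out.println(
                compose("tensorflow", "2.10.1", "linux", "cu113", "THIRD_PARTY_TF_JNI_LICENSES.gz"));
    }
}
```

If the flavor were resolved incorrectly, the flavor segment of the URL would differ from `cu113`, so a matching URL suggests the GPU build was selected and the problem lies elsewhere.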
Please help me look into this issue @frankfliu
@codeMan2018
You can test it in a Docker image:
git clone https://github.com/deepjavalibrary/djl.git
docker run -it --rm --network=host -v $PWD:/workspace --runtime=nvidia --shm-size=2gb nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 bash
In the docker container:
apt-get update
apt-get install openjdk-11-jdk-headless
cd /workspace/djl
./gradlew debugE -Dai.djl.default_engine=TensorFlow
Please post the output if you are not able to see the CUDA capability.
You should see something like:
DJL version: 0.24.0-SNAPSHOT
[DEBUG] - Using cache dir: /root/.djl.ai/tensorflow
[INFO ] - Downloading https://publish.djl.ai/tensorflow-2.10.1/linux/cu113/THIRD_PARTY_TF_JNI_LICENSES.gz ...
[INFO ] - Downloading https://publish.djl.ai/tensorflow-2.10.1/linux/cu113/LICENSE.gz ...
[INFO ] - Downloading https://publish.djl.ai/tensorflow-2.10.1/linux/cu113/libjnitensorflow.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/tensorflow-2.10.1/linux/cu113/libtensorflow_framework.so.2.gz ...
[INFO ] - Downloading https://publish.djl.ai/tensorflow-2.10.1/linux/cu113/libtensorflow_cc.so.2.gz ...
[DEBUG] - Loading TensorFlow library from: /root/.djl.ai/tensorflow/2.10.1-cu113-linux-x86_64/libjnitensorflow.so
2023-10-12 00:49:22.073291: I external/org_tensorflow/tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-12 00:49:22.128940: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-12 00:49:22.165453: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-12 00:49:22.166548: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-12 00:49:22.271788: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-12 00:49:22.273339: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-12 00:49:22.858677: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-12 00:49:22.860347: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-12 00:49:22.861829: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-12 00:49:22.863294: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13584 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5
[DEBUG] - Using cache dir: /root/.djl.ai/tensorflow
Default Engine: TensorFlow:2.10.1, capabilities: [
MKL,
CUDA,
]
TensorFlow Library: /root/.djl.ai/tensorflow/2.10.1-cu113-linux-x86_64/libjnitensorflow.so
Default Device: gpu(0)
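When comparing your own run against the output above, the part to check is the capabilities list. The following is a small sketch that scans captured output for it. It assumes the layout shown in this thread (entries between `capabilities: [` and `]`, comma separated, on one line or several); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class CapabilityScan {
    private static final String MARKER = "capabilities: [";

    // Returns the capability names listed after "capabilities: [" in the output.
    static List<String> capabilities(String output) {
        List<String> caps = new ArrayList<>();
        int start = output.indexOf(MARKER);
        if (start < 0) {
            return caps; // marker not found, nothing to report
        }
        start += MARKER.length();
        int end = output.indexOf(']', start);
        if (end < 0) {
            return caps; // list was never closed
        }
        for (String part : output.substring(start, end).split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                caps.add(name);
            }
        }
        return caps;
    }

    static boolean hasCuda(String output) {
        return capabilities(output).contains("CUDA");
    }

    public static void main(String[] args) {
        String failing = "Default Engine: TensorFlow:2.10.1, capabilities: [MKL,]";
        String working = "Default Engine: TensorFlow:2.10.1, capabilities: [\nMKL,\nCUDA,\n]";
        System.out.println(hasCuda(failing)); // false
        System.out.println(hasCuda(working)); // true
    }
}
```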