Can't Deploy Torchserve ONNX with GPU
🐛 Describe the bug
Disclaimer
- Running TorchServe + ONNX + CPU works fine.
- I am aware of the open issue below describing a similar situation, but the problem here goes further.
- https://github.com/pytorch/serve/issues/2425
Problem
- Can't Deploy Torchserve ONNX with GPU
Error logs
- Images built from any CUDA `runtime` or `base` image have `python3 -c "import torch; print(torch.cuda.is_available())"` returning False. For example:
  - `./build_image.sh -bi nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04 -t torchserve_cu116`
  - `./build_image.sh -bi nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04 -t torchserve_cu118`
  - Sidenote: specifying CUDA versions, e.g. `./build_image.sh --cv 113`, often results in an image-not-found error, probably because the bash script has not been updated.
- `torchserve:0.8.*-gpu` images result in a `Failed to create CUDAExecutionProvider` error, while `python3 -c "import torch; print(torch.cuda.is_available())"` returns True:
  [W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
- `torchserve:0.7.1-gpu`: the only functioning image.
- `torchserve:0.7.0-gpu` or below: assumes the model is a PyTorch module and returns `AttributeError: 'InferenceSession' object has no attribute 'eval'`; GPU utilization, however, is fine.
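The `Failed to create CUDAExecutionProvider` warning can be reproduced in isolation inside the container with a quick check (a minimal diagnostic sketch; it assumes `onnxruntime` is installed and falls back gracefully if not):

```python
# Quick diagnostic: list the execution providers onnxruntime can actually use.
# If CUDAExecutionProvider is missing here, InferenceSession silently falls
# back to CPU and emits the warning shown above.
try:
    import onnxruntime as ort
    providers = ort.get_available_providers()
except ImportError:  # onnxruntime not installed in this environment
    providers = []

print("Available providers:", providers)
print("CUDA usable:", "CUDAExecutionProvider" in providers)
```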
Installation instructions
Tried both downloading the official TorchServe images and building images from source based on NVIDIA's base images.
Model Packaging
ONNX handler: https://gist.github.com/andy971022/19ed36022470f099c08ff28c20422244
Dockerfile: https://gist.github.com/andy971022/d11bf90fa4d3e0da37e8ee6ff9538acc
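For context, a custom handler is needed at all because `onnxruntime.InferenceSession` is not a `torch.nn.Module`, which is exactly why the older images fail with `'InferenceSession' object has no attribute 'eval'`. A minimal illustrative sketch of the pattern (class and attribute names here are my own, not necessarily what the gist uses):

```python
# Illustrative sketch, NOT the gist's exact code: a TorchServe-style custom
# handler that wraps an ONNX model instead of a torch.nn.Module.
class OnnxHandlerSketch:
    def initialize(self, context):
        # Deferred import so this sketch can be loaded without onnxruntime
        import onnxruntime as ort
        model_path = context.manifest["model"]["serializedFile"]
        # Ask for the GPU provider first, falling back to CPU if unavailable
        self.session = ort.InferenceSession(
            model_path,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )

    def handle(self, data, context):
        # Feed the batch to the first model input and return all outputs
        inputs = {self.session.get_inputs()[0].name: data}
        return self.session.run(None, inputs)
```

The default handler path instead calls `model.eval()`, which only exists on PyTorch modules, hence the AttributeError on `torchserve:0.7.0-gpu` and below.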
config.properties
inference_address=http://0.0.0.0:7080
management_address=http://0.0.0.0:7081
metrics_address=http://localhost:7082
service_envelope=json
model_store=model-store
Versions
From the notebook environment
------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch:
torchserve==0.8.1
torch-model-archiver==0.8.1
Python version: 3.7 (64-bit runtime)
Python executable: /opt/conda/bin/python
Versions of relevant python libraries:
captum==0.6.0
numpy==1.21.6
nvgpu==0.10.0
open-clip-torch==2.20.0
pillow-avif-plugin==1.3.1
psutil==5.9.3
requests==2.31.0
requests-oauthlib==1.3.1
sentencepiece==0.1.99
torch==1.13.1
torch-model-archiver==0.8.1
torch-workflow-archiver==0.2.9
torchserve==0.8.1
torchvision==0.14.1
transformers==4.30.0
types-requests==2.30.0.0
wheel==0.40.0
torch==1.13.1
**Warning: torchtext not present ..
torchvision==0.14.1
**Warning: torchaudio not present ..
Java Version:
OS: Debian GNU/Linux 10 (buster)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: N/A
CMake version: version 3.13.4
Is CUDA available: Yes
CUDA runtime version: 11.3.109
GPU models and configuration:
GPU 0: Tesla T4
GPU 1: Tesla T4
Nvidia driver version: 510.47.03
cuDNN version: None
Repro instructions
- Have the ONNX handler and any ONNX model, e.g. `visual.onnx`, in `dir-containing-onnx-assets/`
- Download the Dockerfile from the gist
- `docker build -t ts-test .`
- `docker run --gpus all -p 7080:7080 ts-test`
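Once the container from the steps above is running, the inference endpoint can be exercised like this (a sketch; the model name `visual` and the payload shape are assumptions, while the port and the `instances` envelope follow `inference_address` and `service_envelope=json` from the config.properties above):

```python
import json
import urllib.request

def make_body(payload):
    # service_envelope=json means requests are wrapped as {"instances": [...]}
    return json.dumps({"instances": [payload]}).encode()

def predict(payload, host="http://localhost:7080", model="visual"):
    # POST to the inference port configured in config.properties
    req = urllib.request.Request(
        f"{host}/predictions/{model}",
        data=make_body(payload),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the JSON envelope enabled, a successful call to `predict(...)` returns the response wrapped in a `predictions` field.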
Possible Solution
No response
Yeah, we are prioritizing a larger dev image that would have all these dependencies @agunapal
Hey, the image still does not have cuDNN support @msaroufim