lite.ai.toolkit GPU inference: how can it support CUDA 10.2? There is no prebuilt onnxruntime library for that version, so I compiled it myself, but it still does not work.

Nov 07 '21 14:11 xinsuinizhuan

https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1
Take a look at the official onnxruntime 1.8.1 release; from 1.8.1 on it supports CUDA 10.2 and CUDA 11.

Nov 07 '21 15:11 DefTruth

I tried that. With the same code and the library switched to 1.8.1, inference is very slow; it feels like it is running on the CPU, not the GPU.

Nov 08 '21 02:11 xinsuinizhuan

So I can't tell whether the problem is in how our code configures it or in the library itself.

Nov 08 '21 02:11 xinsuinizhuan

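Hmm, I'm not sure, since my code is nowhere near as thorough as onnxruntime itself. Here are some suggestions:
The device ID for the CUDAExecutionProvider may not be 0 on your machine, but I hard-coded it to 0 in lite/ort/core/ort_handler.cpp: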

  // GPU compatible.
  // OrtCUDAProviderOptions provider_options;
  // session_options.AppendExecutionProvider_CUDA(provider_options);
#ifdef USE_CUDA
  OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0); // C API stable.
#endif
  // 1. session
  ort_session = new Ort::Session(ort_env, onnx_path, session_options);
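
Since the device index is hard-coded to 0 above, one possible way to make it configurable is to read it from an environment variable before appending the CUDA provider. The sketch below is only an illustration: the variable name `LITE_ORT_CUDA_DEVICE_ID` and the helpers `parse_device_id` and `cuda_device_id_from_env` are hypothetical and not part of lite.ai.toolkit or onnxruntime.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: turn a text value (e.g. from an environment variable)
// into a non-negative CUDA device index, falling back when it is unusable.
inline int parse_device_id(const char* text, int fallback) {
  if (text == nullptr || *text == '\0') return fallback;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  // Reject trailing garbage, overflow, and negative indices.
  if (*end != '\0' || errno == ERANGE || value < 0 || value > INT_MAX) return fallback;
  return static_cast<int>(value);
}

// Hypothetical variable name; unset or invalid values keep the current default of 0.
inline int cuda_device_id_from_env() {
  return parse_device_id(std::getenv("LITE_ORT_CUDA_DEVICE_ID"), 0);
}
```

The returned value could then replace the literal 0 passed to OrtSessionOptionsAppendExecutionProvider_CUDA, which makes it easy to try device 1 or 2 on a multi-GPU machine without rebuilding.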

Also, the USE_CUDA macro has to be defined manually for the code above to be enabled. In lite/ort/core/ort_config.h:

#ifndef LITE_AI_ORT_CORE_ORT_CONFIG_H
#define LITE_AI_ORT_CORE_ORT_CONFIG_H

#include "ort_defs.h"
#include "lite/lite.ai.headers.h"

#ifdef ENABLE_ONNXRUNTIME
#include "onnxruntime/core/session/onnxruntime_cxx_api.h"
/* Users who want to enable onnxruntime and lite.ai.toolkit with CUDA
 * support need to define the USE_CUDA macro manually. It seems that
 * the latest onnxruntime no longer pre-defines the USE_CUDA macro
 * and leaves the decision to users, who actually know the environment
 * of the device the code runs on. */
// #define USE_CUDA
#  ifdef USE_CUDA
#include "onnxruntime/core/providers/cuda/cuda_provider_factory.h"
#  endif
#endif

namespace core {}

#endif //LITE_AI_ORT_CORE_ORT_CONFIG_H
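
Because the CUDA branch in ort_handler.cpp sits behind `#ifdef USE_CUDA`, a build without that macro quietly stays on the CPU, which would match the symptom described in this thread. A small compile-time probe like the following might help confirm which branch was actually built; `kCudaBranchCompiled` and `ort_build_mode` are names made up for this sketch.

```cpp
#include <string>

// True only when this translation unit was compiled with USE_CUDA defined
// (for example via -DUSE_CUDA or by uncommenting the #define in ort_config.h).
#ifdef USE_CUDA
constexpr bool kCudaBranchCompiled = true;
#else
constexpr bool kCudaBranchCompiled = false;
#endif

// Human-readable summary that can be printed once at start-up.
inline std::string ort_build_mode() {
  return kCudaBranchCompiled ? "USE_CUDA defined: CUDA provider code is compiled in"
                             : "USE_CUDA not defined: CPU-only code path";
}
```

Printing `ort_build_mode()` before creating the session separates "the macro was never defined" from "the provider was appended but failed", which are otherwise indistinguishable from timing alone.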

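You can also follow the official onnxruntime sample, fns_candy_style_transfer.c, and check whether the status returned when appending the CUDA execution provider (OrtSessionOptionsAppendExecutionProvider_CUDA) is OK: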

int enable_cuda(OrtSessionOptions* session_options) {
  // OrtCUDAProviderOptions is a C struct. C programming language doesn't have constructors/destructors.
  OrtCUDAProviderOptions o;
  // Here we use memset to initialize every field of the above data struct to zero.
  memset(&o, 0, sizeof(o));
  // But is zero a valid value for every variable? Not quite. It is not guaranteed. In other words: does every enum
  // type contain zero? The following line can be omitted because EXHAUSTIVE is mapped to zero in onnxruntime_c_api.h.
  o.cudnn_conv_algo_search = EXHAUSTIVE;
  o.gpu_mem_limit = SIZE_MAX;
  OrtStatus* onnx_status = g_ort->SessionOptionsAppendExecutionProvider_CUDA(session_options, &o);
  if (onnx_status != NULL) {
    const char* msg = g_ort->GetErrorMessage(onnx_status);
    fprintf(stderr, "%s\n", msg);
    g_ort->ReleaseStatus(onnx_status);
    return -1;
  }
  return 0;
}

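Hope this helps. One more thing: I'm not sure whether the 1.7 headers I use are compatible with 1.8.1, so you could also try switching to the 1.8.1 headers; you will need to adjust the include paths yourself. The official pre-built packages ship only a few headers, whereas I built from source, which includes all of them, and my header directory layout differs from the official one.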

Nov 08 '21 13:11 DefTruth

How fast is the onnxruntime you built yourself? Can it reach 30 ms for yolov5?

Nov 11 '21 06:11 xinsuinizhuan

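For now I only work on a Mac and do not use a GPU. yolov5s is reasonably fast there, but I have not measured it precisely. Once the MNN/TNN/NCNN versions are integrated and polished, I will add tests on different platforms along with performance benchmarks.
As for how to test lite_yolov5: it uses yolov5s.onnx, a small model that already runs quite fast on the CPU. If you switch to the GPU, running my demo alone is not enough, because it performs only a single inference. The first inference on a GPU is usually slow, and later ones are faster. Following the example in test_lite_yolov5.cpp, run a loop after creating the lite::cv::detection::YoloV5 object with new, and ignore the first inference when timing.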
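
The measurement described above (run in a loop, ignore the first inference) can be sketched with a small timing helper. This is a minimal sketch, not code from lite.ai.toolkit: `average_latency_ms` is a hypothetical name, and the warm-up and iteration counts are left to the caller.

```cpp
#include <cassert>
#include <chrono>

// Runs `run_once` `warmup` times untimed, then `iters` times timed, and
// returns the mean wall-clock latency of the timed runs in milliseconds.
template <typename F>
double average_latency_ms(F&& run_once, int warmup, int iters) {
  assert(iters > 0);
  for (int i = 0; i < warmup; ++i) run_once();  // discard the slow first runs
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) run_once();
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return elapsed.count() / iters;
}
```

Assuming the `detect(image, boxes)` call used in test_lite_yolov5.cpp, the callable would wrap one such call on a preloaded image, for example `average_latency_ms([&] { yolov5->detect(img, boxes); }, 3, 50)`. Comparing that number between a CPU build and a GPU build gives a fairer picture than timing a single run.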

Nov 12 '21 01:11 DefTruth

Right. The strange thing is that the GPU build of onnxruntime I compiled myself also runs inference at CPU speed. I downloaded the onnxruntime source from GitHub and built it with the command below, and inference is still at CPU speed:
.\build.bat --build_shared_lib --config Release --use_cuda --cuda_version=11.0 --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --use_tensorrt --tensorrt_home "D:\vison_software\NVIDIA\cuda11.0\TensorRT-8.2.0.6"

Nov 12 '21 09:11 xinsuinizhuan

Then I'm not sure. You could try logging the status returned when the GPU provider is set up, to see whether the GPU was actually configured:

int enable_cuda(OrtSessionOptions* session_options) {
  // OrtCUDAProviderOptions is a C struct. C programming language doesn't have constructors/destructors.
  OrtCUDAProviderOptions o;
  // Here we use memset to initialize every field of the above data struct to zero.
  memset(&o, 0, sizeof(o));
  // But is zero a valid value for every variable? Not quite. It is not guaranteed. In other words: does every enum
  // type contain zero? The following line can be omitted because EXHAUSTIVE is mapped to zero in onnxruntime_c_api.h.
  o.cudnn_conv_algo_search = EXHAUSTIVE;
  o.gpu_mem_limit = SIZE_MAX;
  OrtStatus* onnx_status = g_ort->SessionOptionsAppendExecutionProvider_CUDA(session_options, &o);
  if (onnx_status != NULL) {
    const char* msg = g_ort->GetErrorMessage(onnx_status);
    fprintf(stderr, "%s\n", msg);
    g_ort->ReleaseStatus(onnx_status);
    return -1;
  }
  return 0;
}
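
Besides logging the status, it may also be worth checking whether the onnxruntime library that actually gets loaded was built with the CUDA provider at all. The helper below is a small sketch for that check; `has_provider` is a hypothetical name, and it only inspects a list of provider names supplied by the caller.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true when `wanted` appears in the list of execution provider names.
inline bool has_provider(const std::vector<std::string>& providers,
                         const std::string& wanted) {
  return std::find(providers.begin(), providers.end(), wanted) != providers.end();
}
```

With the onnxruntime C++ header included, the list would come from `Ort::GetAvailableProviders()`, which as far as I know returns the names as a `std::vector<std::string>`. If `has_provider(Ort::GetAvailableProviders(), "CUDAExecutionProvider")` is false, the loaded library has no CUDA support, and no session option will move inference to the GPU.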

Nov 12 '21 14:11 DefTruth

Hi, did you get your GPU environment working? T ^ T I just set mine up and the build keeps failing, even though all the dependencies are included.

Mar 25 '22 14:03 hanxizai

Enable the USE_CUDA compile option, and also add the following declaration at the end of onnxruntime/core/session/onnxruntime_c_api.h: ORT_API_STATUS(OrtSessionOptionsAppendExecutionProvider_CUDA, _In_ OrtSessionOptions* options, int device_id);

May 11 '23 03:05 moment0517
