TNN
Ubuntu, x86, C++: a model in tnnproto format crashes at the softmax layer during inference, but running the ONNX model with Python works fine.
1. Environment
- Build OS and Version: Ubuntu
- RunTime OS Version: Linux
- RunTime DEVICE: x86
2. GitHub version
- branch:v3.0
- commit(optional):
3. Compile method: built with the build_linux_native.sh script, in debug mode
4. Build log
(base) zwz@z-pc:~/Downloads/TNN-0.3.0/scripts$ ./build_linux_native.sh
mkdir: cannot create directory 'build_linux_native': File exists
/home/zwz/Downloads/TNN-0.3.0
/home/zwz/Downloads/TNN-0.3.0
-- >>>>>>>>>>>>>
-- TNN BUILD INFO:
-- System: Linux
-- Processor:
-- Cpu: ON
-- X86: ON
-- Arm: OFF
-- Arm82: OFF
-- Metal: OFF
-- OpenCL: OFF
-- CUDA: OFF
-- DSP: OFF
-- Atlas: OFF
-- TensorRT: OFF
-- HuaweiNPU: OFF
-- RKNPU: OFF
-- OpenVINO: OFF
-- OpenMP: ON
-- TEST: ON
-- --Unit Test: OFF
-- Qantization: OFF
-- ModelCheck: OFF
-- DEBUG:
-- PROFILE: OFF
-- BENCHMARK: ON
-- BENCHMARK Layer: OFF
-- Model Converter: OFF
-- ONNX2TNN Converter: OFF
-- TNN2MEM: OFF
-- BENCHMARK Test Lib: OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/zwz/Downloads/TNN-0.3.0/scripts/build_linux_native
5. Describe the bug: the model in tnnproto format crashes at the softmax layer during inference, but running the ONNX model fed to convert2tnn with Python works fine.
6. Runtime log
/home/zwz/Desktop/zwz-toolbox/19_tnn/cmake-build-debug/19_tnn
image_0
descriptor:0
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
7. Screenshots
8. C++ code
#include <iostream>
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "tnn/core/macro.h"
#include "tnn/core/tnn.h"
#include "tnn/utils/blob_converter.h"
#include "tnn/utils/mat_utils.h"
#include "tnn/utils/dims_vector_utils.h"
#include "utils.h"
using namespace TNN_NS;
using namespace std;
int main() {
cv::Mat image=cv::imread("/home/zwz/place_recognition/data/lenovo_data_simple/query/09/599160053306.png", 1);
//cv::cvtColor(image, image, cv::COLOR_BGR2GRAY);
std::string protoContent, modelContent;
protoContent = fdLoadFile("/home/zwz/Downloads/TNN-0.3.0/tools/convert2tnn/freeze_modify.opt.tnnproto"); // see yolov5
modelContent = fdLoadFile("/home/zwz/Downloads/TNN-0.3.0/tools/convert2tnn/freeze_modify.opt.tnnmodel"); // see yolov5
TNN_NS::Status status;
TNN_NS::ModelConfig model_config;
model_config.model_type = TNN_NS::MODEL_TYPE_TNN;
model_config.params = {protoContent, modelContent};
auto net = std::make_shared<TNN_NS::TNN>();
status = net->Init(model_config);
TNN_NS::NetworkConfig network_config;
network_config.device_type = TNN_NS::DEVICE_X86;
TNN_NS::Status error;
auto net_instance = net->CreateInst(network_config, error);
if (status != TNN_NS::TNN_OK || !error) {
cout << "initialization failed" << endl;
return -1;
}
void *command_queue = nullptr;
auto status_q = net_instance->GetCommandQueue(&command_queue);
if (status_q != TNN_NS::TNN_OK) {
cout << "GetCommandQueue error: " << status_q.description() << endl;
}
// original image size
int image_h = image.rows;
int image_w = image.cols;
float* input_image_{nullptr};
input_image_ = new float[image_h*image_w*3];
// OpenCV images are interleaved HWC; TNN expects NCHW, so repack to planar CHW
{
int i, j;
uchar *uc_pixel;
uc_pixel = image.data;
for (i = 0; i < image.rows; i++, uc_pixel+=image.step)
{
for (j = 0; j < image.cols; j++)
{
for(int c = 0; c < 3; c++){
// planar CHW layout: channel plane first, then row, then column
// (the original wrote i*image_w*3 + j*3 + c, which keeps HWC order and
// contradicts the NCHW_FLOAT mat created below)
input_image_[c*image_h*image_w + i*image_w + j] = (uc_pixel[j*3+c]-128)/128.0f;
}
}
}
}
TNN_NS::DeviceType dt = TNN_NS::DEVICE_X86; // the input data always lives on the CPU; no need for OPENCL here, TNN copies cpu->gpu automatically
TNN_NS::DimsVector image_dims = {1, 3, image_h, image_w};
auto input_mat = std::make_shared<TNN_NS::Mat>(dt, TNN_NS::NCHW_FLOAT, image_dims, input_image_); // imageSource(RGBA) or dst.data(BGR)
TNN_NS::MatConvertParam input_cvt_param;
auto net_instance_status = net_instance->SetInputMat(input_mat,input_cvt_param, "image_0");
std::vector<std::string> input_names;
if (net_instance) {
BlobMap blob_map;
net_instance->GetAllInputBlobs(blob_map);
for (const auto& item : blob_map) {
input_names.push_back(item.first);
}
}
for(int i=0;i<input_names.size();i++){
cout<<input_names[i]<<endl;
}
std::vector<std::string> names;
if (net_instance) {
BlobMap blob_map;
net_instance->GetAllOutputBlobs(blob_map);
for (const auto& item : blob_map) {
names.push_back(item.first);
}
}
for(int i=0;i<names.size();i++){
cout<<names[i]<<endl;
}
net_instance->Forward();
TNN_NS::MatConvertParam clsPparam;
// NOTE: {1, 3, image_h, image_w} is an assumption; query the actual output
// blob dims rather than hardcoding the shape.
TNN_NS::DimsVector out_put_dims = {1, 3, image_h, image_w};
// size the buffer from the dims; the original fixed float out_ptr[4096]
// overflows as soon as the output exceeds 4096 elements
std::vector<float> out_buf(TNN_NS::DimsVectorUtils::Count(out_put_dims));
std::shared_ptr<TNN_NS::Mat> cls_pred = std::make_shared<TNN_NS::Mat>(dt, TNN_NS::NCHW_FLOAT, out_put_dims, out_buf.data());
status = net_instance->GetOutputMat(cls_pred, clsPparam, names[0].c_str());
delete[] input_image_;
return 0;
}
Model files: https://pan.baidu.com/s/16BYjwo1RDmT0pZIdbeWnaw (extraction code: x9ob)
In DEBUG mode there is indeed a problem at the softmax layer; release mode should be fine. I haven't had time to analyze the root cause yet.
I rebuilt in release mode and found it still crashes. Following #824, I changed all the Reshape parameters in the opt.onnx model to nchw, and it now runs normally; however, the output is identical no matter what the input is.
#1682 should have fixed this issue in that PR; if possible, please try again with it.