Incorrect results when running inference with Vulkan
error log | 日志或报错信息 | ログ
context | 编译/运行环境 | バックグラウンド
Windows 11
how to reproduce | 复现步骤 | 再現方法
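CPU inference gives correct results, but GPU inference returns wrong results. I dumped every blob at run time and found that the outputs already differ after the first convolution layer.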
more | 其他 | その他
The main code is as follows:
bool useGpu = true;
bool useDebugParam = true;
SetConsoleOutputCP(CP_UTF8);
LOG_I("Starting face detection test...");
// File path configuration
std::string param_path;
if (useDebugParam) {
    param_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn_debug.param)";
} else {
    param_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn.param)";
}
std::string bin_path = R"(D:\tmp\ncnn_pytorch\face_detector.ncnn.bin)";
std::string original_img_path = R"(D:\tmp\image\o\face_image_1080_1920.png)";
std::string padded_image_save_path = R"(D:\tmp\image\face_detector_ncnn_padded.png)"; // change to whatever path you need
std::string output_img_path = R"(D:\tmp\image\face_detector_ncnn.png)";
std::string original_with_detection_output_img_path = R"(D:\tmp\image\face_detector_ncnn_with_original.png)";
// Load the image
cv::Mat originalImg = cv::imread(original_img_path, cv::IMREAD_UNCHANGED);
if (originalImg.empty()) {
    LOG_E("Image not found: %s", original_img_path.c_str());
    return -1;
}
// Convert channels: a 4-channel image is converted BGRA -> RGB, otherwise BGR -> RGB
if (originalImg.channels() == 4) {
    LOG_D("COLOR_BGRA2RGB");
    cv::cvtColor(originalImg, originalImg, cv::COLOR_BGRA2RGB);
} else {
    LOG_D("COLOR_BGR2RGB");
    cv::cvtColor(originalImg, originalImg, cv::COLOR_BGR2RGB);
}
// 1. Letterbox the image to get a padded 128x128 image in RGB format
PaddingParams padding_params{};
cv::Mat padded = letterbox_padding(originalImg, cv::Size(128, 128), padding_params);
ncnn::Mat mat_in;
cv::Mat padded_float;
if (useDebugParam) {
    mat_in = ncnn::Mat::from_pixels(padded.data, ncnn::Mat::PIXEL_RGB, padded.cols, padded.rows);
    const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
    mat_in.substract_mean_normalize(0, norm_vals);
    mat_in.dims = 4;
} else {
    padded.convertTo(padded_float, CV_32FC3, 1.0 / 255.0);
    mat_in = ncnn::Mat(3, 128, 128, 1, padded_float.data);
}
print_ncnn_mat_shape(mat_in, "mat_in");
ncnn::Net net;
if (useGpu) {
    int gpu_count = ncnn::get_gpu_count();
    LOG_D("gpu_count:%d", gpu_count);
    if (gpu_count <= 0) {
        LOG_E("gpu_count<=0");
        return -1;
    }
    LOG_D("use_vulkan_compute");
    net.opt.use_vulkan_compute = true;
    // set specified vulkan device before loading param and model
    // net.set_vulkan_device(0); // use device-0
    net.opt.use_fp16_packed = false;
    net.opt.use_fp16_storage = false;
    net.opt.use_fp16_arithmetic = false;
    net.opt.use_int8_storage = false;
    net.opt.use_int8_arithmetic = false;
}
LOG_I("load_param: %s", param_path.c_str());
if (net.load_param(param_path.c_str()) != 0) {
    LOG_E("Failed to load param file");
    return -1;
}
LOG_I("load_model: %s", bin_path.c_str());
if (net.load_model(bin_path.c_str()) != 0) {
    LOG_E("Failed to load bin file");
    return -1;
}
ncnn::Extractor ex = net.create_extractor();
// Feed the input blob named "in0"
LOG_D("ex.input");
ex.input("in0", mat_in);
// Run inference and extract the outputs "out0" and "out1"
LOG_D("ex.extract");
ncnn::Mat regressors, scores;
ex.extract("out0", regressors);
ex.extract("out1", scores);
print_ncnn_mat_shape(regressors, "regressors");
print_ncnn_mat_shape(scores, "scores");
int num_regressors = regressors.w * regressors.h * regressors.c; // 896*16
int num_scores = scores.w * scores.h * scores.c; // 896
std::vector<float> reg_vec((float *) regressors.data, (float *) regressors.data + num_regressors);
std::vector<float> score_vec((float *) scores.data, (float *) scores.data + num_scores);
// Clip score_vec to [-100, 100], then apply sigmoid
for (auto &s: score_vec) {
    if (s < -100.0f) s = -100.0f;
    if (s > 100.0f) s = 100.0f;
    s = 1.0f / (1.0f + std::exp(-s));
}
// Find the index of the highest score
int max_index = std::distance(score_vec.begin(), std::max_element(score_vec.begin(), score_vec.end()));
float max_score = score_vec[max_index];
LOG_I("Max score: %.4f, index: %d", max_score, max_index);
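Use the useGpu flag to switch between CPU and GPU inference:
bool useGpu = true;
Use the useDebugParam flag to choose whether to load the manually adjusted param file:
bool useDebugParam = true;
The model is an ncnn model converted from ONNX with pnnx. The input section of the param file that pnnx produced is: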
Input in0 0 1 in0
Permute permute_56 1 1 in0 1 0=4
With a small manual edit, the model accepts a tensor with the conventional shape:
Input in0 0 1 in0
Permute permute_56 1 1 in0 1 0=6
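The only difference is the Permute order type parameter (0=4 changed to 0=6).
The current behavior is:
When useGpu = false, the output is correct whether useDebugParam is true or false.
When useGpu = true, output is produced whether useDebugParam is true or false, but the values are wrong.
The complete project is in the attachment ncnn-test.zip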
Some of the dumped blobs are attached as blob.zip. The first two blobs are identical between CPU and GPU; the differences start at the third blob.
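A comparison like this can be automated once each blob has been copied into a flat float array. The following is a minimal sketch, not the tool used in this report: first_mismatch is a hypothetical helper and the tolerance of 1e-3 is an assumed value, since small CPU/GPU rounding differences are expected even when the result is correct.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ncnn): returns the index of the first
// element where the two dumps differ by more than `tol`, or -1 if they
// agree everywhere.
long first_mismatch(const std::vector<float>& cpu,
                    const std::vector<float>& gpu,
                    float tol = 1e-3f)
{
    const std::size_t n = cpu.size() < gpu.size() ? cpu.size() : gpu.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Written as !(diff <= tol) so that a NaN counts as a mismatch.
        if (!(std::fabs(cpu[i] - gpu[i]) <= tol)) return static_cast<long>(i);
    }
    // A length difference counts as a mismatch at the first missing element.
    if (cpu.size() != gpu.size()) return static_cast<long>(n);
    return -1;
}
```

Running it on each blob in network order points to the first layer whose output diverges, and the returned index shows how far into that blob the values still agree.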
https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result#disable-fp16 Try disabling fp16 and test again.
I have already tried both enabling and disabling the following options:
if (gpu_count > 0) {
    LOG_D("use_vulkan_compute");
    net.opt.use_vulkan_compute = true;
    // set specified vulkan device before loading param and model
    net.set_vulkan_device(0); // use device-0
    net.opt.use_fp16_packed = false;
    net.opt.use_fp16_storage = false;
    net.opt.use_fp16_arithmetic = false;
    net.opt.use_int8_storage = false;
    net.opt.use_int8_arithmetic = false;
}
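The result is the same. I found that the first part of blob "3" (a little under half of the values) matches, and the differences start around the middle. I compared the dumps with a diff tool:
On the right you can see that the first part is identical:
Blob "3" is the output of this operator in the graph:
I ported the program from Windows to Android, and the behavior matches the Windows results:
CPU inference gives correct results.
GPU inference returns results, but the data is wrong.
The log from GPU inference is as follows: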
00:09:41.055 D COLOR_BGRA2RGB
00:09:41.072 D mat_in shape: c=3, d=1, h=128, w=128, dims=4
00:09:41.073 I QUALCOMM build : fdd61e0, I20154638fb
    Build Date : 10/07/20
    Shader Compiler Version : EV031.27.05.01
    Local Branch :
    Remote Branch : refs/tags/AU_LINUX_ANDROID_LA.UM.8.3.R1.10.00.00.520.058
    Remote Branch : NONE
    Reconstruct Branch : NOTHING
00:09:41.073 I Build Config : S P 8.0.11 AArch64
00:09:41.074 W [0 Adreno (TM) 630] queueC=0[3] queueG=0[3] queueT=0[3]
00:09:41.074 W [0 Adreno (TM) 630] bugsbn1=1 bugbilz=0 bugcopc=0 bugihfa=1
00:09:41.074 W [0 Adreno (TM) 630] fp16-p/s/u/a=1/0/0/0 int8-p/s/u/a=1/0/0/0
00:09:41.074 W [0 Adreno (TM) 630] subgroup=64 basic/vote/ballot/shuffle=1/1/0/0
00:09:41.074 W [0 Adreno (TM) 630] fp16-8x8x16/16x8x8/16x8x16/16x16x16=0/0/0/0
00:09:41.074 D gpu_count:1
00:09:41.074 D use_vulkan_compute
00:09:41.074 I load_param: /storage/emulated/0/test/face_detection/face_detector.ncnn_debug.param
00:09:41.079 I load_model: /storage/emulated/0/test/face_detection/face_detector.ncnn.bin
00:09:44.132 D ex.input
00:09:44.132 D ex.extract
00:09:44.286 D regressors shape: c=1, d=1, h=896, w=16, dims=2
00:09:44.286 D scores shape: c=1, d=1, h=896, w=1, dims=2
00:09:44.287 I Max score: 0.3218, index: 691
00:09:44.290 I Detection result saved to: /storage/emulated/0/test/output/face_detector_ncnn.png
00:09:44.465 I Detection result on original image saved to: /storage/emulated/0/test/output/face_detector_ncnn_with_original.png
The code that initializes the net is as follows:
ncnn::Net net;
if (useGpu) {
    int gpu_count = ncnn::get_gpu_count();
    LOG_D("gpu_count:%d", gpu_count);
    if (gpu_count <= 0) {
        LOG_E("gpu_count<=0");
        return;
    }
    LOG_D("use_vulkan_compute");
    net.opt.use_vulkan_compute = true;
    // set specified vulkan device before loading param and model
    // net.set_vulkan_device(0); // use device-0
    net.opt.use_fp16_packed = false;
    net.opt.use_fp16_storage = false;
    net.opt.use_fp16_arithmetic = false;
    net.opt.use_int8_storage = false;
    net.opt.use_int8_arithmetic = false;
}
I am experiencing the same issue. CPU-based inference is correct, but Vulkan produces invalid output on all platforms: Mac, AMD, and NVIDIA. I believe one of the ncnn Vulkan ops has a bug. My rough translation of the above seems to indicate it's the convolution2d.
@XingRay
The problem is in how the input data is constructed. Your code builds a 4D Mat, but it should be a 3D Mat. With that change it works.
// mat_in = ncnn::Mat(3, 128, 128, 1, padded_float.data);
mat_in = ncnn::Mat(3, 128, 128, padded_float.data);
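To illustrate what the 3D form describes: ncnn::Mat(3, 128, 128, data) is a Mat with w=3, h=128, c=128 wrapped around the interleaved float buffer. The sketch below uses plain index arithmetic rather than ncnn itself, and it assumes that ncnn rounds each channel's byte size up to a multiple of 16. The helper names channel_step, hwc_offset and mat3d_offset are hypothetical.

```cpp
#include <cstddef>

// Assumption: each channel's byte size is rounded up to a multiple of 16.
// Returns the resulting channel step in elements.
std::size_t channel_step(std::size_t w, std::size_t h, std::size_t elemsize = 4)
{
    const std::size_t bytes = w * h * elemsize;
    return ((bytes + 15) / 16 * 16) / elemsize;
}

// Offset of pixel (y, x), colour ch in an interleaved H x W x 3 float buffer.
std::size_t hwc_offset(std::size_t y, std::size_t x, std::size_t ch, std::size_t W)
{
    return (y * W + x) * 3 + ch;
}

// Offset of the same element when the buffer is viewed as a 3D Mat with
// w = 3, h = W, c = H: channel index y, row x, column ch.
std::size_t mat3d_offset(std::size_t y, std::size_t x, std::size_t ch, std::size_t W)
{
    return y * channel_step(3, W) + x * 3 + ch;
}
```

For a 128-pixel-wide image one channel holds 3 * 128 floats (1536 bytes, already a multiple of 16), so both offsets agree for every element. Under the same assumption, a width such as 127 would give a padded channel step and the two views would no longer line up.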
https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result
If you still have problems, please raise an issue and attach your model file and input.
Here https://github.com/Tencent/ncnn/issues/5990
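I tried all three of the following ways of constructing the Mat, and the outcome is the same in each case: CPU mode produces correct output, while the output is wrong whenever Vulkan is enabled. The wrong output is also identical across the three construction methods.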
// ncnn::Mat in_mat(3, 128, 128, 1, padded_float.data);
ncnn::Mat in_mat(3, 128, 128, padded_float.data);
// ncnn::Mat in_mat = ncnn::Mat::from_pixels(padded_float.data, ncnn::Mat::PixelType::PIXEL_RGB, 128, 128);
The source code, model, and test data are in the attachment ncnn-test01.zip