I converted my ONNX model to ncnn successfully, but all my inference outputs are NaN. E.g., the output of net.extract() is all NaN.
error log

context
Ubuntu 18.04.6 LTS, ncnn-20240410-android-shared, android-ndk-r17c

how to reproduce
1. Run ./sc_ncnn img.jpg in an adb shell on Android.
2. Execute net.extract("x_lr_A", A); in C++ — A is all NaN. It's quite frustrating...

pre-processing:
1. cv::imread
2. resize to (512, 512)
3. HWC -> CHW (maybe wrong, but it shouldn't produce NaN)
4. normalize to [0, 1] (maybe not right, but it shouldn't produce NaN)
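For reference, the HWC -> CHW split and [0, 1] scaling in steps 3 and 4 can be sketched with numpy (the 512x512x3 shape is an assumption based on the resize above; ncnn's own from_pixels + substract_mean_normalize path does the same thing internally):

```python
import numpy as np

# Dummy BGR image in HWC uint8 layout, as cv2.imread would return it (shape assumed).
h, w = 512, 512
img_hwc = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)

# Step 3: HWC -> CHW
img_chw = np.transpose(img_hwc, (2, 0, 1))  # shape (3, 512, 512)

# Step 4: scale to [0, 1]
img_norm = img_chw.astype(np.float32) / 255.0

assert img_chw.shape == (3, h, w)
assert 0.0 <= float(img_norm.min()) and float(img_norm.max()) <= 1.0
```

Note that ncnn::Mat::from_pixels already stores pixels as CHW planes, so a manual OpenCV-side transpose is normally unnecessary; the [0, 1] scaling is what substract_mean_normalize with norm_vals = 1/255 performs.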
I'm new to ncnn, but I'm curious about its extreme mobile performance, so I want to switch to ncnn right away! I will paste my code below.
more
Chinese version (translated): I built sc_ncnn with make on Ubuntu (with someone else's help — I don't know much about deployment, but I greatly admire ncnn's extreme performance, so I wanted to try it) and ran it on Android. The model has two outputs named x_lr_A and x_lr_b, but the values I get from net.extract() are all NaN, and I don't know where I went wrong. I attached the .param and .bin files; any help is appreciated, thanks! Help, please.
```cpp
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <cmath>
#include <ctime>
#include <omp.h>
#include "opencv2/opencv.hpp"
#include "mat.h"
#include "net.h"
using namespace std;
using namespace cv;

int main(int argc, char** argv)
{
    // Load the model (options are set before loading).
    ncnn::Net model_ncnn;
    model_ncnn.opt.lightmode = true;
    model_ncnn.opt.num_threads = 1;
    // model_ncnn.opt.use_fp16_storage = false;
    // model_ncnn.opt.use_fp16_arithmetic = false;
    model_ncnn.load_param("best_sim.param");
    model_ncnn.load_model("best_sim.bin");

    vector<const char*> input_names = model_ncnn.input_names();
    vector<const char*> output_names = model_ncnn.output_names();
    for (const char* name : input_names)
        cout << "input: " << name << endl;
    for (const char* name : output_names)
        cout << "output: " << name << endl;

    // Prepare the input.
    string img_path = argv[1];
    cv::Mat img = cv::imread(img_path);

    // x_hr
    int x_hr_size = 2048;
    cv::Mat x_hr;
    cv::resize(img, x_hr, cv::Size(x_hr_size, x_hr_size));

    // x_lr
    int x_lr_size = 512;
    cv::Mat x_lr;
    cv::resize(x_hr, x_lr, cv::Size(x_lr_size, x_lr_size));
    // cv::cvtColor(x_lr, x_lr, cv::COLOR_BGR2RGB);
    // cv::Mat x_lr_float;
    // x_lr.convertTo(x_lr_float, CV_32FC3);
    // Note: even if the cv::Mat is normalized first, from_pixels still maps the
    // pixels back to [0, 255], so normalization is applied after from_pixels below.
    // (A manual HWC->CHW repack via cv::split was also tried here; it is not needed
    // because from_pixels already stores the data as CHW planes.)
    cout << "x_lr prepared" << endl;

    // Model input.
    ncnn::Mat in = ncnn::Mat::from_pixels(x_lr.data, ncnn::Mat::PIXEL_BGR2RGB, x_lr.cols, x_lr.rows);
    float in_min = 256.f; // was int, which truncated the float values
    float in_max = 0.f;
    for (int i = 0; i < 3 * x_lr_size * x_lr_size; i++)
    {
        if (in[i] > in_max)
            in_max = in[i];
        else if (in[i] < in_min)
            in_min = in[i];
    }
    cout << "in min/max: " << in_min << " " << in_max << endl;
    // for (int i = 0; i < 3 * x_lr_size * x_lr_size; i++)
    //     in[i] = (in[i] - in_min) / (in_max - in_min);
    const float mean_vals[3] = {0.f, 0.f, 0.f};
    const float norm_vals[3] = {1.f / 255.f, 1.f / 255.f, 1.f / 255.f};
    in.substract_mean_normalize(mean_vals, norm_vals);

    // Inference.
    ncnn::Mat A;
    ncnn::Mat b;
    ncnn::Extractor ex = model_ncnn.create_extractor();
    ex.input("x_lr", in);
    ex.extract("A", A);
    ex.extract("b", b);
    cout << "inference done" << endl;

    float A_min = 257.f;
    float A_max = 0.f;
    for (size_t i = 0; i < A.total(); i++) // iterate over A's actual size, not the input size
    {
        if (A[i] > A_max)
            A_max = A[i];
        else if (A[i] < A_min)
            A_min = A[i];
    }
    cout << "A min/max: " << A_min << " " << A_max << endl;

    // model_ncnn.clear();
    return 0;
}
```
What is the result when converting the model into .param & .bin? Maybe some op is not supported. I checked the output from different layers and found it prints NaN after some middle layers, but I can't locate which one, so an unsupported op may exist. Can you upload the original model file (e.g., .onnx) so I can check the model structure further?
Thank you for your reply! I have found the reason why the model outputs NaN: the original author implemented a custom LayerNorm operation. It can be implemented in PyTorch as follows:
```python
class LayerNorm2d_Sc(nn.Module):
    """The author's custom LayerNorm. In theory PyTorch can do the same by
    permuting dimensions (I verified this), but it cannot yet be expressed in ncnn."""

    def __init__(self, channels, eps=1e-6):
        super(LayerNorm2d_Sc, self).__init__()
        self.register_parameter('weight', nn.Parameter(torch.ones(channels)))
        self.register_parameter('bias', nn.Parameter(torch.zeros(channels)))
        self.eps = eps
        self.torch_layernorm = torch.nn.LayerNorm(channels, eps=eps, elementwise_affine=False)

    def forward(self, x):
        # I tried replacing this with PyTorch's LayerNorm. Both the PyTorch code and
        # the exported onnx give correct results, but the conversion to ncnn fails:
        # C = x.shape[1]
        # x_ = x.clone()
        # x_ = x_.permute(0, 2, 3, 1)
        # y = self.torch_layernorm(x_)
        # y = y.permute(0, 3, 1, 2)
        # # y = self.weight.view(1, C, 1, 1) * y + self.bias.view(1, C, 1, 1)
        # return y

        # The original author's custom LayerNorm. PyTorch and the exported onnx give
        # correct results, but the converted ncnn model produces an all-black image:
        C = x.shape[1]
        x_ = x.clone()
        mu = x_.mean(dim=1, keepdim=True)
        var = (x_ - mu).pow(2).mean(dim=1, keepdim=True)
        y = (x_ - mu) / (var + self.eps).sqrt()
        y = self.weight.view(1, C, 1, 1) * y + self.bias.view(1, C, 1, 1)
        return y
```
I tried using numpy instead of PyTorch; the inference result was not completely black, but it was still not normal. I saw in ncnn's wiki that custom layers can be implemented, and I am trying to add the author's custom LayerNorm. (If I understand correctly, in C++ the ncnn model processes and outputs data in WHC order, but in Python the ncnn output seems to be CHW — at least I get normal results with CHW. Of course, I care more about the C++ results.)
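For anyone following along, the channel-wise normalization in LayerNorm2d_Sc can be sanity-checked against a plain numpy sketch (affine weight/bias omitted, shapes hypothetical), which is handy for comparing ncnn's layer-by-layer output:

```python
import numpy as np

def layernorm2d_channels(x, eps=1e-6):
    """Normalize over the channel axis of an NCHW tensor, mirroring the
    mean/var computation in LayerNorm2d_Sc (affine parameters omitted)."""
    mu = x.mean(axis=1, keepdims=True)
    var = ((x - mu) ** 2).mean(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.rand(1, 8, 4, 4).astype(np.float32)
y = layernorm2d_channels(x)

# After normalization, each spatial position has ~zero mean and ~unit
# variance across the channel axis.
assert np.allclose(y.mean(axis=1), 0.0, atol=1e-4)
assert np.allclose(y.var(axis=1), 1.0, atol=1e-2)
```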
Hello! 1. In my practice, the dimensions processed by the ncnn model in C++ are also CDHW, and the output is also CDHW, i.e. [Batch, Channel, Height, Width]. See the C++ code below that flattens the output:
```cpp
void pretty_print(const ncnn::Mat& m, std::vector<float>& vec_heap)
{
    for (int q = 0; q < m.c; q++)
    {
        const float* ptr = m.channel(q);
        for (int z = 0; z < m.d; z++)
        {
            for (int y = 0; y < m.h; y++)
            {
                for (int x = 0; x < m.w; x++)
                {
                    vec_heap.emplace_back(ptr[x]);
                }
                ptr += m.w;
            }
        }
    }
}
```
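As a sanity check, the nested flatten loop above is equivalent to a row-major walk over a dense CDHW tensor; a small numpy stand-in (dims hypothetical):

```python
import numpy as np

def pretty_print_flatten(m):
    """Mirror the C++ loop order: channel, then depth, row, column."""
    out = []
    c, d, h, w = m.shape
    for q in range(c):
        for z in range(d):
            for y in range(h):
                for x in range(w):
                    out.append(m[q, z, y, x])
    return out

m = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4)
# For a dense array this matches a plain row-major flatten.
assert pretty_print_flatten(m) == list(m.reshape(-1))
```

In a real ncnn::Mat each channel plane may be padded (cstep alignment), which is why the C++ version fetches a fresh pointer per channel via m.channel(q) instead of reading one flat buffer.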
2. Your own LayerNorm2d_Sc works the same as the original one. If it works in PyTorch but fails when converting to an ncnn model, maybe you can update the ncnn version and compile the LayerNorm operation (see https://github.com/Tencent/ncnn/issues/5262#issuecomment-1880330462 for details). Could you post the error message?
And for "the converted ncnn model produces an all-black image": maybe you need to re-normalize the output to [0, 255] to get the final image.
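For the all-black image issue, here is a minimal sketch of mapping a float output back to displayable 8-bit pixels (assuming the network output lies roughly in [0, 1]; the clipping guards against small overshoot):

```python
import numpy as np

def to_uint8(out):
    """Scale a float tensor assumed to lie in [0, 1] back to [0, 255] and
    clip, so small float values do not render as an all-black image."""
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

out = np.array([[0.0, 0.5], [1.0, 1.2]], dtype=np.float32)
img = to_uint8(out)
assert img.dtype == np.uint8
assert img.min() == 0 and img.max() == 255
```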
Here is the onnx from PyTorch w/o onnxsim: model_trace_1.4M_512.onnx.zip
To get the ncnn model I do: PyTorch model --> onnxsim --> ncnn. But I got "LayerNormalization not supported yet!" when converting to ncnn:
```
./onnx2ncnn model_trace_1.4M_512_sim.onnx test_ncnn.param test_ncnn.bin
LayerNormalization not supported yet!
# axis=-1
# epsilon=1e-06
LayerNormalization not supported yet!
# axis=-1
# epsilon=1e-06
... (the same three-line message is printed 20 times in total)
```
The number of errors reported may correspond to the number of custom LayerNorm operations. In addition, I tried to add a channel-wise LayerNorm branch in ncnn as follows:
```cpp
// modified in src/layer/layernorm.cpp
else if (affine_size == channels)
{
    #pragma omp parallel for num_threads(opt.num_threads)
    for (int i = 0; i < size; i++)
    {
        // mean over the channel dim
        float sum = 0.f;
        for (int q = 0; q < channels; q++)
        {
            sum += bottom_top_blob.channel(q)[i];
        }
        float mean = sum / channels;

        // variance over the channel dim
        float sqsum = 0.f;
        float tmp = 0.f;
        for (int q = 0; q < channels; q++)
        {
            tmp = bottom_top_blob.channel(q)[i] - mean;
            sqsum += tmp * tmp;
        }
        float var = sqsum / channels;

        float a = 1.f / sqrtf(var + eps);
        float b = -mean * a;
        for (int q = 0; q < channels; q++) // was "i++", an infinite loop
        {
            bottom_top_blob.channel(q)[i] = bottom_top_blob.channel(q)[i] * a + b;
        }
    }
}
```
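The patch above folds the normalization into a = 1/sqrt(var + eps) and b = -mean * a, so the inner loop becomes a single multiply-add per element. A quick numpy check of that identity (sizes hypothetical):

```python
import numpy as np

eps = 1e-6
# Values of one spatial position across channels (hypothetical channel count).
x = np.random.rand(16).astype(np.float32)
mean = x.mean()
var = ((x - mean) ** 2).mean()

# Folded form used in the patch: y = x * a + b
a = 1.0 / np.sqrt(var + eps)
b = -mean * a
y_folded = x * a + b

# Direct form: y = (x - mean) / sqrt(var + eps)
y_direct = (x - mean) / np.sqrt(var + eps)
assert np.allclose(y_folded, y_direct, atol=1e-6)
```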
And execute the following commands under ncnn/build:

```shell
cmake ..
make -j64
make install
```
When I converted the onnx-sim file to ncnn, I got the same error as above.
Thanks again for your reply; I believe I can figure ncnn out with your help. ^_^
Haha, I got "LayerNormalization not supported yet!" when converting it to ncnn too.
I added the LayerNorm implementation in ncnn, so why is it still not supported? It feels like the conversion process does not call ncnn's LayerNorm.
1. I didn't try to register my own op, but I think it should be an individual .h & .cpp file declaring the class LayerNormalization, and then at line 169 of /ncnn/src/CMakeLists.txt add ncnn_add_layer(LayerNormalization).
I have tried to supplement the LayerNorm implementation in ncnn: I added a LayerNormalization implementation following the "add custom layer" reference document and recompiled. When converting onnx to ncnn, the error is still reported and the LayerNormalization op is still unsupported. Did I compile it incorrectly? (The build prompts "Could NOT find protobuf (missing: protobuf_DIR)", but the subsequent make etc. still succeed.)
1. LayerNorm in ncnn supports normalization over the channel dim.
2. I added a new LayerNormalization implementation in ncnn, but it doesn't seem to work.
If you edit the file LayerNorm.cpp, the op is still called LayerNorm, but the custom op is called LayerNormalization according to the "LayerNormalization not supported yet!" message, so maybe you should declare a new op class.
I know what you mean. I wrote two files named "LayerNormalization.h" and "LayerNormalization.cpp", modified src/CMakeLists.txt with ncnn_add_layer(LayerNormalization), and compiled again. But it still doesn't seem to work.
Yeah, I ran into the same situation, but I don't know why it didn't work.
Help, please. @nihui
Thanks again. I won't give up, and I will solve this problem sooner or later. I must move to ncnn, as it's perfect in my view.
I used PNNX to resolve my problem in the end! Thanks @nihui for PNNX!
For the various problems with onnx model conversion, it is recommended to use the latest pnnx tool to convert your model to ncnn:
```shell
pip install pnnx
pnnx model.onnx inputshape=[1,3,224,224]
```
Detailed reference documentation: https://github.com/pnnx/pnnx and https://github.com/Tencent/ncnn/wiki/use-ncnn-with-pytorch-or-onnx#how-to-use-pnnx