V5 mobile version sometimes returns Traditional Chinese characters

Open liuyunfei666 opened this issue 7 months ago • 12 comments

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

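🐛 Bug (Problem description)

Image

OCR result (note 於, the Traditional form of 于, in the first line):
被誉为绿色林海的大兴安岭位於我国哪个省份 |---| center coordinates: 386,288
黑龙江 |---| center coordinates: 121,152
内蒙古 |---| center coordinates: 27,152
辽宁 |---| center coordinates: 26,513
吉林 |---| center coordinates: 121,511

🏃‍♂️ Environment (Runtime environment)

Model converted to ONNX, then to ncnn, and run on Android.

🌰 Minimal Reproducible Example (Minimal demo that reproduces the problem)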

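#include <algorithm>
#include <vector>
#include <android/log.h>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include "net.h" // ncnn; the include path depends on how ncnn is packaged

// TextLine, crnnNet and scoreToTextLine are defined elsewhere in the project.
TextLine getTextLine(const cv::Mat& src) {
const int TARGET_WIDTH = 320;
const int TARGET_HEIGHT = 48;
const float TARGET_ASPECT_RATIO = (float)TARGET_WIDTH / TARGET_HEIGHT;

// Image preprocessing (channel conversion)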
cv::Mat tempMatForProcessing;
if (src.channels() == 1) {
    cv::cvtColor(src, tempMatForProcessing, cv::COLOR_GRAY2BGR);
} else if (src.channels() == 4) {
    cv::cvtColor(src, tempMatForProcessing, cv::COLOR_BGRA2BGR);
} else if (src.channels() == 3) {
    tempMatForProcessing = src;
} else {
    __android_log_print(ANDROID_LOG_ERROR, "OCR_NCNN_getTextLine",
                        "Unsupported number of channels: %d", src.channels());
    return {"Error: Unsupported channels", {}};
}

int H = tempMatForProcessing.rows;
int W = tempMatForProcessing.cols;
if (H <= 0 || W <= 0) {
    __android_log_print(ANDROID_LOG_ERROR, "OCR_NCNN_getTextLine",
                        "Invalid image dimensions: H=%d, W=%d", H, W);
    return {"Error: Invalid image dimensions", {}};
}

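// Compute the aspect ratio of the input image
float aspect_ratio = (float)W / H;

// Choose the scaling strategy based on the aspect ratio
float scale;
int new_width, new_height;

if (aspect_ratio > TARGET_ASPECT_RATIO) {
    // Wide text: scale to the target width, height follows
    scale = (float)TARGET_WIDTH / W;
    new_width = TARGET_WIDTH;
    new_height = std::max(cvRound(H * scale), 1); // never 0, or cv::resize fails
} else {
    // Tall text: scale to the target height, width follows
    scale = (float)TARGET_HEIGHT / H;
    new_height = TARGET_HEIGHT;
    new_width = std::max(cvRound(W * scale), 1); // never 0, or cv::resize fails
}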

// Resize while keeping the original aspect ratio
cv::Mat resized_img;
cv::resize(tempMatForProcessing, resized_img, cv::Size(new_width, new_height),
           0, 0, cv::INTER_AREA); // INTER_AREA is suited to downscaling

// Create the background canvas (mid-gray, value 127, not white)
cv::Mat input_canvas(TARGET_HEIGHT, TARGET_WIDTH, CV_8UC3, cv::Scalar(127, 127, 127));

// Center the resized image on the canvas
int x_offset = (TARGET_WIDTH - new_width) / 2;
int y_offset = (TARGET_HEIGHT - new_height) / 2;

// Make sure the offsets are non-negative
x_offset = std::max(x_offset, 0);
y_offset = std::max(y_offset, 0);

// Copy the image into the center of the canvas
cv::Rect roi(x_offset, y_offset, new_width, new_height);
resized_img.copyTo(input_canvas(roi));

// Convert to ncnn format and normalize
ncnn::Mat input_ncnn = ncnn::Mat::from_pixels(input_canvas.data,
                                              ncnn::Mat::PIXEL_BGR,
                                              TARGET_WIDTH, TARGET_HEIGHT);

// Normalization parameters (adjust to what the model expects)
// If the model expects inputs in [0,1], use these:
//const float mean_vals[3] = {0.0f, 0.0f, 0.0f};
//const float norm_vals[3] = {1.0f / 255.0f, 1.0f / 255.0f, 1.0f / 255.0f};

// If the model expects inputs in [-1,1], use these:
const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
const float norm_vals[3] = {1.0f / 127.5f, 1.0f / 127.5f, 1.0f / 127.5f};

input_ncnn.substract_mean_normalize(mean_vals, norm_vals);

// Run the CRNN recognition network
ncnn::Extractor extractor = crnnNet.create_extractor();
extractor.set_light_mode(true);
extractor.input("in0", input_ncnn);

ncnn::Mat out_mat;
extractor.extract("out0", out_mat);

// Post-process and return the result
std::vector<float> outputData((float*)out_mat.data,
                              (float*)out_mat.data + out_mat.total());
return scoreToTextLine(outputData, out_mat.h, out_mat.w);

}
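To make the effect of the fixed 320x48 canvas concrete, the resize arithmetic above can be reproduced in a few lines of Python. This is only a sketch: `fit_to_canvas` is a hypothetical helper name, the crop sizes are illustrative rather than measured, and the clamp to 1 pixel is an added safety assumption.

```python
def fit_to_canvas(w, h, target_w=320, target_h=48):
    """Reproduce the getTextLine resize arithmetic for a w x h text crop.

    Returns (new_w, new_h, x_offset, y_offset) on the target canvas.
    """
    if w <= 0 or h <= 0:
        raise ValueError("invalid image dimensions")
    if w / h > target_w / target_h:
        # Wide text: the width is pinned to target_w, so the height shrinks.
        scale = target_w / w
        new_w = target_w
        new_h = max(1, round(h * scale))  # clamp is an added assumption
    else:
        # Tall or short text: the height is pinned to target_h.
        scale = target_h / h
        new_h = target_h
        new_w = max(1, round(w * scale))  # clamp is an added assumption
    x_off = max((target_w - new_w) // 2, 0)
    y_off = max((target_h - new_h) // 2, 0)
    return new_w, new_h, x_off, y_off


# Illustrative crop sizes (hypothetical, not taken from the real images).
print(fit_to_canvas(570, 38))  # (320, 21, 0, 13): glyphs end up 21 px tall
print(fit_to_canvas(93, 38))   # (117, 48, 101, 0): glyphs fill all 48 rows
```

Under this scheme a long line is rendered at less than half the height of a short one, and a change of a few pixels in the crop changes the scale. A variable-width input (height fixed at 48, width scaled proportionally) would avoid the shrinkage, provided the converted model accepts a dynamic width; that is an assumption, not something confirmed in this thread.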

liuyunfei666 avatar Jun 01 '25 04:06 liuyunfei666

Image

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_doc_orientation_classify=False, # disable the document orientation classification model
    use_doc_unwarping=False, # disable the text image unwarping model
    use_textline_orientation=True, # enable the text line orientation classification model
    text_detection_model_name="PP-OCRv5_mobile_det", # text detection model
    text_recognition_model_name="PP-OCRv5_mobile_rec", # text recognition model
)
result = ocr.predict("./449693422-432c7ebe-b8b3-4766-a5b4-832257206610.png")
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")

Using the Python API directly, there is no problem.

GreatV avatar Jun 01 '25 05:06 GreatV

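Could this be a problem with my model conversion? I see that your output is named "output", whereas after my ONNX conversion the input and output became "in0" and "out0". I suspect the model version I downloaded is one of your test builds; otherwise there is no reason it would return exactly the Traditional form of 于. That suggests the recognition itself is correct and the bug is in the output stage. However, I downloaded the model from: https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/pipeline_usage/OCR.html#1-ocr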

liuyunfei666 avatar Jun 01 '25 05:06 liuyunfei666

@liuyunfei666 Could you take a look at this project for reference? https://github.com/nihui/ncnn-android-ppocrv5

nihui avatar Jun 02 '25 15:06 nihui

Can you confirm that the model dictionary is correct?

zhangyubo0722 avatar Jun 03 '25 08:06 zhangyubo0722

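I extracted it directly from the yml (the one with 18383 lines), so it should be fine. Also, the output is not always Traditional: if I crop the screenshot slightly larger or slightly smaller, the result becomes Simplified. It is a strange phenomenon. Returning the Traditional form suggests the recognition itself is fine; otherwise I would expect some other wrong character, not the Traditional form of the correct one.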
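One way to sanity-check the dictionary is to rebuild the label list and compare its length with the class dimension of the model output (`out_mat.w` in the C++ above). The sketch below assumes the usual PaddleOCR CTC convention as I understand it: a blank token at index 0, the dictionary characters in file order, and a space character appended at the end. `build_ctc_labels` is a hypothetical helper, and the convention should be verified against the model's inference config.

```python
def build_ctc_labels(dict_lines, use_space_char=True):
    """Build the index -> character list for CTC decoding.

    Assumed layout (verify against the model config):
    index 0 is the CTC blank, then one entry per dictionary line,
    then an optional trailing space character.
    """
    chars = [line.rstrip("\r\n") for line in dict_lines]
    if use_space_char:
        chars.append(" ")
    return ["blank"] + chars


# Tiny stand-in dictionary; a real one would be read from the yml or txt file.
labels = build_ctc_labels(["于\n", "位\n"])
print(labels)       # ['blank', '于', '位', ' ']
print(len(labels))  # 4, to be compared with the model's class count
```

If the length does not match the class dimension of the model output, or the indices are shifted by one, every character would decode wrongly rather than only an occasional Simplified/Traditional pair, which is consistent with the dictionary not being the cause here.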

liuyunfei666 avatar Jun 03 '25 08:06 liuyunfei666

2025-06-05 17:46:33.343 8718-9618 MyForegroundService com.android.xuetr D Decrypted JSON request: {"color_invert":0,"image_path":"/sdcard/test.png","mode":1}
2025-06-05 17:46:33.720 8718-9618 NCNN_OCR_PROB_TRACE com.android.xuetr E T:60, MaxIdx:25(于), MaxProb:0.6910 || 于(idx 25):0.6910, 於(idx 6698):0.3089
2025-06-05 17:46:33.830 8718-9618 MyForegroundService com.android.xuetr D Sending response: {"time_ms":477,"results":[{"label":"被誉为绿色林海的大兴安岭位于我国哪个省份","score":0.846509575843811,"x0":407,"y0":2,"x1":407,"y1":572,"x2":369,"y2":572,"x3":369,"y3":2,"x4":388,"y4":287},{"label":"黑龙江","score":0.835160493850708,"x0":142,"y0":104,"x1":142,"y1":197,"x2":104,"y2":197,"x3":104,"y3":104,"x4":123,"y4":150},{"label":"吉林","score":0.8406929969787598,"x0":146,"y0":476,"x1":146,"y1":543,"x2":104,"y2":543,"x3":104,"y3":476,"x4":125,"y4":509},{"label":"内蒙古","score":0.8454322218894958,"x0":48,"y0":103,"x1":48,"y1":195,"x2":9,"y2":195,"x3":9,"y3":103,"x4":28,"y4":149},{"label":"辽宁","score":0.8417908549308777,"x0":50,"y0":477,"x1":50,"y1":544,"x2":8,"y2":544,"x3":8,"y3":477,"x4":29,"y4":510}]}
2025-06-05 17:46:34.836 8718-9618 MyForegroundService com.android.xuetr I Client disconnected: /127.0.0.1
2025-06-05 17:46:55.889 8718-8736 MyForegroundService com.android.xuetr I Client connected: /127.0.0.1
2025-06-05 17:46:55.889 8718-8736 MyForegroundService com.android.xuetr D TCP Server waiting for client connection...
2025-06-05 17:46:55.993 8718-9689 MyForegroundService com.android.xuetr D Decrypted JSON request: {"color_invert":0,"image_path":"/sdcard/test.png","mode":1}
2025-06-05 17:46:56.281 8718-9689 NCNN_OCR_PROB_TRACE com.android.xuetr E T:19, MaxIdx:6698(於), MaxProb:0.7749 || 于(idx 25):0.2250, 於(idx 6698):0.7749
2025-06-05 17:46:56.384 8718-9689 MyForegroundService com.android.xuetr D Sending response: {"time_ms":382,"results":[{"label":"五台山位於我国哪个省?","score":0.8455487489700317,"x0":408,"y0":3,"x1":408,"y1":309,"x2":369,"y2":309,"x3":369,"y3":3,"x4":388,"y4":156},{"label":"河北","score":0.8459494113922119,"x0":145,"y0":116,"x1":145,"y1":183,"x2":104,"y2":183,"x3":104,"y3":116,"x4":124,"y4":149},{"label":"山东","score":0.8422250747680664,"x0":143,"y0":476,"x1":143,"y1":542,"x2":103,"y2":542,"x3":103,"y3":476,"x4":123,"y4":509},{"label":"河南","score":0.8379645943641663,"x0":50,"y0":116,"x1":50,"y1":182,"x2":8,"y2":182,"x3":8,"y3":116,"x4":29,"y4":149},{"label":"山西","score":0.8162077069282532,"x0":49,"y0":477,"x1":49,"y1":544,"x2":8,"y2":544,"x3":8,"y3":477,"x4":28,"y4":510}]}
2025-06-05 17:46:57.388 8718-9689 MyForegroundService com.android.xuetr I Client disconnected: /127.0.0.1
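The two NCNN_OCR_PROB_TRACE lines show the model splitting probability between 于 and 於 at the same position, with the winner flipping between the two images. Assuming `scoreToTextLine` performs ordinary greedy CTC decoding (its source is not shown in this thread), the behaviour can be sketched as below. The label list and the first time step are made-up toy values; only the 于/於 probabilities are taken from the trace.

```python
LABELS = ["blank", "位", "于", "於"]  # toy label list, not the real dictionary


def ctc_greedy_decode(steps, labels, blank=0):
    """Greedy CTC decode: argmax per time step, merge repeats, drop blanks."""
    chars = []
    prev = blank
    for probs in steps:
        idx = max(range(len(probs)), key=lambda i: probs[i])
        if idx != blank and idx != prev:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars)


# Second time step uses the probabilities logged for the two images.
run_a = [[0.01, 0.99, 0.0, 0.0], [0.0001, 0.0, 0.6910, 0.3089]]
run_b = [[0.01, 0.99, 0.0, 0.0], [0.0001, 0.0, 0.2250, 0.7749]]

print(ctc_greedy_decode(run_a, LABELS))  # 位于
print(ctc_greedy_decode(run_b, LABELS))  # 位於
```

Because the decoder only takes the argmax, nothing in the output stage converts Simplified to Traditional: the network itself assigns the higher score to 於 in the second image.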

Image

Image

liuyunfei666 avatar Jun 05 '25 09:06 liuyunfei666

Based on the discussion in other issues, this is most likely a training data problem.

GreatV avatar Jun 05 '25 09:06 GreatV

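Because PP-OCRv5 has to support both Simplified and Traditional Chinese, many data-mining strategies were used during training, including large-model-based data distillation and data synthesis. Occasionally this leaves Simplified text with Traditional labels, or Traditional text with Simplified labels, so recognition occasionally confuses Simplified and Traditional forms. An updated version of PP-OCRv5 will be released later to avoid this.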
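Until an updated model is available, an application that only ever expects Simplified Chinese could fold known Traditional variants back to Simplified after decoding. This is a hypothetical workaround, not something proposed by the maintainers: `TRAD_TO_SIMP` is a hand-written example table with a single entry taken from this issue, and `fold_to_simplified` is an illustrative helper name.

```python
# Hypothetical hand-written table; a real one would need many more entries.
TRAD_TO_SIMP = {"於": "于"}


def fold_to_simplified(text, table=TRAD_TO_SIMP):
    """Replace each Traditional character in the table with its Simplified form."""
    return text.translate(str.maketrans(table))


print(fold_to_simplified("五台山位於我国哪个省?"))  # 五台山位于我国哪个省?
```

Such folding would damage input that is genuinely written in Traditional Chinese, so it only makes sense when the expected script is known in advance.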

zhangyubo0722 avatar Jun 05 '25 09:06 zhangyubo0722

Image

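OK, that explains it. I also noticed something else: with this image, for example, 魏 is recognized as the two characters 委鬼, but if I extend the top of the image by 1 pixel, it is recognized correctly as 魏! That made me curious: could it be a size issue?

So instead of extending the top by one pixel, I extended the bottom by one pixel, giving the image the same height as the size that is recognized correctly. It was still misrecognized as 委鬼, and extending the bottom further had no effect at all. My impression is that the long text line, which is recognized first in the sequence, somehow affects the single-character text below it.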

liuyunfei666 avatar Jun 05 '25 18:06 liuyunfei666

I would like to ask: will the fix for the Simplified/Traditional mix-up be made on top of v5, or will it wait for a future v6?

liuyunfei666 avatar Jun 07 '25 05:06 liuyunfei666

It will be fixed on top of v5.

zhangyubo0722 avatar Jun 13 '25 03:06 zhangyubo0722

This issue has had no response for a long time and will be closed. You can reopen it or open a new issue if you still have questions.

From Bot

TingquanGao avatar Jul 14 '25 12:07 TingquanGao

Has it been released yet?

249189594 avatar Oct 02 '25 07:10 249189594

Same question: has it been released?

liwenju0 avatar Oct 24 '25 23:10 liwenju0