opencv_contrib

wechat qrcode UnicodeDecodeError in python3.6.9

Open frotms opened this issue 3 years ago • 10 comments

I modified the opencv/opencv-python repository and built a .whl that includes wechat_qrcode. A UnicodeDecodeError is raised for some QR code images.

Env: Python 3.6.9, Ubuntu 18.04

Code:

import cv2

detector = cv2.wechat_qrcode_WeChatQRCode("detect.prototxt", "detect.caffemodel", "sr.prototxt", "sr.caffemodel")
img = cv2.imread("img.jpg")
res, points = detector.detectAndDecode(img)

I got the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 2: invalid continuation byte

frotms avatar Feb 02 '21 09:02 frotms

Hi @frotms, could you give me an example QR code image to reproduce this bug?

dddzg avatar Feb 08 '21 11:02 dddzg

@dddzg A UnicodeDecodeError is raised when the detector is fed the images below:

0_0305110905217 0_0305110907851 0_0305110927865 0_0305110940264 1_0305110859069 1_0305110907877

frotms avatar Feb 08 '21 14:02 frotms

The problem may come from the C++-to-Python binding.

Here is a decode result from C++ API:

unicode ���ص�ַ�� http://img.yingyonghui.com/apk/3188/com.sanlian.1300354632220.apk

dddzg avatar Feb 20 '21 07:02 dddzg

The problem still persists with the image above (everything works from C++, but the Python version throws):

  File "C:\p\opencv_contrib\modules\wechat_qrcode\samples\qrcode.py", line 34, in <module>
    res, points = detector.detectAndDecode(img)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte

berak avatar Apr 29 '21 14:04 berak

Are there any solutions to this problem? I encountered the same error using the Python API.

Miracle2333 avatar Aug 06 '21 06:08 Miracle2333

I also encountered this Unicode decode error. Interestingly, the non-CNN version works even through the Python API (sort of: it returns a useless region, the whole image). The CNN version, however, raises an exception for some QR codes, such as the small white QR code in the attached image. I have the 4 model files in place, so this is not a missing-file error.

cv2.wechat_qrcode_WeChatQRCode("detect.prototxt", "detect.caffemodel", "sr.prototxt", "sr.caffemodel") -> exception
cv2.wechat_qrcode_WeChatQRCode() -> OK, but the returned rectangle is always the whole image

wechat-qr

I just noticed this issue is tagged as incomplete, so I am adding steps to reproduce. @alalek

environment: Ubuntu 18.04, Python 3.6.9, opencv-contrib 4.5.2.52, 4.5.2.54, 4.5.3.56; Ubuntu 20.04, Python 3.8.10, opencv-contrib 4.5.3.56.

  1. pip install opencv-contrib-python
  2. download 4 files from https://github.com/WeChatCV/opencv_3rdparty/tree/wechat_qrcode
  3. download https://user-images.githubusercontent.com/65517557/132307335-3fd58ff8-d6f2-42e8-bce0-2c69f927ca2d.png as img.png
  4. run the following lines:
       import cv2
       detector = cv2.wechat_qrcode_WeChatQRCode("detect.prototxt", "detect.caffemodel", "sr.prototxt", "sr.caffemodel")
       img = cv2.imread("img.png")
       res, points = detector.detectAndDecode(img)

expected behavior: no exception, and points is not [[0.,0.],[1919.,0.],[1919.,1079.],[0.,1079.]]
actual behavior: UnicodeDecodeError

Edit: I managed to compile OpenCV from source, and I can confirm that the C/C++ version doesn't crash. It turns out that the actual encoding of the white QR code is GBK, and the exception was raised when trying to decode it as UTF-8.

miba2020 avatar Sep 07 '21 08:09 miba2020

The encoding of your QR code is also GBK. There is a function called guessEncoding. Can you print the encoding that the C++ code returned? Maybe it is wrong in the C++ code too.

I tried but could not figure out what the call stack is or where to put a breakpoint.

I wish the Python binding just returned raw bytes, i.e. in your case b'\xcf\xc2\xd4\xd8\xb5\xd8\xd6\xb7\xa3\xba'. It is much easier to figure out the actual encoding in Python. @dddzg
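
To illustrate the point about raw bytes, here is a minimal Python sketch using only the byte string quoted above. It assumes nothing about OpenCV: it just shows that these bytes are rejected by the UTF-8 codec at the first byte and decode cleanly as GBK. The variable names are illustrative.

```python
# Raw payload bytes quoted in this thread (no OpenCV involved).
raw = b"\xcf\xc2\xd4\xd8\xb5\xd8\xd6\xb7\xa3\xba"

# Decoding as UTF-8 fails, which is what the Python binding runs into.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as GBK succeeds and yields readable Chinese text.
text = raw.decode("gbk")

print(utf8_error)
print(text)
```

With the bytes in hand, trying a second codec is a one-liner, which is why returning bytes (or a bytes fallback) would make this kind of failure much easier to diagnose from Python.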

miba2020 avatar Sep 13 '21 07:09 miba2020

The root cause is the so-called HANZI mode. This mode is not part of ISO/IEC 18004. Instead, it is defined in the Chinese national standard GB/T 18284-2000. Many open-source libraries, such as zxing and segno, support this mode. Codes encoded in this mode are as compact as those encoded in KANJI mode, so they are not uncommon in China.

I found a fix for this: do not blindly discard zxing::Result::charset_, but convert the text to UTF-8 when necessary.

--- a/contrib/modules/wechat_qrcode/src/decodermgr.cpp
+++ b/contrib/modules/wechat_qrcode/src/decodermgr.cpp
@@ -6,6 +6,8 @@
 // Copyright (C) 2020 THL A29 Limited, a Tencent company. All rights reserved.
 #include "precomp.hpp"
 #include "decodermgr.hpp"
+#include <string>
+#include <iconv.h>


 using zxing::ArrayRef;
@@ -46,6 +48,16 @@ int DecoderMgr::decodeImage(cv::Mat src, bool use_nn_detector, string& result) {
         int ret = TryDecode(source, zx_result);
         if (!ret) {
             result = zx_result->getText()->getText();
+            std::string charset = zx_result->getCharset();
+            if (charset == "GB2312") {
+                iconv_t cd = iconv_open("UTF-8", "GBK");
+                enum { max_gb_char = (int) (10208 / 13) + 1 };
+                char buf[max_gb_char], *out = buf, *in = const_cast<char*>(result.c_str());
+                size_t n_in = result.size(), n_out = max_gb_char;
+                iconv(cd, &in, &n_in, &out, &n_out);
+                *out = 0;
+                result = buf;
+            }
             return ret;
         }
         // try different binarizers

Edit: It seems zxing::qrcode::DecodedBitStreamParser::append could have handled this, but the code branch is disabled by a macro in zxing.hpp.

#ifndef NO_ICONV_INSIDE
#define NO_ICONV_INSIDE
#endif
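
The conversion step in the patch above can be mirrored in Python to check what the iconv call is expected to produce. This is only a sketch under stated assumptions: `to_utf8` is a hypothetical helper, not part of OpenCV or zxing, and it assumes the decoder hands back the raw payload bytes together with a charset name. It also prints the byte lengths before and after conversion, since each two-byte GBK character becomes three bytes in UTF-8, which matters when sizing an output buffer.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Hypothetical helper: re-encode raw as UTF-8 when the charset is GB2312."""
    if charset == "GB2312":
        # Same direction as iconv_open("UTF-8", "GBK") in the patch;
        # GBK is a superset of GB2312, so it covers GB2312 payloads.
        return raw.decode("gbk").encode("utf-8")
    # Any other charset is passed through unchanged in this sketch.
    return raw


raw = b"\xcf\xc2\xd4\xd8\xb5\xd8\xd6\xb7\xa3\xba"  # bytes quoted earlier in the thread
converted = to_utf8(raw, "GB2312")

# Byte length before and after conversion.
print(len(raw), len(converted))
```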

miba2020 avatar Dec 09 '21 07:12 miba2020

Thanks @miba2020. I was inspired by your solution, but I found that the charset is not detected by default. Based on git tag 4.6.0, I added some code to actively detect the charset, and it works for me. The fix is also in a new branch of my fork. Here's the code to fix the error:

diff --git a/modules/wechat_qrcode/src/decodermgr.cpp b/modules/wechat_qrcode/src/decodermgr.cpp
index 06706eed..6214ad29 100644
--- a/modules/wechat_qrcode/src/decodermgr.cpp
+++ b/modules/wechat_qrcode/src/decodermgr.cpp
@@ -6,6 +6,9 @@
 // Copyright (C) 2020 THL A29 Limited, a Tencent company. All rights reserved.
 #include "precomp.hpp"
 #include "decodermgr.hpp"
+#include "zxing/common/stringutils.hpp"
+#include <string>
+#include <iconv.h>


 using zxing::ArrayRef;
@@ -46,6 +49,18 @@ int DecoderMgr::decodeImage(cv::Mat src, bool use_nn_detector, string& result) {
         int ret = TryDecode(source, zx_result);
         if (!ret) {
             result = zx_result->getText()->getText();
+            zxing::common::StringUtils zx_su;
+            std::string charset = zx_su.guessEncoding(const_cast<char*>(result.c_str()),
+                    static_cast<int>(result.length()));
+            if (charset == "GB2312") {
+                iconv_t cd = iconv_open("UTF-8", "GBK");
+                enum { max_gb_char = (int) (10208 / 13) + 1 };
+                char buf[max_gb_char], *out = buf, *in = const_cast<char*>(result.c_str());
+                size_t n_in = result.size(), n_out = max_gb_char;
+                iconv(cd, &in, &n_in, &out, &n_out);
+                *out = 0;
+                result = buf;
+            }
             return ret;
         }
         // try different binarizers
@@ -77,4 +92,4 @@ Ref<Result> DecoderMgr::Decode(Ref<BinaryBitmap> image, DecodeHints hints) {
     return reader_->decode(image, hints);
 }
 }  // namespace wechat_qrcode
-}  // namespace cv
\ No newline at end of file
+}  // namespace cv

zldrobit avatar Oct 28 '22 03:10 zldrobit

Please take a look at this PR: https://github.com/opencv/opencv/pull/24350

dkurt avatar Oct 02 '23 13:10 dkurt