
Error decode UTF-8 character 'â'

Open tamtr1997 opened this issue 2 years ago • 15 comments

I have a problem when I use pyzbar to decode a QR image: the result I get doesn't match the data I originally encoded with qrcode. This is my code:

from qreader import QReader
from PIL import Image
import qrcode

image_path = "my_image.png"
data = 'â'
print(f'data = {data}')
img = qrcode.make(data)

img.save(image_path)
img = cv2.imread(image_path)
result = qreader.detect_and_decode(image=img)
print(f"result = {result[0]}")

tamtr1997 avatar Oct 11 '23 09:10 tamtr1997

Hi, how are you initializing qreader here? qreader.detect_and_decode(image=img)

I ran your code, just instantiating the reader as qreader = QReader(), and it gives me the correct result.

data = â
result = â

I have been exploring with the debugger and found that, as an intermediate step, pyzbar decodes an incorrect character ('テ「') with utf-8:

image

However, when you instantiate QReader with its default reencode_to value, it fixes this automatically:

image image

I think that it should only fail to decode that character if you initialize it as QReader(reencode_to='utf-8') or QReader(reencode_to=None).

If that's not the case, could you give me more information to try to replicate the error?

  • Are you running the latest version?
  • Which OS are you running?

Eric-Canas avatar Oct 11 '23 17:10 Eric-Canas

Hi, @Eric-Canas I am using

  • OS Ubuntu 22.04.1 LTS.
  • qreader 3.11
  • python 3.10.12

This is my result : image

tamtr1997 avatar Oct 12 '23 00:10 tamtr1997

I have been trying to replicate the error on Windows, Amazon Linux and Ubuntu 22.04, and I have not been able to reproduce it :(

The error should be replicable by running:

>>> 'テ「'.encode('shift-jis').decode('utf-8')
'â'

Does this code also break for you?

(Amazon Linux 2023) image

(Ubuntu) image

My best guess is that it is related to the regional configuration of the OS, but I cannot confirm that, as I have not been able to replicate the error :(

The problem is related to how Python encodes and decodes plain strings with special characters, since this is the line that is giving you the warning:

'テ「'.encode('shift-jis').decode('utf-8')

image

Eric-Canas avatar Oct 12 '23 09:10 Eric-Canas

I tried my code in Google Colab and got the same result as on my computer.

image

I also checked the raw pyzbar result: my program got b'\x8e\xa3', which differs from your result (b'\xc3\xa2'):

image

tamtr1997 avatar Oct 13 '23 02:10 tamtr1997

Hi!

Sorry for the inconvenience, I oversimplified the error. I have been researching it thanks to your Google Colab, and I found that the problem was that Windows and Linux do not use the same decoding. So, while the default "utf-8" pyzbar decoding was 'テ「' on Windows, it was '璽' on Linux.

I ran extensive experiments with shift-jis versus other encodings, and "Big5" is the one that gave me correct decoding results for all characters on Linux systems, as shift-jis did on Windows systems. (It gives the same decoding as shift-jis in all cases where shift-jis works, and correct results in the cases where shift-jis fails on Linux.)

I have uploaded an update that selects one encoding or the other as the default, depending on your OS ("Big5" fails on a lot of characters on Windows :( ). I have tested it on your Google Colab, and it now produces the expected results.

You can upgrade with pip install --upgrade qreader. The previous version should still work if you instantiate QReader as QReader(reencode_to="big5").
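
For anyone pinned to an older version, the OS-dependent choice described above could be approximated by hand. This is only a sketch under the assumption that "shift-jis on Windows, big5 elsewhere" captures the rule; `default_reencoding` is a hypothetical helper, and the library's real selection logic may differ.

```python
import platform


def default_reencoding() -> str:
    """Pick a re-encoding charset by OS (hypothetical helper).

    Assumption taken from this thread: shift-jis behaves well on Windows,
    big5 behaves well on Linux.
    """
    return "shift-jis" if platform.system() == "Windows" else "big5"


print(default_reencoding())
# Intended use (requires qreader to be installed):
#   qreader = QReader(reencode_to=default_reencoding())
```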

Thanks a lot for your warning!

Eric-Canas avatar Oct 13 '23 09:10 Eric-Canas

Hi @Eric-Canas, I have checked your solution and it gave correct decoding results on my computer. Thanks for your support.

tamtr1997 avatar Oct 16 '23 01:10 tamtr1997

Hi @Eric-Canas ,

I have checked QReader(reencode_to="big5") with the character 'â' and it gave the correct result. But when I checked a large data set with QReader(reencode_to="big5"), I got many errors of the same kind. Here are my code and data:

import json

from qreader import QReader
from PIL import Image
import qrcode
import cv2

image_path = "my_image.png"

qreader = QReader(model_size='n', reencode_to='big5')
json_file = open('uit_member.json', 'r')
data = json.load(json_file)
j = 0
len_ = 0

for i in data:
    len_ += 1
    name = i["full_name"]
    img = qrcode.make(name)
    img.save(image_path)
    img = cv2.imread(image_path)
    result = qreader.detect_and_decode(image=img)
    if name != result[0]:
        j += 1
        print(f"{j*100/len_}% data {name} result = {result[0]} ")

tamtr1997 avatar Oct 17 '23 09:10 tamtr1997

Hi!

Thanks for your test data. I'm still testing; it seems that some entries are quite difficult to decode. For the moment, I can tell you that most of your errors should disappear this way:

QReader(reencode_to=('big5', 'shift-jis', 'latin1'))

But not all of them.

To easily replicate the error, there should be a way to decode b'L\xef\xbe\x83\xef\xbd\xaa Anh S\xef\xbe\x86\xef\xbd\xa1n' as Lê Anh Sơn

But I can't find any charset that works. Those are the raw bytes pyzbar gets from the QR generated by qrcode for this entry, and I can't find any single or double encoding step that decodes them correctly.

Sorry, I'll update you if I find an alternative.

Eric-Canas avatar Oct 18 '23 08:10 Eric-Canas

Hi, I have the same issue. The phrase in my QR is: Vĩnh Phong, Vĩnh Bảo, Hải Phòng. When using the library I get: V藺nh Phong, V藺nh B廕υ, H廕ξ Ph簷ng

tranvannhat avatar Oct 24 '23 08:10 tranvannhat

Hi, has anyone solved this problem or found an approach to handle this case? Thank you!

congdaoduy298 avatar Jan 30 '24 02:01 congdaoduy298

Hello, I have the same issue. When I scan the QR code on my ID card, the correct text should be: HUỲNH HIẾU THUẬN, but the text I received was: Hu廙軟h Hi廕簑 Thu廕要. I have read your documentation and edited the reencode_to parameter, but it doesn't seem to work for me. My language is Vietnamese, and this is my code:

import cv2
from qreader import QReader

def read_qr_code_2(image_path):
    # Create a QReader instance
    qreader = QReader(model_size = 's', min_confidence = 0.5, reencode_to = 'utf-8')

    # Get the image that contains the QR code
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)

    # Use the detect_and_decode function to get the decoded QR data
    decoded_text = qreader.detect_and_decode(image=image)
    print(decoded_text)

quyet12308 avatar Mar 08 '24 08:03 quyet12308