Error decode UTF-8 character 'â'
I have a problem when I try to use pyzbar to decode a QR image: the result I get doesn't match the data I encoded with qrcode. This is my code:
from qreader import QReader
from PIL import Image
import qrcode

image_path = "my_image.png"
data = 'â'
print(f'data = {data}')
img = qrcode.make(data)
img.save(image_path)
img = cv2.imread(image_path)
result = qreader.detect_and_decode(image=img)
print(f"result = {result[0]}")
Hi, how are you initializing qreader here? qreader.detect_and_decode(image=img)
I have run your piece of code, just instantiating it as qreader = QReader(), and it gives me the correct result.
data = â
result = â
Stepping through with the debugger, I found that the intermediate utf-8 decoding done by pyzbar yields incorrect characters ('ﾃ｢'). However, when you instantiate QReader with its default reencode_to value, it fixes this automatically.
I think that it should only fail to decode that character if you initialize it as QReader(reencode_to='utf-8') or QReader(reencode_to=None).
If that's not the case, could you give me more information to try to replicate the error?
- Are you running the latest version?
- Which OS are you running?
Hi, @Eric-Canas I am using
- OS Ubuntu 22.04.1 LTS.
- qreader 3.11
- python 3.10.12
This is my result :
I have been trying to replicate the error in Windows, Amazon Linux and Ubuntu 22.04, and I have not been able to reproduce it :(
The error should be replicable by running:
>>> 'ﾃ｢'.encode('shift-jis').decode('utf-8')
'â'
Does this code also break for you?
(Amazon Linux 2023)
(Ubuntu)
My best guess is that it is related to the regional configuration of the OS, but I cannot confirm that, as I have not been able to replicate the error :(
The problem is related to how Python encodes and decodes plain strings with special characters, as this is the line that is giving you the warning:
'ﾃ｢'.encode('shift-jis').decode('utf-8')
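To make that round trip concrete, here is a minimal sketch in plain Python (no qreader or pyzbar involved). It assumes the garbled text consists of half-width katakana, written as escapes so the example does not depend on how this page renders them.

```python
# UTF-8 bytes of the original character 'â' (U+00E2).
original = "\u00e2"
raw = original.encode("utf-8")
print(raw)  # the two raw bytes stored in the QR code

# Misreading those bytes as Shift-JIS yields two half-width katakana characters.
garbled = raw.decode("shift-jis")

# Re-encoding with the same (wrong) codec recovers the bytes,
# and decoding them as UTF-8 restores the original text.
recovered = garbled.encode("shift-jis").decode("utf-8")
assert recovered == original
```

The recovery only works when the charset used for re-encoding is the same one that produced the garbled text, which is why the right value can differ between systems.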
I have tried my code in Google Colab and it gives the same result as on my computer.
I also checked the pyzbar result in my program (b'\x8e\xa3'), and it is different from your result (b'\xc3\xa2'):
Hi!
Sorry for the inconvenience, I oversimplified the error. I have been researching it thanks to your Google Colab, and I found that the problem was that Windows and Linux do not use the same decoding. So, while the default "utf-8" pyzbar decoding was 'ﾃ｢' on Windows, it was '璽' on Linux.
I experimented extensively with shift-jis versus other encodings, and "Big5" is the one that gave me correct decoding results for all characters on Linux systems, as shift-jis did on Windows systems (it gives the same decoding as shift-jis in all cases where shift-jis works, and correct results in the cases where shift-jis fails on Linux).
I have uploaded an update that selects one or the other encoding as the default, depending on your OS ("Big5" fails on a lot of characters on Windows :( ). I have tested it on your Google Colab, and it produces the expected results now.
You can upgrade with pip install --upgrade qreader. The previous version should still work if you instantiate QReader as QReader(reencode_to="big5").
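As a rough illustration of the OS-dependent default described above, here is a minimal sketch. The helper name default_reencode_to is hypothetical and this is not the library's actual code; it only shows the idea of choosing "shift-jis" on Windows and "big5" elsewhere.

```python
import platform


def default_reencode_to(system=None):
    """Hypothetical helper: pick a re-encoding charset by operating system."""
    # platform.system() returns names such as 'Windows', 'Linux' or 'Darwin'.
    system = platform.system() if system is None else system
    # Assumption: every non-Windows system is treated like Linux here.
    return "shift-jis" if system == "Windows" else "big5"


print(default_reencode_to())
```

On an older version, the returned value could be passed explicitly, for example QReader(reencode_to=default_reencode_to()).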
Thanks a lot for your warning!
Hi @Eric-Canas, I have checked your solution and it gave correct decoding results on my computer. Thanks for your support.
Hi @Eric-Canas ,
I have checked QReader(reencode_to="big5") with the character 'â' and it gave the correct result. But when I tested a larger dataset with QReader(reencode_to="big5"), I got many similar errors. Here are my code and data:
import json
from qreader import QReader
from PIL import Image
import qrcode
import cv2

image_path = "my_image.png"
qreader = QReader(model_size='n', reencode_to='big5')
json_file = open('uit_member.json', 'r')
data = json.load(json_file)
j = 0
len_ = 0

for i in data:
    len_ += 1
    name = i["full_name"]
    img = qrcode.make(name)
    img.save(image_path)
    img = cv2.imread(image_path)
    result = qreader.detect_and_decode(image=img)
    if name != result[0]:
        j += 1
        print(f"{j*100/len_}% data {name} result = {result[0]} ")
Hi!
Thanks for your test data. I'm still testing; it seems that some entries are quite difficult to decode. For the moment, I can tell you that most of your errors should disappear this way:
QReader(reencode_to=('big5', 'shift-jis', 'latin1'))
But not all of them.
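One way to picture what a tuple of encodings means is a fallback chain: try each charset in order and keep the first one whose re-encoded bytes are valid UTF-8. The sketch below is an assumption about that idea, written in plain Python with a hypothetical helper name (reencode_with_fallback); it is not taken from the qreader source.

```python
def reencode_with_fallback(text, encodings=("big5", "shift-jis", "latin1")):
    """Hypothetical helper: undo garbled text by trying several charsets in order."""
    for encoding in encodings:
        try:
            # Recover the raw bytes with the candidate charset, then read them as UTF-8.
            return text.encode(encoding).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # This charset does not fit; try the next one.
    return text  # Nothing worked: return the input unchanged.


# '\uff83\uff62' are the half-width katakana that UTF-8 'â' becomes under Shift-JIS.
fixed = reencode_with_fallback("\uff83\uff62", encodings=("ascii", "shift-jis"))
assert fixed == "\u00e2"
```

A chain like this returns the first result that decodes cleanly, which is not necessarily the correct one, so the order of the encodings matters.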
To easily replicate the error, there should be a way to decode
b'L\xef\xbe\x83\xef\xbd\xaa Anh S\xef\xbe\x86\xef\xbd\xa1n'
as
Lê Anh Sơn
But I can't find any charset that works. Those are the raw bytes pyzbar detects from the QR generated by qrcode for this entry, and I can't find any single or double re-encoding that decodes them correctly.
Sorry, I'll update you if I find an alternative.
Hi, I have the same issue. The phrase in my QR is: Vĩnh Phong, Vĩnh Bảo, Hải Phòng. When using the library I get: V藺nh Phong, V藺nh B廕υ, H廕ξ Ph簷ng
Hi, has anyone solved this problem or found an approach to handle this case? Thank you!
Hello, I have the same issue. When I scan the QR code on my ID card, the correct text should be: HUỲNH HIẾU THUẬN, but the text I received was: Hu廙軟h Hi廕簑 Thu廕要. I have read your documentation and edited the reencode_to parameter, but it doesn't seem to work for me. My language is Vietnamese, and this is my code:
def read_qr_code_2(image_path):
    # Create a QReader instance
    qreader = QReader(model_size='s', min_confidence=0.5, reencode_to='utf-8')
    # Get the image that contains the QR code
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    # Use the detect_and_decode function to get the decoded QR data
    decoded_text = qreader.detect_and_decode(image=image)
    print(decoded_text)