Barcode scanning fails with "Unknown encoding" for ISO-8859-1 encoded data matrix

Open dspoeri opened this issue 4 years ago • 8 comments

The official German medication plan data matrix ("BMP", Bundeseinheitlicher Medikationsplan) expects data to be encoded with ISO-8859-1. If the data contains a German umlaut, Google Vision barcode scanning fails with an "Unknown encoding" error.

Scanning the following data matrix reproduces the bug: barcode

This bug sadly renders Google Vision barcode scanning useless for the mentioned use case.

Two suggested solutions:

  • accept encodings other than ASCII and UTF-8
  • provide access to the raw data through a byte array
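To illustrate why an umlaut matters here, the following is a minimal Java sketch, not ML Kit's actual decoding logic. It shows that the ISO-8859-1 byte for "ü" (0xFC) is not valid UTF-8, so a decoder that only accepts ASCII or UTF-8 has to reject it. The class and method names (`UmlautEncodingDemo`, `strictDecode`) are hypothetical and exist only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of ML Kit or Google Vision.
class UmlautEncodingDemo {

    /**
     * Decodes bytes strictly with the given charset.
     * Returns null instead of substituting replacement characters
     * when the bytes are not valid in that charset.
     */
    static String strictDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "für" encoded as ISO-8859-1: 0x66, 0xFC, 0x72
        byte[] latin1 = "f\u00fcr".getBytes(StandardCharsets.ISO_8859_1);

        // 0xFC can never appear in well-formed UTF-8, so strict decoding fails.
        System.out.println(strictDecode(latin1, StandardCharsets.UTF_8) == null);

        // Every byte value maps to a character in ISO-8859-1, so this succeeds.
        System.out.println(strictDecode(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Plain ASCII content decodes identically under all three charsets, which would explain why only plans containing umlauts or other non-ASCII characters are affected.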

dspoeri avatar Jan 10 '21 18:01 dspoeri

Just saw this, and it is similar to an issue reported last year. I commented on that issue here: https://github.com/googlesamples/mlkit/issues/44#issuecomment-632303060 Unfortunately, it meant the library just didn't work for our use cases. It's a shame, as it's an excellent library otherwise.

I agree with the suggested solutions. It would be wonderful for the library either to support the ISO-8859-1 character set as an option or to provide access to the scanned data as a byte array without going through any character set conversion. Either option would allow reading of all barcodes.

I noticed that a new version, com.google.firebase:firebase-ml-vision-barcode-model:16.1.2, was released later in 2020, but I haven't had time to check whether it provides that access.

GarryKelly avatar Jan 12 '21 15:01 GarryKelly

At least com.google.mlkit:barcode-scanning:16.1.0 contains barcode.rawBytes

Returns raw bytes as it was encoded in the barcode. Returns null if the raw bytes can not be determined.

So I think you can return String(barcode.rawBytes, StandardCharsets.ISO_8859_1)

ivan200 avatar Jan 13 '21 10:01 ivan200

It doesn't help: rawBytes returns an array with 16 bytes representing the string Unknown encoding.
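For reference, the suggested workaround combined with a guard for this placeholder could be sketched in plain Java as below. This is only a sketch under stated assumptions: the helper name `Latin1BarcodeDecoder.decode` is hypothetical, the byte array is assumed to come from `barcode.rawBytes`, and the check for the 16-byte "Unknown encoding" payload is based solely on the behavior reported in this thread, not on documented ML Kit behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; not part of ML Kit.
class Latin1BarcodeDecoder {

    // Placeholder payload that rawBytes was observed to contain in this thread
    // (assumption: it is the ASCII text "Unknown encoding", 16 bytes long).
    private static final byte[] UNKNOWN_ENCODING =
            "Unknown encoding".getBytes(StandardCharsets.US_ASCII);

    /**
     * Interprets raw barcode bytes as ISO-8859-1 text.
     * Returns null if no bytes are available or if the bytes are
     * the "Unknown encoding" placeholder rather than real payload.
     */
    static String decode(byte[] rawBytes) {
        if (rawBytes == null || Arrays.equals(rawBytes, UNKNOWN_ENCODING)) {
            return null;
        }
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }
}
```

A caller would pass `barcode.rawBytes` and treat a null result as "scan could not be decoded". The guard only detects the failure; it cannot recover the original data, and it would also reject a genuine barcode whose content is exactly the text "Unknown encoding".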

dspoeri avatar Jan 13 '21 14:01 dspoeri

Hi, we are working on a fix internally.

cs-googler avatar Apr 05 '21 17:04 cs-googler

So, @cs-googler, how is the internal fix going? How about letting the user specify the encoding via BarcodeScannerOptions?

pke avatar Mar 14 '23 15:03 pke