GDB files containing latin-1 characters cannot be opened

Open mplough-kobold opened this issue 1 year ago • 5 comments

We often receive GDB files from contractors that contain data descriptions such as µT, but the descriptions are encoded using latin1. As a result, the µ character is encoded as 0xB5 rather than 0xC2 0xB5 as it would be in UTF-8.

Since Python strings are Unicode and gxpy loads strings in the default fashion, we end up with errors like this:

'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte

Handling other character encodings (or perhaps only latin1 if that's what GDB uses) would allow us to open these files with gxpy.
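To make the failure concrete, here is a minimal sketch in plain Python that reproduces the error message above and shows one possible fallback. The helper `decode_gdb_text` is hypothetical and is not part of gxpy; where gxpy actually decodes strings is not shown in this issue, so this only illustrates the byte-level behaviour.

```python
def decode_gdb_text(raw: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical helper (not a gxpy API): try UTF-8 first, then a fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this never raises,
        # but it will produce wrong characters if the data is in another encoding.
        return raw.decode(fallback)


utf8_unit = "µT".encode("utf-8")      # b'\xc2\xb5T'
latin1_unit = "µT".encode("latin-1")  # b'\xb5T'

# Decoding the latin-1 bytes as UTF-8 reproduces the reported error.
try:
    latin1_unit.decode("utf-8")
    error_message = ""
except UnicodeDecodeError as err:
    error_message = str(err)

print(error_message)
print(decode_gdb_text(latin1_unit))
print(decode_gdb_text(utf8_unit))
```

Trying UTF-8 first keeps correctly encoded files working unchanged; the fallback only applies when the bytes are not valid UTF-8.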

mplough-kobold, Aug 22 '24 18:08

Yes, I have seen this error recently too.

RichardScottOZ, Nov 09 '24 03:11

@mplough-kobold Can you share the sample gdb with the ISO/IEC 8859-1 encoding?

serban-seeq, Nov 11 '24 19:11

@serban-seeq From SIGÉOM, see https://gq.mines.gouv.qc.ca/documents/EXAMINE/GM67278/. In GM67278_1_CD1.ZIP there exists a file called Deborah_Lake.gdb. The file is too large to upload here but the source is publicly available.

mplough-kobold, Nov 12 '24 19:11

@mplough-kobold Can you say at which step you're seeing this error? I am able to load Deborah_Lake.gdb and print the unit of the channel tauBx10_30, which is µT, using the following commands:

    import geosoft.gxpy.gdb as gxdb

    # Assumes a GX context has already been created.
    with gxdb.Geosoft_gdb.open('Deborah_Lake') as gdb:
        unit = gxdb.Channel(gdb, 'tauBx10_30').unit_of_measure
        print('Unit - {}'.format(unit))

EricRoma, Dec 04 '24 20:12

GXDEV-50

serban-seeq, Jan 13 '25 19:01

We haven't received a sample that reproduces this. Closing for now; it can be reopened in the future.

serban-seeq, Jun 20 '25 14:06