Changes in encoding
Hello,
Is there a way to automatically decode data using cp1252? I am connecting to both Oracle and Access, and both are sending cp1252-encoded strings. For now I am using the Convert() function in Oracle. Access, on the other hand, was working fine before I switched to the new, refactored ODBC.jl. I know I can use StringEncodings.jl to decode the strings, but I was wondering if there is an option to tell the API which encoding to expect from the server.
Any help would be appreciated,
Best regards
Sorry for the slow response here; yeah, we've had a number of issues over the years w/ encodings, mostly bugs, so the current 1.0 version relies heavily on all strings being UTF-8. I'd have to dig in to see if there's a way we could label strings coming out as a certain encoding. Part of the issue is that String in Julia expects its bytes to be UTF-8, though it allows invalid UTF-8 and you can convert afterwards. Could you explain a little more what you're doing? Like, what does your query look like? I'll have to dig into where we do the string processing and see if there's a way to "hook" into that to bypass converting.
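To make the String point concrete, here is a minimal Base-only sketch (no ODBC.jl involved). The byte values are made up for illustration, and the Latin-1 shortcut at the end is an assumption that only holds when the data contains no bytes in the 0x80-0x9F range, where cp1252 and Latin-1 differ.

```julia
# Bytes for "café" as a cp1252 server would send them:
# 'é' is the single byte 0xE9 in cp1252.
raw = UInt8[0x63, 0x61, 0x66, 0xE9]

# String() wraps the bytes as-is; it neither validates nor converts them.
# copy() is used because String(::Vector{UInt8}) takes ownership of its argument.
s = String(copy(raw))

# The result is a String, but not valid UTF-8: 0xE9 looks like the start of
# a multi-byte sequence with nothing following it.
println(isvalid(s))

# Assumption: no bytes in 0x80-0x9F. Outside that range each cp1252 byte
# equals its Unicode code point, so mapping every byte to a Char is enough.
fixed = join(Char(b) for b in raw)

println(fixed)
println(isvalid(fixed))
```

For data that can contain characters such as the euro sign (0x80 in cp1252), the shortcut gives wrong results and a real cp1252 decoder is needed.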
Thanks for the reply!
The queries I am using are as simple as they can be, everything is returned as expected except for strings. I have used pyodbc since then to retrieve data from the databases. It looks like the magic is happening here: pyodbc src
That's the best insight I can give right now as I am not too familiar with ODBC specifications or Julia's string handling mechanisms.
I am still looking for a way to get rid of python dependencies in my julia scripts.
Hello again,
I think there are several places in the code where the conversion can go wrong if the source encoding is not UTF-8 but close enough to pass for it (cp1252 shares the ASCII range, for example).
The API uses transcode() for column names (the str function), and transcode() only converts between Unicode encodings (UTF-8, UTF-16, UTF-32). The dbInterface assumes the input is encoded in UTF-8 or is a general Julia String (the jlcast function). Finally, the cwstring function also relies on transcode() for its inputs.
From what I gather, the Python pyodbc package mentioned above uses the iconv library to deal with encoding specifics.
There might be a way to hook into the StringEncodings.jl package, which already provides iconv bindings. This would allow local encodings to be handled properly before the strings are delivered to Julia as UTF-8.
I guess there is a cost here to do the transcoding part but it might be an acceptable trade-off for those who want to play with strings and databases.
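As a rough sketch of what such a hook could do with the raw bytes, assuming StringEncodings.jl is installed: decode_cp1252 below is a hypothetical helper name, not part of ODBC.jl, and the sample bytes are made up. How the raw bytes would be obtained from the driver is left out, since that depends on ODBC.jl internals.

```julia
using StringEncodings  # third-party package providing iconv bindings

# Hypothetical helper (not an ODBC.jl function): turn raw cp1252 bytes
# into a regular UTF-8 Julia String.
decode_cp1252(raw::Vector{UInt8}) = decode(raw, "CP1252")

# "café €" as cp1252 bytes: 0xE9 is 'é', 0x80 is the euro sign.
raw = UInt8[0x63, 0x61, 0x66, 0xE9, 0x20, 0x80]

s = decode_cp1252(raw)
println(s)
println(isvalid(s))
```

The same call could also be applied after the fact to string columns that came back undecoded, at the cost of one extra pass over each value.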