Unicode support
Currently the driver has two issues with Unicode functions:
- They are just front ends to the ANSI functions, which defeats their purpose: the ability to work with symbols outside the ANSI codepage.
- A couple of bugs in the ANSI<->Unicode conversion code assume that SQLWCHAR is wchar_t and that sizeof(wchar_t) is 2. The former breaks the driver with unixODBC and the latter breaks it with iODBC.
I see the following solutions:
- Drop the charset option and fix the connection charset to NONE, performing conversion between the storage charset and ANSI/Unicode on the client side. This will cause problems with server error messages and string literals in queries, effectively limiting them to ASCII.
- Automatically set the connection charset to match the ANSI codepage (current locale) for SQL*Connect functions and fix it to UTF-8 for SQL*ConnectW functions. This solves the problems above but causes a new one for applications that use an ANSI connection while still wanting to work with Unicode data using SQLGetData.
- Set the connection charset to UTF-8 unconditionally and perform ANSI/Unicode conversions on the client side. None of the problems above, but if the database charset is not UTF-8, a server-side conversion is enforced, causing a higher load. Besides, it requires more changes in the codebase.
So, no perfect solution. Or maybe someone sees one?
Don't the xxxW functions expect UTF-16? If so, all the talk about doing things in UTF-8 is a red herring.
What needs to be done, irrespective of the connection character set or (when using NONE) the individual column character sets is convert from that character set to UTF16. Columns that are explicitly NONE are - as always - extremely problematic.
Don't the xxxW functions expect UTF-16?
Yes for Windows and unixODBC. No for iODBC. The specification does not actually specify the Unicode encoding, IIRC.
What needs to be done, irrespective of the connection character set or (when using NONE)
Data is not a problem in this case. Error messages and query text are.
ODBC specifies UCS-2 (a.k.a UTF-16), see https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/unicode?view=sql-server-ver16 ("Currently, the only Unicode encoding that ODBC supports is UCS-2, which uses a 16-bit integer (fixed length) to represent a character.")
Also relevant: https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/unicode-data?view=sql-server-ver16
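To make the fixed-length wording in that quote concrete, here is a minimal sketch of how a single code point maps to 16-bit code units. The function name encodeUtf16 is hypothetical and not part of ODBC or the driver. Inside the Basic Multilingual Plane, UCS-2 and UTF-16 produce the same single unit; above U+FFFF, UTF-16 needs a surrogate pair, which a strictly fixed-length reading cannot represent.

```cpp
#include <string>

// Hypothetical helper for illustration: encode one Unicode code point
// as UTF-16 code units.
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        // Not a valid scalar value: emit U+FFFD REPLACEMENT CHARACTER.
        out.push_back(static_cast<char16_t>(0xFFFD));
    } else if (cp < 0x10000) {
        // BMP: one code unit, identical in UCS-2 and UTF-16.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, U+0041 yields one unit, while U+1F600 yields the two units 0xD83D and 0xDE00, so buffer lengths counted in characters and in code units can differ.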
Right
That makes the Unicode part of the driver completely broken on Linux where wchar_t is UCS4 aka UTF-32.
OK, if nobody has a better idea, I'll try to fix the driver for now by using a hardcoded UTF-8 charset for SQL*ConnectW functions and performing conversions accordingly.
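A rough sketch of the kind of conversion this implies, assuming the connection delivers UTF-8 and the application expects 16-bit code units. The name utf8ToUtf16 is hypothetical and this is not the driver's actual code; replacing malformed input with U+FFFD is a choice made for the sketch, and a real implementation might prefer to return an error.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Malformed sequences become U+FFFD and decoding resumes at the next byte.
std::u16string utf8ToUtf16(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        bool ok = true;

        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else                          { ok = false; } // stray or invalid lead byte

        if (ok && i + len > in.size()) ok = false;    // truncated sequence

        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;       // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (ok && len > 1 &&
            (cp < minForLen[len] || cp > 0x10FFFF ||
             (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }

        if (!ok) { cp = 0xFFFD; len = 1; }
        i += len;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

The output uses char16_t as a stand-in for a 16-bit SQLWCHAR. With iODBC, where the thread says the W functions do not use UTF-16, a different target width would be needed.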
Hi @aafemt
You're welcome to propose a PR to fix this issue. I would appreciate it. If you face any difficulties, I'll be glad to help. I was ready to dig into it myself, but if you want to contribute, I'll be happy to give way))
Regards
What needs to be done, irrespective of the connection character set or (when using NONE) the individual column character sets is convert from that character set to UTF16. Columns that are explicitly NONE are - as always - extremely problematic.
@mrotteveel, would you mind pointing me to some test cases that demonstrate how to handle these scenarios correctly? It could be another repository, I just want to see the expected / edge cases.
There is FBEncodingsTest and its subclass FBLongVarCharEncodingsTest in Jaybird, but to be honest, those are tests with 23 years of history, and they are not really good tests IMHO, but I've never gotten around to rewriting or replacing them.
2. A couple of bugs in ANSI<->Unicode conversion code assumes that SQLWCHAR is wchar_t and sizeof(wchar_t) is 2. The former makes the driver non-working with unixODBC and the latter - with iODBC.
@aafemt I'm currently gathering a few edge cases for a future test suite. Would you be able to share some specific examples of this issue you're encountering?
@fdcastel Well... The most obvious and simple example: iusql simply cannot connect to Firebird ODBC sources and reports an empty error. None of the Unicode functions work, because they use mbstowcs() and assume that sizeof(wchar_t) is two.
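To illustrate the size assumption: mbstowcs() writes wchar_t elements, which are 32 bits wide on Linux, so its output cannot simply be copied into a buffer of 16-bit units. A minimal sketch of a conversion that works for either width follows. The name wideToUtf16 is hypothetical, and char16_t stands in for a 16-bit SQLWCHAR, which is an assumption about the target driver manager.

```cpp
#include <string>

// Hypothetical helper: convert a wide string to 16-bit code units without
// assuming anything about sizeof(wchar_t).
std::u16string wideToUtf16(const std::wstring& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        const char32_t cp = static_cast<char32_t>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000) {
            // A 16-bit wchar_t already holds UTF-16 code units, and BMP
            // values from a 32-bit wchar_t fit in one unit. Lone surrogates
            // pass through unchanged in this sketch.
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFF) {
            // 32-bit wchar_t holding a supplementary code point:
            // split it into a surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        } else {
            // Out of Unicode range: emit U+FFFD.
            out.push_back(static_cast<char16_t>(0xFFFD));
        }
    }
    return out;
}
```

On a platform with a 32-bit wchar_t, a string containing U+1F600 grows from one element to two code units, which is why the length arguments of the W functions also need converting, not just the buffers.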