firebird-odbc-driver

Unicode support

Open aafemt opened this issue 8 months ago • 12 comments

Currently the driver has two issues with Unicode functions:

  1. They are just front ends to the ANSI functions, which defeats their purpose: the ability to work with characters outside the ANSI code page.
  2. A couple of bugs in the ANSI<->Unicode conversion code assume that SQLWCHAR is wchar_t and that sizeof(wchar_t) is 2. The former breaks the driver under unixODBC and the latter under iODBC.
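To make point 2 concrete: the conversion has to produce 16-bit code units regardless of `sizeof(wchar_t)`, so it cannot simply go through `mbstowcs()`. Below is a minimal sketch, not code from the driver, assuming UTF-8 source text and a 2-byte SQLWCHAR holding UTF-16 (as under Windows and unixODBC). The `utf16_unit` typedef and `utf8_to_utf16()` are hypothetical names; a real driver would use `SQLWCHAR` from its ODBC headers, and iODBC (where SQLWCHAR is wchar_t) would need a UTF-32 variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit code unit standing in for a 2-byte SQLWCHAR,
   deliberately independent of sizeof(wchar_t). */
typedef uint16_t utf16_unit;

/* Decode NUL-terminated UTF-8 into UTF-16 code units.
   Returns the number of units written (excluding the terminating 0),
   or -1 on malformed input or insufficient space. */
long utf8_to_utf16(const char *in, utf16_unit *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;
    while (*s) {
        uint32_t cp;
        int extra;
        if (s[0] < 0x80)                 { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0)  { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0)  { cp = s[0] & 0x0F; extra = 2; }
        else if ((s[0] & 0xF8) == 0xF0)  { cp = s[0] & 0x07; extra = 3; }
        else return -1;                          /* invalid lead byte */
        for (int i = 1; i <= extra; i++) {
            if ((s[i] & 0xC0) != 0x80) return -1; /* truncated or bad continuation */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        /* Reject overlong forms, surrogate code points and values above U+10FFFF. */
        static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        s += extra + 1;
        if (cp < 0x10000) {
            if (n + 1 >= cap) return -1;         /* keep room for the terminator */
            out[n++] = (utf16_unit)cp;
        } else {
            if (n + 2 >= cap) return -1;
            cp -= 0x10000;                       /* split into a surrogate pair */
            out[n++] = (utf16_unit)(0xD800 | (cp >> 10));
            out[n++] = (utf16_unit)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= cap) return -1;
    out[n] = 0;
    return (long)n;
}
```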

I see the following solutions:

  1. Drop the charset option and fix the connection charset to NONE, performing conversion between the storage charset and ANSI/Unicode on the client side. This will cause problems with server error messages and string literals in queries, effectively limiting them to ASCII.
  2. Automatically set the connection charset to match the ANSI code page (current locale) for the SQL*Connect functions and fix it to UTF-8 for the SQL*ConnectW functions. This solves the problems above but creates a new one for applications that use an ANSI connection while still wanting to work with Unicode data through SQLGetData.
  3. Set the connection charset to UTF-8 unconditionally and perform ANSI/Unicode conversions on the client side. None of the problems above, but if the database charset is not UTF-8, a server-side conversion is forced, causing a higher load. Besides, it requires more changes in the codebase.
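Option 2 boils down to choosing the connection charset from the entry point. The sketch below is a hypothetical illustration: `entry_kind`, `choose_connection_charset()` and the small mapping table are assumptions, the left-hand codeset strings are in the form glibc's `nl_langinfo(CODESET)` reports, the right-hand names are Firebird character sets, and falling back to the DSN-configured charset when no mapping is known is a design choice, not current driver behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: which family of entry points the connection came through. */
typedef enum { ENTRY_ANSI, ENTRY_UNICODE } entry_kind;

/* ASCII case-insensitive string equality (portable stand-in for strcasecmp). */
static int ieq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    return *a == *b;
}

/* Pick the Firebird connection charset:
   - SQL*ConnectW: always UTF8, SQLWCHAR conversion then happens client-side;
   - SQL*Connect:  follow the locale code set when a mapping is known,
                   otherwise keep the charset configured for the DSN. */
const char *choose_connection_charset(entry_kind kind, const char *locale_codeset,
                                      const char *configured)
{
    /* Illustrative, deliberately incomplete mapping table. */
    static const struct { const char *codeset, *firebird; } map[] = {
        {"UTF-8", "UTF8"},     {"ISO-8859-1", "ISO8859_1"},
        {"CP1251", "WIN1251"}, {"CP1252", "WIN1252"},
        {"KOI8-R", "KOI8R"},
    };
    assert(configured != NULL);
    if (kind == ENTRY_UNICODE)
        return "UTF8";
    if (locale_codeset != NULL)
        for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
            if (ieq(locale_codeset, map[i].codeset))
                return map[i].firebird;
    return configured;
}
```

On POSIX systems the ANSI path could obtain `locale_codeset` from `nl_langinfo(CODESET)` in `<langinfo.h>`; the sketch takes it as a parameter so it stays testable.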

So there is no perfect solution. Or maybe someone sees one?

aafemt, Apr 07 '25 15:04

Don't the xxxW functions expect UTF-16? If so, all this talk about UTF-8 is a red herring.

What needs to be done, irrespective of the connection character set or (when using NONE) the individual column character sets, is to convert from that character set to UTF-16. Columns that are explicitly NONE are, as always, extremely problematic.

mrotteveel, Apr 07 '25 17:04

Don't the xxxW functions expect UTF-16?

Yes for Windows and unixODBC, no for iODBC. IIRC, the specification does not actually specify the Unicode encoding.

What needs to be done, irrespective of the connection character set or (when using NONE)

Data is not a problem in this case. Error messages and query text are.

aafemt, Apr 07 '25 19:04

ODBC specifies UCS-2 (in practice treated as UTF-16); see https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/unicode?view=sql-server-ver16 ("Currently, the only Unicode encoding that ODBC supports is UCS-2, which uses a 16-bit integer (fixed length) to represent a character.")
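Even if the specification says fixed-length UCS-2, a buffer that actually carries UTF-16 can contain surrogate pairs, so a length counted in SQLWCHAR units is not always a character count. A minimal sketch, assuming UTF-16 input; `utf16_code_points()` is a hypothetical helper, not an ODBC API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count code points in n UTF-16 code units. A valid high/low surrogate pair
   counts as one code point; an unpaired surrogate counts as one on its own. */
size_t utf16_code_points(const uint16_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++, count++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;   /* skip the low surrogate of the pair */
    }
    return count;
}
```

For "A" followed by U+1F600 this returns 2 code points for 3 code units, which is exactly the case a pure UCS-2 assumption gets wrong.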

mrotteveel, Apr 08 '25 05:04

Also relevant: https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/unicode-data?view=sql-server-ver16

mrotteveel, Apr 08 '25 05:04

Right

irodushka, Apr 08 '25 05:04

That makes the Unicode part of the driver completely broken on Linux, where wchar_t is UCS-4 (a.k.a. UTF-32).
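A sketch of the conversion this implies when wchar_t is 32 bits wide: every code point above U+FFFF must be split into a surrogate pair before it goes into a 16-bit SQLWCHAR buffer. It assumes wchar_t holds UTF-32 whenever `WCHAR_MAX > 0xFFFF` (true for glibc on Linux) and already holds UTF-16 otherwise (Windows); `wcs_to_utf16()` is a hypothetical name, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Convert a NUL-terminated wchar_t string to 16-bit UTF-16 code units.
   Returns the number of units written (excluding the terminating 0),
   or -1 on an invalid code point or insufficient space. */
long wcs_to_utf16(const wchar_t *in, uint16_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (; *in; in++) {
        uint32_t cp = (uint32_t)*in;
#if WCHAR_MAX > 0xFFFF
        /* 32-bit wchar_t: assumed to hold UTF-32 code points. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
        if (cp >= 0x10000) {
            if (n + 2 >= cap) return -1;         /* pair + terminator */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            continue;
        }
#endif
        /* BMP code point, or a unit that is already UTF-16 (16-bit wchar_t). */
        if (n + 1 >= cap) return -1;
        out[n++] = (uint16_t)cp;
    }
    if (n >= cap) return -1;
    out[n] = 0;
    return (long)n;
}
```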

aafemt, Apr 08 '25 07:04

OK, if nobody has a better idea, I'll try to fix the driver for the time being by hardcoding the UTF-8 charset for the SQL*ConnectW functions and performing conversions accordingly.
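With UTF-8 hardcoded as the connection charset for SQL*ConnectW, text arriving through the W functions (for example the statement text passed to `SQLExecDirectW`) would have to be re-encoded from UTF-16 to UTF-8 before it is sent to the server. A minimal sketch of that step, assuming 16-bit UTF-16 input as under Windows and unixODBC; `utf16_to_utf8()` is a hypothetical helper, not the driver's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n UTF-16 code units as NUL-terminated UTF-8.
   Returns bytes written (excluding the NUL), or -1 on an unpaired
   surrogate or insufficient space. */
long utf16_to_utf8(const uint16_t *in, size_t n, char *out, size_t cap)
{
    assert(out != NULL && (in != NULL || n == 0));
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {              /* high surrogate */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                                   /* lone low surrogate */
        }
        unsigned char b[4];
        size_t k;
        if (cp < 0x80) {
            b[0] = (unsigned char)cp; k = 1;
        } else if (cp < 0x800) {
            b[0] = (unsigned char)(0xC0 | (cp >> 6));
            b[1] = (unsigned char)(0x80 | (cp & 0x3F)); k = 2;
        } else if (cp < 0x10000) {
            b[0] = (unsigned char)(0xE0 | (cp >> 12));
            b[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            b[2] = (unsigned char)(0x80 | (cp & 0x3F)); k = 3;
        } else {
            b[0] = (unsigned char)(0xF0 | (cp >> 18));
            b[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            b[3] = (unsigned char)(0x80 | (cp & 0x3F)); k = 4;
        }
        if (len + k >= cap) return -1;                   /* keep room for NUL */
        for (size_t j = 0; j < k; j++) out[len++] = (char)b[j];
    }
    if (len >= cap) return -1;
    out[len] = '\0';
    return (long)len;
}
```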

aafemt, Apr 09 '25 07:04

Hi @aafemt

You're welcome to propose a PR to fix this issue; I would appreciate it. If you face any difficulties, I'll be glad to help. I was ready to dig into it myself, but if you want to contribute, I'll happily give way :)

Regards

irodushka, Apr 10 '25 14:04

What needs to be done, irrespective of the connection character set or (when using NONE) the individual column character sets, is to convert from that character set to UTF-16. Columns that are explicitly NONE are, as always, extremely problematic.

@mrotteveel, would you mind pointing me to some test cases that demonstrate how to handle these scenarios correctly? They could be in another repository; I just want to see the expected behavior and the edge cases.

fdcastel, Jun 01 '25 02:06

There are FBEncodingsTest and its subclass FBLongVarCharEncodingsTest in Jaybird. To be honest, those tests have 23 years of history and are not really good tests IMHO, but I've never gotten around to rewriting or replacing them.

mrotteveel, Jun 01 '25 10:06

2. A couple of bugs in the ANSI<->Unicode conversion code assume that SQLWCHAR is wchar_t and that sizeof(wchar_t) is 2. The former breaks the driver under unixODBC and the latter under iODBC.

@aafemt I'm currently gathering a few edge cases for a future test suite. Could you share some specific examples of the issue you're encountering?

fdcastel, Jun 02 '25 20:06

@fdcastel Well... the most obvious and simplest example: iusql simply cannot connect to Firebird ODBC data sources and reports an empty error. None of the Unicode functions work, because the code uses mbstowcs() and assumes that sizeof(wchar_t) is two.
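A small simulation of how the "SQLWCHAR is wchar_t" assumption can mangle strings: the driver copies a wchar_t string into a buffer that the driver manager then reads as 16-bit units. This is an illustration under stated assumptions (a unixODBC-style 2-byte SQLWCHAR, a hypothetical function name), not a trace of the actual iusql failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical simulation: copy a wchar_t string byte-for-byte into a buffer
   that is then read as 16-bit SQLWCHAR units, and report how many units the
   reader sees before the first 0 unit (its idea of the string length). */
size_t visible_units_after_wchar_copy(const wchar_t *msg)
{
    uint16_t units[64];                          /* what the driver manager reads */
    size_t bytes = (wcslen(msg) + 1) * sizeof(wchar_t);
    assert(bytes <= sizeof units);
    memset(units, 0, sizeof units);
    memcpy(units, msg, bytes);                   /* the buggy wchar_t == SQLWCHAR copy */
    size_t n = 0;
    while (n < sizeof units / sizeof units[0] && units[n] != 0)
        n++;
    return n;
}
```

With a 4-byte wchar_t, `L"Error"` shows up as just "E" on a little-endian machine and as an empty string on a big-endian one, because the high half of each 32-bit character reads as a terminating 0; with a 2-byte wchar_t (Windows) all five units survive.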

aafemt, Jun 02 '25 21:06