node-odbc icon indicating copy to clipboard operation
node-odbc copied to clipboard

Add feature to bind as SQLWCHAR

Open kadler opened this issue 2 years ago • 20 comments

Windows is very UTF-16 centric while Node.js is very UTF-8 centric. While it can also handle UTF-16, it does not deal with other encodings well at all (at least in the NAPI addon). As a result, the package has been built as UNICODE since July 2020. This has fixed many encoding issues, since various strings are now passed back and forth using UTF-16 without issue. However, when binding, SQL_CHAR, SQL_VARCHAR, and SQL_CLOB types are still bound as SQL_C_CHAR, not SQL_C_WCHAR. This works most of the time, but there have been many issues when databases are in the native Windows encoding (which outside of an experimental feature is never UTF-8). See #284 #290 and #291 for examples.

While some drivers have settings to use UTF-8 instead, that is driver-specific and not all do. It would be much better to either always bind as SQLWCHAR when UNICODE is defined (ie. Windows) or add a runtime option to enable it (enabled by default when UNICODE is?)

kadler avatar Oct 03 '22 23:10 kadler

I added this code in odbc_connection.cpp on line 4176 instead of usual SQL_CHAR process:

// Napi::String (char) case SQL_CHAR : case SQL_VARCHAR : case SQL_LONGVARCHAR : default: // ******************* // Ajout DC // *******************
static const wchar_t CP1252_UNICODE_TABLE[] = L"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007" L"\u0008\u0009\u000A\u000B\u000C\u000D\u000E\u000F" L"\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017" L"\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F" L"\u0020\u0021\u0022\u0023\u0024\u0025\u0026\u0027" L"\u0028\u0029\u002A\u002B\u002C\u002D\u002E\u002F" L"\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037" L"\u0038\u0039\u003A\u003B\u003C\u003D\u003E\u003F" L"\u0040\u0041\u0042\u0043\u0044\u0045\u0046\u0047" L"\u0048\u0049\u004A\u004B\u004C\u004D\u004E\u004F" L"\u0050\u0051\u0052\u0053\u0054\u0055\u0056\u0057" L"\u0058\u0059\u005A\u005B\u005C\u005D\u005E\u005F" L"\u0060\u0061\u0062\u0063\u0064\u0065\u0066\u0067" L"\u0068\u0069\u006A\u006B\u006C\u006D\u006E\u006F" L"\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" L"\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F" L"\u20AC\u0020\u201A\u0192\u201E\u2026\u2020\u2021" L"\u02C6\u2030\u0160\u2039\u0152\u0020\u017D\u0020" L"\u0020\u2018\u2019\u201C\u201D\u2022\u2013\u2014" L"\u02DC\u2122\u0161\u203A\u0153\u0020\u017E\u0178" L"\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7" L"\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF" L"\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7" L"\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF" L"\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7" L"\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF" L"\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7" L"\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF" L"\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7" L"\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF" L"\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7" L"\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF";
int size = strlen((const char*)storedRow[j].char_data); std::wstring unicode(size, ' '); for (size_t i = 0; i < size; ++i){ unicode[i] = CP1252_UNICODE_TABLE[(uint8_t)storedRow[j].char_data[i]]; } std::wstring_convert<std::codecvt_utf8<wchar_t>> unicode_to_utf8; std::string unicodeencoded = unicode_to_utf8.to_bytes(unicode); SQLCHAR* utf8encoded = (SQLCHAR*)(const_cast<char*>(unicodeencoded.c_str())); // ******************* // Fin Ajout DC // *******************

value = Napi::String::New(env, (const char*)utf8encoded, strlen((const char *)utf8encoded)); break;

Now it works fine !

Also, I changed the SQL_LONGVARBINARY process on line 3387 to be processed like a SQL_BINARY or a SQL_VARBINARY.

It works like a charm !

fybo avatar Oct 17 '22 06:10 fybo

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 16 '22 07:11 stale[bot]

Hi, This will be added in an update ? I need it for one of my project. Thank you.

geofmigliacci avatar Nov 17 '22 09:11 geofmigliacci

This one too #242

luizhrios avatar Dec 07 '22 01:12 luizhrios

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 06 '23 03:01 stale[bot]

Related to others issues (#284 , #290 , #291 ), we encounter the same problems here in our project. @kadler have you got an idea when it's going to be implemented ?

TheCryptos avatar Feb 08 '23 14:02 TheCryptos

Hello, do you need to recompile the code after change de .cpp? I did the changes but the result it's wrong, I think is probably need recompiled it, but I don't sure. and if is necesary to do it, how do you do it? thank you!

SoporteMurgiverde avatar Feb 24 '23 17:02 SoporteMurgiverde

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 26 '23 20:03 stale[bot]

Not stale

markdirish avatar Mar 27 '23 00:03 markdirish

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '23 10:04 stale[bot]

Not stale

markdirish avatar May 23 '23 16:05 markdirish

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 22 '23 17:06 stale[bot]

We use HFSQL database which uses native Windows encoding. Accented characters are lost. Big show stopper to work with NodeJS.

CienappsDAG avatar Jul 03 '23 19:07 CienappsDAG

@CienappsDAG We encountered the same issues in our project, here are how we solve it.

You need to get all columns of the table and iterate over it on the data type like so:

switch (column.dataType) {
  case -4:
  case -1:
  case 1:
  case 12:
    // Fixed issue when converting empty string
    // HFSQl odbc driver doesn't allow null
    // Converting to utf leads to null if string is empty

    concatQuery.push(
      `CASE WHEN TRIM(${column.name}) = '' THEN Char(32)+Char(32) ELSE CONVERT(${column.name
      } using 'utf8') END AS ${column.name}`
    );
    break;
  default:
    concatQuery.push(column.name);
    break;
}

Next, you need to decode the converted strings:

const enc = new TextDecoder("utf-8");

/**
 * Remove null bytes from a string
 * @param str - String to remove null bytes from
 * @returns
 */
export function removeNullBytesFromString(str: string): string {
  return str
    .split("")
    .filter((c) => c.charCodeAt(0))
    .join("");
}

/**
 * Format the object to remove null bytes and trim strings
 * @param obj - Object to format
 */
export function format(obj: any) {
  Object.keys(obj).forEach((key) => {
    if (obj[key] instanceof ArrayBuffer) {
      obj[key] = enc.decode(obj[key]);
    }
    if (typeof obj[key] === "string") {
      obj[key] = removeNullBytesFromString(obj[key]).trim();
    }
  });

Hope this works. If you have any questions, don't hesitate to ask me.

TheCryptos avatar Jul 04 '23 07:07 TheCryptos

I tried your solution but i encountered the same issue in another layer.

The issue is that I have some accented characters in table and column names (I know it's very bad, gotta make due).

HFSQL doesn't provide a table like Information_Schema to query directly.

Therefore, I encounter the same encoding issue when using the odbc.tables() or odbc.columns() functions.

CienappsDAG avatar Jul 04 '23 17:07 CienappsDAG

My solution for now is to revert back to version 1 of odbc unfortunately.

CienappsDAG avatar Jul 04 '23 17:07 CienappsDAG

Any news about this issue ?

louia avatar Sep 27 '23 07:09 louia

Would be awesome if this could be added :)

Relbot avatar Nov 02 '23 12:11 Relbot

I made a bruteforce fix in the PR #366

JonathanCabezas avatar Feb 02 '24 11:02 JonathanCabezas