node-odbc
Add feature to bind as SQLWCHAR
Windows is very UTF-16 centric, while Node.js is very UTF-8 centric. While Node.js can also handle UTF-16, it does not deal well with other encodings at all (at least in the N-API addon). As a result, the package has been built as UNICODE since July 2020. This has fixed many encoding issues, since various strings are now passed back and forth as UTF-16 without issue. However, when binding, the SQL_CHAR, SQL_VARCHAR, and SQL_CLOB types are still bound as SQL_C_CHAR, not SQL_C_WCHAR. This works most of the time, but there have been many issues when databases are in the native Windows encoding (which, outside of an experimental feature, is never UTF-8). See #284, #290, and #291 for examples.
While some drivers have settings to use UTF-8 instead, that is driver-specific and not all drivers offer it. It would be much better either to always bind as SQLWCHAR when UNICODE is defined (i.e. on Windows) or to add a runtime option to enable it (enabled by default when UNICODE is defined?).
I added this code in odbc_connection.cpp at line 4176, in place of the usual SQL_CHAR processing:
// Napi::String (char)
case SQL_CHAR:
case SQL_VARCHAR:
case SQL_LONGVARCHAR:
default:
{
  // *******************
  // DC addition
  // *******************
  // Windows-1252 (CP1252) to Unicode lookup table, indexed by byte value
  static const wchar_t CP1252_UNICODE_TABLE[] =
    L"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007"
    L"\u0008\u0009\u000A\u000B\u000C\u000D\u000E\u000F"
    L"\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017"
    L"\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F"
    L"\u0020\u0021\u0022\u0023\u0024\u0025\u0026\u0027"
    L"\u0028\u0029\u002A\u002B\u002C\u002D\u002E\u002F"
    L"\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037"
    L"\u0038\u0039\u003A\u003B\u003C\u003D\u003E\u003F"
    L"\u0040\u0041\u0042\u0043\u0044\u0045\u0046\u0047"
    L"\u0048\u0049\u004A\u004B\u004C\u004D\u004E\u004F"
    L"\u0050\u0051\u0052\u0053\u0054\u0055\u0056\u0057"
    L"\u0058\u0059\u005A\u005B\u005C\u005D\u005E\u005F"
    L"\u0060\u0061\u0062\u0063\u0064\u0065\u0066\u0067"
    L"\u0068\u0069\u006A\u006B\u006C\u006D\u006E\u006F"
    L"\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077"
    L"\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F"
    L"\u20AC\u0020\u201A\u0192\u201E\u2026\u2020\u2021"
    L"\u02C6\u2030\u0160\u2039\u0152\u0020\u017D\u0020"
    L"\u0020\u2018\u2019\u201C\u201D\u2022\u2013\u2014"
    L"\u02DC\u2122\u0161\u203A\u0153\u0020\u017E\u0178"
    L"\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7"
    L"\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF"
    L"\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7"
    L"\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF"
    L"\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7"
    L"\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF"
    L"\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7"
    L"\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF"
    L"\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7"
    L"\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF"
    L"\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7"
    L"\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF";

  // Map each CP1252 byte to its Unicode code point, then re-encode as UTF-8
  // (std::wstring_convert needs <locale> and <codecvt>; deprecated in C++17 but still available)
  size_t size = strlen((const char*)storedRow[j].char_data);
  std::wstring unicode(size, L' ');
  for (size_t i = 0; i < size; ++i) {
    unicode[i] = CP1252_UNICODE_TABLE[(uint8_t)storedRow[j].char_data[i]];
  }
  std::wstring_convert<std::codecvt_utf8<wchar_t>> unicode_to_utf8;
  std::string unicodeencoded = unicode_to_utf8.to_bytes(unicode);
  SQLCHAR *utf8encoded = (SQLCHAR*)(const_cast<char*>(unicodeencoded.c_str()));
  // *******************
  // End of DC addition
  // *******************
  value = Napi::String::New(env, (const char*)utf8encoded, strlen((const char*)utf8encoded));
  break;
}
Now it works fine!
Also, I changed the SQL_LONGVARBINARY processing at line 3387 so it is handled like SQL_BINARY or SQL_VARBINARY.
It works like a charm!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi, will this be added in an update? I need it for one of my projects. Thank you.
This one too #242
Related to the other issues (#284, #290, #291), we encounter the same problems in our project. @kadler, do you have an idea when this is going to be implemented?
Hello, do you need to recompile the code after changing the .cpp? I made the changes but the result is wrong; I think it probably needs to be recompiled, but I'm not sure. If that's necessary, how do you do it? Thank you!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Not stale
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Not stale
We use the HFSQL database, which uses the native Windows encoding. Accented characters are lost. This is a big showstopper for working with Node.js.
@CienappsDAG We encountered the same issues in our project; here is how we solved it.
You need to get all the columns of the table and iterate over them, switching on the data type like so:
switch (column.dataType) {
  case -4: // SQL_LONGVARBINARY
  case -1: // SQL_LONGVARCHAR
  case 1:  // SQL_CHAR
  case 12: // SQL_VARCHAR
    // Fixes an issue when converting empty strings:
    // the HFSQL ODBC driver doesn't allow null,
    // and converting to UTF-8 yields null if the string is empty
    concatQuery.push(
      `CASE WHEN TRIM(${column.name}) = '' THEN Char(32)+Char(32) ELSE CONVERT(${column.name} using 'utf8') END AS ${column.name}`
    );
    break;
  default:
    concatQuery.push(column.name);
    break;
}
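To make the loop above concrete, here is a minimal self-contained sketch of how the pieces could assemble into a SELECT. The `columns` array and the `MYTABLE` name are made up for illustration; real metadata would come from something like `odbc.columns()`:

```typescript
// Hypothetical column metadata for illustration only.
const columns = [
  { name: "ID", dataType: 4 },     // SQL_INTEGER: passed through as-is
  { name: "LABEL", dataType: 12 }, // SQL_VARCHAR: wrapped in CONVERT
];

const concatQuery: string[] = [];
for (const column of columns) {
  switch (column.dataType) {
    case -4: // SQL_LONGVARBINARY
    case -1: // SQL_LONGVARCHAR
    case 1:  // SQL_CHAR
    case 12: // SQL_VARCHAR
      // Empty strings are replaced with two spaces so CONVERT never sees them.
      concatQuery.push(
        `CASE WHEN TRIM(${column.name}) = '' THEN Char(32)+Char(32) ELSE CONVERT(${column.name} using 'utf8') END AS ${column.name}`
      );
      break;
    default:
      concatQuery.push(column.name);
      break;
  }
}

const sql = `SELECT ${concatQuery.join(", ")} FROM MYTABLE`;
console.log(sql);
```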
Next, you need to decode the converted strings:
const enc = new TextDecoder("utf-8");
/**
* Remove null bytes from a string
* @param str - String to remove null bytes from
* @returns The input string with null bytes removed
*/
export function removeNullBytesFromString(str: string): string {
return str
.split("")
.filter((c) => c.charCodeAt(0))
.join("");
}
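The filter works because `charCodeAt(0)` is 0 (falsy) only for a NUL character, so every other character, including "0" itself (code 48), is kept. A quick check, with the helper repeated here so the snippet runs on its own:

```typescript
// Strips U+0000 characters; keeps everything else, including "0" (code 48).
function removeNullBytesFromString(str: string): string {
  return str
    .split("")
    .filter((c) => c.charCodeAt(0))
    .join("");
}

console.log(removeNullBytesFromString("abc\u0000def\u0000")); // "abcdef"
console.log(removeNullBytesFromString("0"));                  // "0"
```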
/**
* Format the object to remove null bytes and trim strings
* @param obj - Object to format
*/
export function format(obj: any) {
Object.keys(obj).forEach((key) => {
if (obj[key] instanceof ArrayBuffer) {
obj[key] = enc.decode(obj[key]);
}
if (typeof obj[key] === "string") {
obj[key] = removeNullBytesFromString(obj[key]).trim();
}
  });
}
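A quick self-contained check of the two helpers working together (redefined here so the snippet runs on its own):

```typescript
const enc = new TextDecoder("utf-8");

function removeNullBytesFromString(str: string): string {
  return str.split("").filter((c) => c.charCodeAt(0)).join("");
}

// Same shape as the format() helper above: decode buffers, then clean strings.
function format(obj: any) {
  Object.keys(obj).forEach((key) => {
    if (obj[key] instanceof ArrayBuffer) {
      obj[key] = enc.decode(obj[key]);
    }
    if (typeof obj[key] === "string") {
      obj[key] = removeNullBytesFromString(obj[key]).trim();
    }
  });
}

const row: any = { name: "Héloïse\u0000  ", id: 42 };
format(row);
console.log(row); // { name: 'Héloïse', id: 42 }
```

Note that a decoded ArrayBuffer becomes a string before the second `if`, so it also gets the null-byte cleanup and trim.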
Hope this works. If you have any questions, don't hesitate to ask me.
I tried your solution but I encountered the same issue in another layer.
The issue is that I have some accented characters in table and column names (I know it's very bad, but I have to make do).
HFSQL doesn't provide a table like Information_Schema to query directly.
Therefore, I run into the same encoding issue when using the odbc.tables() or odbc.columns() functions.
My solution for now is, unfortunately, to revert to version 1 of odbc.
Any news about this issue?
Would be awesome if this could be added :)
I made a brute-force fix in PR #366.