MySqlConnector
Some of the "sjis" characters are not returned correctly
Software versions MySQL 5.7.34, MySQL 8.0.22
Describe the bug Some of the CHARSET=sjis characters (Japanese) are returned as question marks when they are read by MySqlConnector. Exactly the same code works fine with the official MySql.Data.MySqlClient connector.
Exception n/a
Code sample
When compiled against the official MySQL connector (using the MySql.Data.MySqlClient namespace), the app below prints the expected output:
id = 1, name = 髙 (first char = 0x9ad9), expected 髙, is expected = True
Done
but the 髙 character is replaced with a question mark when the MySqlConnector namespace is used instead:
id = 1, name = ? (first char = 0x3f), expected 髙, is expected = False
Done
static void Main(string[] args)
{
    Console.OutputEncoding = Encoding.UTF8;
    Encoding cp932 = CodePagesEncodingProvider.Instance.GetEncoding(932) ?? throw new Exception("Failed to create cp932 encoding");
    string expected = cp932.GetString(new byte[] { 0xfb, 0xfc });
    using var connection = new MySqlConnection("Server=localhost;Password=***;User ID=root;");
    connection.Open();
    using var command = connection.CreateCommand();
    command.CommandText = "select id, name from a.test";
    using var dr = command.ExecuteReader();
    while (dr.Read())
    {
        Console.WriteLine(
            "id = {0}, name = {1} (first char = 0x{2:x}), expected {3}, is expected = {4}",
            dr[0],
            dr[1],
            (uint)((string)dr[1])[0],
            expected,
            expected == (string)dr[1]);
    }
    Console.WriteLine("Done");
}
Expected behavior A U+9AD9 character (髙) is expected to be returned; however, we get the regular question mark placeholder U+003F (?).
Additional context MySqlConnector seems to configure the connection to receive all content as Unicode, and MySQL itself sends the question mark in this case, which looks like a server-side bug. I can see the 0x3f byte in the receive buffer under the debugger.
However, the official connector receives the 0xFB, 0xFC bytes and converts them to the correct 髙 character. MySqlConnector can be made to receive the same bytes by setting the character_set_results variable as follows:
command.CommandText = "set character_set_results=sjis";
command.ExecuteNonQuery();
but we still end up with a question mark, because the column is read as a UTF-8 string and the byte sequence 0xFB 0xFC does not encode a valid UTF-8 character: https://github.com/mysql-net/MySqlConnector/blob/bbdbd782e7434b765154805b1cb61d8daac68112/src/MySqlConnector/ColumnReaders/StringColumnReader.cs#L12
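The decoding step described above can be reproduced without a database. The following is a minimal sketch, assuming .NET Core 3.0 or later (where CodePagesEncodingProvider ships with the runtime) and C# top-level statements. It decodes the same two bytes once as cp932 and once as UTF-8. The variable names are illustrative and are not taken from MySqlConnector.

```csharp
using System;
using System.Text;

// The two bytes the server sends for this row when character_set_results=sjis.
byte[] raw = { 0xFB, 0xFC };

// Code page 932 is not available from Encoding by default on .NET Core / .NET 5+,
// so it is obtained from the code pages provider, as in the code sample above.
Encoding cp932 = CodePagesEncodingProvider.Instance.GetEncoding(932)
    ?? throw new InvalidOperationException("Failed to create cp932 encoding");

// Decode the same bytes with both encodings.
string asCp932 = cp932.GetString(raw);
string asUtf8 = Encoding.UTF8.GetString(raw);

// Print the first UTF-16 code unit of each result in hex.
Console.WriteLine("cp932: first char = 0x{0:x}", (uint)asCp932[0]);
Console.WriteLine("utf8:  first char = 0x{0:x}", (uint)asUtf8[0]);
```

Under these assumptions the cp932 decode should report 0x9ad9, while the UTF-8 decode cannot produce that character: 0xFB and 0xFC are never valid bytes in UTF-8, so the default decoder substitutes a replacement character instead.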
This is how I created the test data:
C:\exe\mysql-5.7.34-winx64\bin
$ chcp 932
Active code page: 932
C:\exe\mysql-5.7.34-winx64\bin
$ mysql -h localhost -u root -p
. . . . . . .
mysql> create table test_sjis(id int primary key, name varchar(99)) default charset=sjis;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into test_sjis values(1, 0xfbfc);
Query OK, 1 row affected (0.00 sec)
mysql> select id, name, hex(name) from test_sjis;
+----+------+-----------+
| id | name | hex(name) |
+----+------+-----------+
|  1 | 髙   | FBFC      |
+----+------+-----------+
1 row in set (0.00 sec)
mysql>