DBD-mysql
UTF8 strings have no utf8 flag set [rt.cpan.org #53130]
Migrated from rt.cpan.org#53130 (status was 'open')
From [email protected] on 2009-12-27 23:05:00:
I can't clearly figure out whether this is a duplicate; it seems not. The other reports have not been
touched for years.
Unicode strings have no UTF-8 flag set: the strings are encoded in UTF-8, but the
UTF-8 flag is off.
I've created a Unicode database and a table in it; see dbinit.sql for more detail.
The repro script is attached.
All strings are valid UTF-8 but lack the UTF-8 flag, so they are printed
incorrectly unless I fix them up myself.
Note: all outputs are set to UTF-8, etc.
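To make the symptom concrete without a database, the following minimal Perl sketch compares the five-character string used in the test case later in this thread with the nine raw UTF-8 bytes the driver hands back (byte values as reported in this thread). It uses only Perl built-ins; the variable names are illustrative.

```perl
use strict;
use warnings;

# What the test case inserts: é, space, é, space, € (five characters).
my $chars = "\x{e9} \x{e9} \x{20ac}";

# What the affected driver versions return: the same text as raw UTF-8
# bytes (C3 A9 20 C3 A9 20 E2 82 AC) with the UTF-8 flag off.
my $bytes = "\xC3\xA9 \xC3\xA9 \xE2\x82\xAC";

my %report = (
    chars_len  => length($chars),                 # 5 characters
    chars_flag => utf8::is_utf8($chars) ? 1 : 0,  # flag on
    bytes_len  => length($bytes),                 # 9 "characters" (bytes)
    bytes_flag => utf8::is_utf8($bytes) ? 1 : 0,  # flag off
    same       => $chars eq $bytes ? 1 : 0,       # not equal
);
print "$_=$report{$_}\n" for sort keys %report;
```

Because the flag is off, Perl treats every byte as its own character, so the fetched value neither has the right length nor compares equal to what was written.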
perl version: v5.10.1, build 1006 [291086]
DBD-mysql: 4.011
DBI: 1.609
OS: Windows 7 32bit [Version 6.1.7600]
MySQL: 5.1.37 win32 on localhost
From http://peter.vereshagin.org/ on 2011-01-27 14:49:28:
I have a different kind of trouble here.
What do you think about the patch I supply here?
I tried your repro.pl to find out whether this is your trouble, too:
http://lists.mysql.com/perl/4382
But it looks like it is not.
From [email protected] on 2011-07-16 04:09:58:
Yes, this is a bug. I mistakenly reported it against MSSQL before. It does not read back
the same values it writes. Here is a test case:
use DBI;
use Data::Dumper;
use strict;
use utf8;
binmode STDOUT, ":utf8";
my $h = DBI->connect('dbi:mysql:database=xxx;host=server99', 'xxx',
    'xxxxx') or die("Cannot connect to MySQL database: ", $DBI::errstr);
$h->do('SET NAMES utf8');
eval {
    $h->do(q/drop table mje/);
};
$h->do(q/create table mje (a nvarchar(20))/);
my $unicode = "\x{e9} é \x{20ac}";
print $unicode, ', ', utf8::is_utf8($unicode), ', ', Dumper($unicode);
$h->do(q/insert into mje values(?)/, undef, $unicode);
my $s = $h->prepare(q/select * from mje/);
$s->execute;
my $f = $s->fetchall_arrayref;
my $x = $f->[0]->[0];
# utf8::decode($x);
print $x, ', ', utf8::is_utf8($x), ', ', (map { sprintf('%02X ', ord($_)) }
    split(//, $x)), ', ', Dumper($f), "\n";
exit;
----------------------- This is the output
ââ ââ Îé¼, 1, $VAR1 = "\x{e9} \x{e9} \x{20ac}";
âââ¬â âââ¬â âóâ¬Ã©â¬Â¼, , C3 A9 20 C3 A9 20 E2 82 AC , $VAR1 = [
[
'âââ¬â âââ¬â âóâ¬Ã©â¬Â¼'
]
];
------------------------
Please forgive the line-drawing characters: this was run with
ActiveState Perl on Windows, which cannot print even hard-coded Unicode to
the console, and neither can Strawberry Perl. Only Cygwin Perl seems to display
Unicode properly (but I cannot get DBD::mysql to compile under Cygwin).
The important points to notice are the '1' value, which is the result of
utf8::is_utf8(), and the hexadecimal representation of é, which is E9 (U+00E9)
in Unicode and C3 A9 in UTF-8. The first Dumper output is correct.
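The comparison described above can be reproduced without a database. This minimal Perl sketch reuses the map/sprintf diagnostic from the test case, once on the inserted string and once on a hard-coded copy of the bytes that came back; `hex_codepoints` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Same diagnostic as the map/sprintf call in the test case:
# the code point of each character, in hex, separated by spaces.
sub hex_codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

my $expected = "\x{e9} \x{e9} \x{20ac}";          # what was inserted
my $fetched  = "\xC3\xA9 \xC3\xA9 \xE2\x82\xAC";  # what came back

my $want = hex_codepoints($expected);  # E9 20 E9 20 20AC
my $got  = hex_codepoints($fetched);   # C3 A9 20 C3 A9 20 E2 82 AC
print "want: $want\ngot:  $got\n";
```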
On the second line, notice that utf8::is_utf8() does not return '1', and that
the hexadecimal value of the string is a raw UTF-8 byte stream: it was not
decoded. If I call utf8::decode() on the value returned from the database, it
works. Without the UTF-8 decoding, Perl treats the two bytes C3 A9, which
together encode é and should be combined into the single character U+00E9, as
two separate characters, U+00C3 and U+00A9, which is not at all what was
expected.
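As a client-side workaround, the utf8::decode() call mentioned above can be applied to every fetched value. The sketch below assumes all columns contain UTF-8 text and uses a simulated fetchall_arrayref() result instead of a live connection; `decode_utf8_rows` is a hypothetical helper, not part of DBI or DBD::mysql.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI or DBD::mysql): decode every defined
# value of a fetchall_arrayref() result in place. It assumes all columns hold
# UTF-8 text; binary (BLOB) columns must not be passed through it.
sub decode_utf8_rows {
    my ($rows) = @_;
    for my $row (@$rows) {
        # $value aliases the array element, so decoding happens in place.
        for my $value (@$row) {
            next unless defined $value;    # leave NULLs alone
            utf8::decode($value)
                or warn "value is not valid UTF-8\n";
        }
    }
    return $rows;
}

# Simulated $s->fetchall_arrayref result: one UTF-8 byte string, one NULL.
my $f = [ [ "\xC3\xA9 \xC3\xA9 \xE2\x82\xAC" ], [ undef ] ];
decode_utf8_rows($f);

my $x = $f->[0][0];
printf "length=%d is_utf8=%d\n", length($x), utf8::is_utf8($x) ? 1 : 0;
```

Decoding in the caller only works if it knows which columns are text, which is why this is a stopgap sketch rather than a substitute for a fix in the driver.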
Here is additional information
--------------------------
(The terminal is set to the UTF-8 character set.)
mysql> select a,hex(a) from mje;
+-----------+--------------------+
| a         | hex(a)             |
+-----------+--------------------+
| é é €     | C3A920C3A920E282AC |
+-----------+--------------------+
mysql> status
--------------
mysql Ver 14.12 Distrib 5.0.60, for pc-linux-gnu (i686) using readline 5.2
Server version: 5.0.60-log Gentoo Linux mysql-5.0.60-r1
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
List of Unicode characters and their UTF-8 hex values:
http://www.utf8-chartable.de/
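The hex(a) output above can be cross-checked in Perl: encoding the inserted string to UTF-8 with the core Encode module gives exactly the bytes MySQL reports, which suggests the server stores the data correctly and the problem lies in how the value is handed back to Perl. A minimal sketch:

```perl
use strict;
use warnings;
use Encode qw(encode);

# The value the test case inserts: é, space, é, space, €.
my $chars = "\x{e9} \x{e9} \x{20ac}";

# Encode to UTF-8 octets and render them as one uppercase hex string,
# the same form MySQL's HEX() function reports.
my $hex = uc(unpack('H*', encode('UTF-8', $chars)));
print "$hex\n";   # C3A920C3A920E282AC
```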
From [email protected] on 2011-07-16 04:23:11:
Additional information:
C:\Users\xxxxxxx\Documents\xxxx-serverscripts>perl -MDBI -e "DBI->installed_versions"
Perl : 5.010001 (MSWin32-x64-multi-thread)
OS : MSWin32 (5.2)
DBI : 1.615
DBD::mysql : 4.018
Similar but misfiled bug:
https://rt.cpan.org/Public/Bug/Display.html?id=69362
From [email protected] on 2016-10-22 15:14:51:
A fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67
I would like more people affected by UTF-8 bugs in DBD::mysql to test my changes.
From [email protected] on 2017-07-01 09:16:53:
Reopening: the fix was reverted in 4.043.