DBD-mysql

UTF8 strings have no utf8 flag set [rt.cpan.org #53130]

Open mbeijen opened this issue 8 years ago • 0 comments

Migrated from rt.cpan.org#53130 (status was 'open')

Requestors:

Attachments:

From [email protected] on 2009-12-27 23:05:00:

I can't tell for sure whether this is a duplicate; it seems not. The other
reports have not been touched for years.
Unicode strings have no utf8 flag set: the strings are encoded in UTF-8, but
the utf8 flag is off.
I've created a Unicode database and a table in it; see dbinit.sql for more detail.
The repro script is attached.

All strings are valid UTF-8 strings but lack the utf8 flag, so they are
output incorrectly unless I fix them myself.

Note: all outputs etc. have the utf8 layer set.

perl version: v5.10.1, build 1006 [291086]
DBD-mysql: 4.011
DBI: 1.609
OS: Windows 7 32bit [Version 6.1.7600]
MySQL: 5.1.37 win32 on localhost

From http://peter.vereshagin.org/ on 2011-01-27 14:49:28

I have a different kind of trouble here.
What do you think about the patch I'm supplying here?
I tried your repro.pl to find out whether yours is the same problem as mine:
http://lists.mysql.com/perl/4382
But it looks like it isn't.

From [email protected] on 2011-07-16 04:09:58:

Yes, this is a bug. I mistakenly reported it against MSSQL before. It does
not read back the same values it writes. Here is a test case:

use DBI;
use Data::Dumper;
use strict;
use utf8;
binmode STDOUT, ":utf8";
my $h = DBI->connect('dbi:mysql:database=xxx;host=server99', 'xxx', 
'xxxxx') or die("Cannot connect to MySQL database: ", $DBI::errstr);
$h->do('SET NAMES utf8');
eval {
	$h->do(q/drop table mje/);
};
$h->do(q/create table mje (a nvarchar(20))/);
my $unicode = "\x{e9} é \x{20ac}";
print $unicode, ', ', utf8::is_utf8($unicode), ', ', Dumper($unicode);
$h->do(q/insert into mje values(?)/, undef, $unicode);
my $s = $h->prepare(q/select * from mje/);
$s->execute;
my $f = $s->fetchall_arrayref;
my $x = $f->[0]->[0];
# utf8::decode($x);   # uncommenting this works around the bug
print $x, ', ', utf8::is_utf8($x), ', ', (map { sprintf('%02X ', ord($_)) } 
split (//, $x)), ', ', Dumper($f), "\n";
exit;

----------------------- This is the output

é é €, 1, $VAR1 = "\x{e9} \x{e9} \x{20ac}";
é é €, , C3 A9 20 C3 A9 20 E2 82 AC , $VAR1 = [
          [
            'é é €'
          ]
        ];
------------------------

Please forgive the line-drawing characters: this was run with ActiveState Perl
on Windows, which cannot print even hard-coded Unicode to the console, and
neither can Strawberry Perl. Only Cygwin Perl seems to display Unicode
properly (but I cannot get DBD::mysql to compile under Cygwin).

The important points to notice are the '1' returned by utf8::is_utf8(), and
the hexadecimal representation of é, which is E9 (U+00E9) in Unicode and
C3 A9 in UTF-8. The first Dumper output is correct.

On the second line, notice that utf8::is_utf8() does not return '1', and that
the hexadecimal value of the string is a raw UTF-8 byte stream: it was not
decoded. If I call utf8::decode() on the value returned from the database, it
works. Without that decoding, Perl treats the bytes C3 A9, which together
represent é and should be combined into the single character U+00E9, as two
separate characters, U+00C3 and U+00A9. That is not at all what was expected.
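A minimal, database-free sketch of the difference, assuming only core Perl (the Encode module ships with Perl 5.8 and later). The byte string $raw is a hypothetical stand-in for the value the driver returned in the test case: the nine bytes C3 A9 20 C3 A9 20 E2 82 AC with the utf8 flag off. Decoding turns each multi-byte sequence back into a single character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the fetched value: raw UTF-8 bytes, utf8 flag off
# (what the second output line of the test case shows).
my $raw = "\xC3\xA9 \xC3\xA9 \xE2\x82\xAC";

# Undecoded, Perl sees one character per byte.
printf "raw:     len=%d flag=%s chars=%s\n",
    length($raw), (utf8::is_utf8($raw) ? 1 : 0),
    join(' ', map { sprintf('%02X', ord($_)) } split //, $raw);

# Option 1: decode a copy in place with the core utf8:: function
# (it returns false if the bytes are not valid UTF-8).
my $in_place = $raw;
utf8::decode($in_place);

# Option 2: Encode::decode returns a new character string.
my $decoded = decode('UTF-8', $raw);

# Both now hold five characters: U+00E9, space, U+00E9, space, U+20AC.
for my $s ($in_place, $decoded) {
    printf "decoded: len=%d flag=%s chars=%s\n",
        length($s), (utf8::is_utf8($s) ? 1 : 0),
        join(' ', map { sprintf('%04X', ord($_)) } split //, $s);
}
```

Running it should report len=9 flag=0 for the raw string and len=5 flag=1 (chars 00E9 0020 00E9 0020 20AC) for both decoded copies. The two approaches differ on malformed input: utf8::decode returns false, while Encode::decode substitutes U+FFFD by default.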

Here is additional information:

--------------------------

(Terminal is set to UTF8 character set.)

mysql> select a,hex(a) from mje;
+-----------+--------------------+
| a         | hex(a)             |
+-----------+--------------------+
| é é € | C3A920C3A920E282AC |
+-----------+--------------------+

mysql> status
--------------
mysql  Ver 14.12 Distrib 5.0.60, for pc-linux-gnu (i686) using readline 5.2

Server version:         5.0.60-log Gentoo Linux mysql-5.0.60-r1
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
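As a cross-check, the HEX(a) value above is exactly the UTF-8 encoding of the string the test case inserted, which suggests the column holds correct UTF-8 and the problem is on the fetch side. A small core-Perl sketch (no database needed) that reproduces the hex string; it assumes only the Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# The character string the test case inserts: code points, not bytes.
my $unicode = "\x{e9} \x{e9} \x{20ac}";

# Encode to UTF-8 octets, then render them as uppercase hex like MySQL's HEX().
my $octets = encode('UTF-8', $unicode);
my $hex    = uc(unpack('H*', $octets));

print "$hex\n";    # prints C3A920C3A920E282AC
```

Each é becomes the two bytes C3 A9 and € the three bytes E2 82 AC, nine bytes in total, which are the same bytes DBD::mysql handed back undecoded in the test case.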


List of Unicode characters and their UTF-8 hex values:
http://www.utf8-chartable.de/

From [email protected] on 2011-07-16 04:23:11:

Additional information:

C:\Users\xxxxxxx\Documents\xxxx-serverscripts>perl -MDBI -e "DBI->installed_versions"
  Perl            : 5.010001    (MSWin32-x64-multi-thread)
  OS              : MSWin32     (5.2)
  DBI             : 1.615
  DBD::mysql      : 4.018


Similar but misfiled bug:
https://rt.cpan.org/Public/Bug/Display.html?id=69362

From [email protected] on 2016-10-22 15:14:51:

A fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67
I would like more people affected by UTF-8 bugs in DBD::mysql to test my changes.

From [email protected] on 2017-07-01 09:16:53:

Reopening: the fix was reverted in 4.043.

mbeijen, Nov 15 '17 07:11