DBD-mysql
mysql_enable_utf8 misses utf8 upgrading of non utf8 strings [rt.cpan.org #25590]
Migrated from rt.cpan.org#25590 (status was 'open')
Requestors:
Attachments:
From [email protected] on 2007-03-21 11:17:54:
When mysql_enable_utf8 is enabled, DBD::mysql fails to upgrade non-UTF-8
strings. Instead it assumes that all data passed to DBD::mysql is already
valid UTF-8. Non-ASCII characters are simply removed from strings
passed to DBD::mysql.
Attached is a patch against t/utf8.t (DBD-mysql-4.003 release)
which adds a corresponding regression test.
With this patch test #19 fails with the following message:
not ok 19 at line 185
got back 'umlauts: ' instead of 'umlauts: äüö'.
I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27
client libraries.
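One plausible reading of the failure, sketched in plain Perl with no database involved: the Latin-1 octets for the umlauts are not well-formed UTF-8, so a connection that expects UTF-8 cannot interpret them. The variable names are illustrative only, and what the server does with the malformed octets is not shown here.

```perl
use strict;
use warnings;

# "umlauts: äüö" as Perl holds it without the UTF-8 flag: one octet per character.
my $latin1 = "umlauts: \xe4\xfc\xf6";

# utf8::decode() returns false when the octets are not well-formed UTF-8.
my $as_is    = $latin1;
my $as_is_ok = utf8::decode($as_is) ? 1 : 0;

# After utf8::upgrade() the internal buffer holds the UTF-8 encoding of the same text.
my $upgraded = $latin1;
utf8::upgrade($upgraded);
utf8::encode($upgraded);    # expose those UTF-8 octets as a plain byte string

my $check       = $upgraded;
my $upgraded_ok = utf8::decode($check) ? 1 : 0;

print "as-is octets are valid UTF-8:    $as_is_ok\n";
print "upgraded octets are valid UTF-8: $upgraded_ok\n";
```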
From [email protected] on 2007-03-21 12:42:14:
Joern,
Thank you very much for this patch. One issue though, I had applied
another patch to this test from Joost Diepenmaat that might address what
your patch intends to address. Could you please check out the latest
DBD::mysql and verify if you still need to make a patch against that
test? If you would like, I can even give you commit access to the
repository.
I'm hoping to do a release either later today or tomorrow, but will wait
upon your response before I do.
Kind regards,
Patrick
From [email protected] on 2007-03-21 20:43:05:
On Wed, 21 Mar 2007, 08:42:14, CAPTTOFU wrote:
> Thank you very much for this patch. One issue though, I had applied
> another patch to this test from Joost Diepenmaat that might address what
> your patch intends to address. Could you please check out the latest
> DBD::mysql and verify if you still need to make a patch against that
> test? If you would like, I can even give you commit access to the
> repository.
Thanks for the quick reply.
I just checked out svn trunk and my patched utf8.t still fails. As far
as I understand the Changes file, Joost's patch addresses a
missing utf8 flag when selecting utf8_bin columns from the database.
My issue is that DBD::mysql passes all data as-is to the database even
when the connection is in utf8 mode. This way all non-ASCII characters
of non-utf8-tagged strings get lost in the database. But passing
non-utf8-tagged strings to DBD::mysql should be absolutely valid: since
they're valid for Perl, they should be valid for DBD::mysql as well ;)
A fix would be to utf8::upgrade all bound parameters and the SQL
statement itself before sending them to the server when
mysql_enable_utf8 is active.
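As a rough application-level sketch of that idea (not the driver's implementation), a helper could upgrade copies of the bind values before they are handed to execute. The name `upgraded_copies` is hypothetical, and the sketch assumes the driver sends Perl's internal buffer unchanged, as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns upgraded copies of the given bind values.
# The caller's own variables are left untouched.
sub upgraded_copies {
    my @values = @_;
    for my $value (@values) {
        next unless defined $value;    # leave NULLs (undef) alone
        utf8::upgrade($value);         # no-op if the UTF-8 flag is already set
    }
    return @values;
}

# Intended use with DBI (not executed here, needs a live connection):
#   $sth->execute( upgraded_copies(@params) );
my @params = upgraded_copies("umlauts: \xe4\xfc\xf6", undef, "plain ascii");

print utf8::is_utf8($params[0]) ? "upgraded\n" : "not upgraded\n";
```

Note that this upgrades every value, so it is only safe when none of the parameters carry binary data.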
Regards,
Jörn
From [email protected] on 2007-03-22 18:11:50:
On Wed, 21 Mar 2007, 16:43:05, JRED wrote:
> A fix would be to utf8::upgrade all bound parameters and the SQL
> statement itself before sending them to the server when
> mysql_enable_utf8 is active.
More precisely: of course not all strings need to be upgraded, just those
targeted at character columns. In particular, blobs wouldn't survive a
utf8::upgrade ;)
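A small illustration of why binary data is at risk, assuming (as in this report) that the driver sends Perl's internal buffer unchanged. The PNG signature below merely stands in for arbitrary blob content.

```perl
use strict;
use warnings;

# First eight octets of a PNG file, standing in for arbitrary BLOB data.
my $blob = "\x89PNG\r\n\x1a\n";

my $upgraded = $blob;
utf8::upgrade($upgraded);

# Octets the server would see if the internal buffer were sent as-is (assumption).
my $wire = $upgraded;
utf8::encode($wire);

my $corrupted = ($wire ne $blob) ? 1 : 0;

# The octet 0x89 becomes the two-octet sequence 0xC2 0x89.
printf "original: %d octets, after upgrade: %d octets\n",
    length($blob), length($wire);
```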
If you're interested I can provide more test scripts for these issues.
Regards,
Jörn
From [email protected] on 2007-05-27 01:56:33:
Hi,
This is on my radar, but I admit, I'm not a guru on UTF8 - probably an
American issue. I have recently started to have to deal with UTF8 a bit
on grazr, particularly with Chinese feeds, and now understand some of
the issues with UTF8. What does utf8::upgrade do exactly?
I would like to fix this issue!
From [email protected] on 2007-05-27 09:24:09:
On Sat May 26 21:56:33 2007, CAPTTOFU wrote:
> What does utf8::upgrade do exactly?
When the variable has the utf-8 flag set, it does nothing, because it
assumes that this variable carries valid utf-8 already.
If the utf-8 flag isn't set, the variable's octets are treated as Latin-1
(Perl's native 8-bit encoding), _converted_ to utf-8, and the utf-8 flag
is set as well.
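The behaviour described above can be observed with a few lines of Perl. This is only an illustration; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "umlauts: \xe4\xfc\xf6";    # Latin-1 octets, UTF-8 flag off
my $orig = $text;

my $flag_before = utf8::is_utf8($text) ? 1 : 0;
utf8::upgrade($text);
my $flag_after  = utf8::is_utf8($text) ? 1 : 0;

# The logical string is unchanged: same characters, same length in characters.
my $same_text   = ($text eq $orig) ? 1 : 0;
my $char_length = length $text;

# A second call does nothing, because the flag is already set.
utf8::upgrade($text);
my $still_same = ($text eq $orig) ? 1 : 0;

print "flag before: $flag_before, flag after: $flag_after\n";
print "same text: $same_text, characters: $char_length\n";
```

Only the internal representation changes, which is why the upgrade is invisible to Perl code but matters to a driver that reads the raw buffer.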
The latter case is interesting in conjunction with DBD::mysql when the
database connection is utf-8, but the application passes non utf-8 data.
Then the data has to be "upgraded" to utf-8 and must not be passed
as-is, because MySQL then receives illegal data which does not conform
to utf-8.
So all parameters bound to SQL statements (and the SQL statements
themselves) need to go through a utf8::upgrade() and everything will work
as expected.
We do this in a database layer we wrote for our applications, but we
want to get rid of this layer ;) Of course it would be better if this is
handled on DBI/DBD level, not on application level. I dunno the DBI/DBD
architecture well, but probably this could be done even at DBI level, so
all drivers would benefit.
Regards,
Jörn
From [email protected] on 2007-05-30 14:01:25:
Sorry for jumping in, but...
On Sun May 27 05:24:09 2007, JRED wrote:
> So all parameters bound to SQL statements (and the SQL statements
> themselves) need to go through a utf8::upgrade() and everything will work
> as expected.
Note that any bound binary columns/values must NOT be upgraded. This
means you need to know for each bound parameter whether it's a text value
or binary. I've looked at this issue myself, but I couldn't figure out
how to do it. My experience with DBD::mysql & libmysqlclient is pretty
limited though.
Also note that it's NOT enough to check the sql type of the column. See
also RT #24738 (fixed that for selecting binary & utf-8 values)
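Since the column's SQL type cannot be relied on, one application-level workaround is to have the caller state explicitly which values are binary. The sketch below assumes exactly that; `prepare_bind_values` and the `[ $value, $is_binary ]` convention are hypothetical and not part of DBI or DBD::mysql.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is [ $value, $is_binary ].
# Text values are upgraded; binary values and undef pass through untouched.
sub prepare_bind_values {
    my @out;
    for my $param (@_) {
        my ($value, $is_binary) = @$param;
        utf8::upgrade($value) if defined $value && !$is_binary;
        push @out, $value;
    }
    return @out;
}

my @bind = prepare_bind_values(
    [ "umlauts: \xe4\xfc\xf6", 0 ],    # text: upgraded
    [ "\x89PNG\r\n\x1a\n",     1 ],    # binary: left as-is
);

# Intended use with DBI (not executed here): $sth->execute(@bind);
printf "text flagged: %d, binary flagged: %d\n",
    (utf8::is_utf8($bind[0]) ? 1 : 0), (utf8::is_utf8($bind[1]) ? 1 : 0);
```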
From [email protected] on 2016-10-22 15:07:53:
Fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67
I would appreciate it if more people affected by UTF-8 bugs in DBD::mysql could test my changes.
From [email protected] on 2017-07-01 09:15:52:
Reopening, fix was reverted in 4.043.