DBD-mysql

mysql_enable_utf8 does not upgrade non-UTF-8 strings to UTF-8 [rt.cpan.org #25590]

Open mbeijen opened this issue 8 years ago • 0 comments

Migrated from rt.cpan.org#25590 (status was 'open')

Requestors:

Attachments:

From [email protected] on 2007-03-21 11:17:54:

When mysql_enable_utf8 is set, DBD::mysql does not upgrade non-UTF-8
strings. Instead, it assumes that all data passed to it is already valid
UTF-8. As a result, non-ASCII characters are simply removed from strings
passed to DBD::mysql.

Attached is a patch against t/utf8.t (DBD-mysql-4.003 release) that adds
a corresponding regression test.

With this patch, test #19 fails with the following message:

  not ok 19 at line 185
  got back 'umlauts: ' instead of 'umlauts: äüö'.

I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27
client libraries.
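To illustrate why the characters vanish, here is a minimal Python model (not DBD::mysql code). It assumes that a Perl string without the UTF-8 flag holds one Latin-1 byte per character and that the driver sends those bytes unchanged. It further assumes, for illustration only, that byte sequences which are not valid UTF-8 are silently dropped on the way into the database; the report only says the characters are removed, not exactly where or how.

```python
# Hypothetical model of the reported symptom; this is not DBD::mysql code.

# "umlauts: äüö" as a Perl string WITHOUT the UTF-8 flag: one byte per
# character, Latin-1 encoded (0xE4, 0xFC, 0xF6 for ä, ü, ö).
latin1_bytes = "umlauts: äüö".encode("latin-1")

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Sent as-is over a UTF-8 connection, these bytes are not valid UTF-8.
print(is_valid_utf8(latin1_bytes))  # False

# Assumption: the invalid bytes are discarded, leaving only the ASCII part.
print(repr(latin1_bytes.decode("utf-8", errors="ignore")))  # 'umlauts: '
```

Under these assumptions the result matches the `got back 'umlauts: '` line from the failing test.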

From [email protected] on 2007-03-21 12:42:14:

Joern,

Thank you very much for this patch. One issue, though: I have already
applied another patch to this test from Joost Diepenmaat that might address
the same problem your patch targets. Could you please check out the latest
DBD::mysql and verify whether you still need to make a patch against that
test? If you like, I can even give you commit access to the
repository.

I'm hoping to do a release either later today or tomorrow, but will wait
for your response before I do.

Kind regards,

Patrick

From [email protected] on 2007-03-21 20:43:05:

On Wed Mar 21 08:42:14 2007, CAPTTOFU wrote:

> Thank you very much for this patch. One issue though, I had applied
> another patch to this test from Joost Diepenmaat that might address what
> your patch intends to address. Could you please check out the latest
> DBD::mysql and verify if you still need to make a patch against that
> test? If you would like, I can even give you commit access to the
> repository.

Thanks for the quick reply.

I just checked out svn trunk and my patched utf8.t still fails. If I
understand the Changes file correctly, Joost's patch addresses a missing
UTF-8 flag when selecting utf8_bin columns from the database.

My issue is that DBD::mysql passes all data as-is to the database even
when the connection is in UTF-8 mode. As a result, all non-ASCII characters
in strings without the UTF-8 flag get lost in the database. But passing
such strings to DBD::mysql should be perfectly valid: since they're valid
for Perl, they should be valid for DBD::mysql as well ;)

A fix would be to utf8::upgrade all bound parameters and the SQL
statement itself before sending them to the server when
mysql_enable_utf8 is active.

Regards,

Jörn

From [email protected] on 2007-03-22 18:11:50:

On Wed Mar 21 16:43:05 2007, JRED wrote:

> A fix would be to utf8::upgrade all binded parameters and the SQL
> statement itself before sending them to the server when
> mysql_enable_utf8 is active.

More precisely: of course, not all strings need to be upgraded, only those
destined for character columns. In particular, blobs wouldn't survive a
utf8::upgrade ;)

If you're interested I can provide more test scripts for these issues.

Regards,

Jörn

From [email protected] on 2007-05-27 01:56:33:

Hi,

This is on my radar, but I admit I'm not a UTF-8 guru, probably an
American issue. I have recently had to deal with UTF-8 a bit on grazr,
particularly with Chinese feeds, and now understand some of the issues
with UTF-8. What does utf8::upgrade do exactly?

I would like to fix this issue!

From [email protected] on 2007-05-27 09:24:09:

On Sat May 26 21:56:33 2007, CAPTTOFU wrote:

> What does utf8::upgrade do exactly?

If the variable already has the UTF-8 flag set, it does nothing, because
it assumes that the variable already holds valid UTF-8.

If the UTF-8 flag isn't set, the variable is _converted_ to UTF-8 (each
byte is treated as a Latin-1 character) and the UTF-8 flag is set as well.

The latter case matters for DBD::mysql when the database connection uses
UTF-8 but the application passes non-UTF-8 data. The data then has to be
"upgraded" to UTF-8 rather than passed as-is; otherwise MySQL receives
byte sequences that are not valid UTF-8.

So all parameters bound to SQL statements (and the SQL statements
themselves) need to go through utf8::upgrade(), and everything will work
as expected.
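The upgrade described above can be modeled roughly in Python. The `PerlString` class and `utf8_upgrade` function are hypothetical stand-ins for a Perl scalar and for utf8::upgrade, not a real API. The sketch returns a new value, whereas Perl's utf8::upgrade modifies its argument in place; it also assumes an ASCII platform, where unflagged bytes are interpreted as Latin-1.

```python
from dataclasses import dataclass

@dataclass
class PerlString:
    """Hypothetical stand-in for a Perl scalar: internal bytes plus the UTF-8 flag."""
    buf: bytes
    utf8_flag: bool = False

def utf8_upgrade(s: PerlString) -> PerlString:
    """Model of utf8::upgrade: no-op if flagged, else re-encode Latin-1 bytes as UTF-8."""
    if s.utf8_flag:
        # Already flagged: assumed to hold valid UTF-8, left untouched.
        return s
    # Each byte is one Latin-1 character; encode those same characters as UTF-8.
    return PerlString(s.buf.decode("latin-1").encode("utf-8"), utf8_flag=True)

plain = PerlString("äüö".encode("latin-1"))  # b'\xe4\xfc\xf6', flag off
upgraded = utf8_upgrade(plain)
print(upgraded.buf)        # b'\xc3\xa4\xc3\xbc\xc3\xb6'
print(upgraded.utf8_flag)  # True
# Upgrading an already-flagged string changes nothing.
print(utf8_upgrade(upgraded) == upgraded)  # True
```

The characters stay the same (ä, ü, ö); only their internal byte representation changes, which is why the upgraded bytes are valid on a UTF-8 connection.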

We do this in a database layer we wrote for our applications, but we
want to get rid of this layer ;) Of course it would be better if this were
handled at the DBI/DBD level rather than at the application level. I don't
know the DBI/DBD architecture well, but this could probably even be done at
the DBI level, so all drivers would benefit.

Regards,

Jörn

From [email protected] on 2007-05-30 14:01:25:

Sorry for jumping in, but...

On Sun May 27 05:24:09 2007, JRED wrote:
> So all parameters binded to SQL statements (and the SQL statements
> themself) need to go through a utf8::upgrade() and everything will work
> as expected.

Note that bound binary columns/values must NOT be upgraded. This means
you need to know, for each bound parameter, whether it is a text or a
binary value. I've looked at this issue myself, but I couldn't figure out
how to do it. My experience with DBD::mysql and libmysqlclient is pretty
limited, though.

Also note that it's NOT enough to check the SQL type of the column. See
also RT #24738 (which fixed this for selecting binary and UTF-8 values).
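The point about binary values can be sketched in Python. Each bound parameter here carries a hypothetical `is_binary` marker; how a driver would actually obtain that information is exactly the open question in this thread, since the column's SQL type alone is not enough. `BoundParam` and `prepare_for_utf8_connection` are illustrative names, not part of DBI or DBD::mysql. Unflagged text is upgraded (Latin-1 bytes re-encoded as UTF-8); binary data is passed through byte for byte.

```python
from typing import List, NamedTuple

class BoundParam(NamedTuple):
    """Hypothetical bound parameter: raw bytes, Perl UTF-8 flag, and a binary marker."""
    buf: bytes
    utf8_flag: bool
    is_binary: bool

def prepare_for_utf8_connection(params: List[BoundParam]) -> List[bytes]:
    """Return the bytes to send: upgrade unflagged text, leave binary data alone."""
    out = []
    for p in params:
        if p.is_binary or p.utf8_flag:
            # Binary data must stay byte-for-byte; flagged text is already UTF-8.
            out.append(p.buf)
        else:
            # Unflagged text: treat bytes as Latin-1 and re-encode as UTF-8.
            out.append(p.buf.decode("latin-1").encode("utf-8"))
    return out

text = BoundParam("äüö".encode("latin-1"), utf8_flag=False, is_binary=False)
blob = BoundParam(b"\x89PNG\xff\x00", utf8_flag=False, is_binary=True)
print(prepare_for_utf8_connection([text, blob]))
# [b'\xc3\xa4\xc3\xbc\xc3\xb6', b'\x89PNG\xff\x00']

# Upgrading the blob by mistake would change (and lengthen) its bytes:
print(blob.buf.decode("latin-1").encode("utf-8"))
# b'\xc2\x89PNG\xc3\xbf\x00'
```

The last line shows why blobs "wouldn't survive" an upgrade: every byte at or above 0x80 becomes a two-byte sequence.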

From [email protected] on 2016-10-22 15:07:53:

A fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67
I would like more people affected by UTF-8 bugs in DBD::mysql to test my changes...

From [email protected] on 2017-07-01 09:15:52:

Reopening; the fix was reverted in 4.043.

mbeijen · Nov 14 '17 19:11