
mysql_enable_utf8 => 1 does not encode to utf8 when using some special characters like 'á' or 'ú'

Open Motoko23 opened this issue 1 year ago • 6 comments

DBD::mysql version

5.1.0

MySQL client version

8.0.32

Server version

8.0.34

Operating system version

Linux Gentoo (kernel 6.6.8)

What happened?

DBD::mysql should encode text as UTF-8 when the mysql_enable_utf8 => 1 flag is set.

It works only in these cases:

  1. When using DBD::mysql and the text contains no diacritics at all.
  2. When using DBD::mysql and the text contains more complex special characters like „ž“ or „š“ (characters with hooks, or a combination of more complex characters).
  3. When using DBD::mysql and force-encoding every SQL statement and every execute @parameters value to UTF-8 just before passing it to DBD::mysql. But that's wrong.
  4. Always when using DBD::MariaDB. DBD::MariaDB is compatible and works 100% without this bug, so I am now using DBD::MariaDB with a MySQL database.

It is not working:

  1. When using DBD::mysql and the text contains only certain characters, for example "á" in the word „Informátor“. It simply ignores the "á" and does not encode it. In the database, „show processlist“ shows „Inform?tor“, so an INSERT saves the word as „Inform?tor“, and a SELECT with a WHERE clause searches for „Inform?tor“. That's wrong.

Other information

No response

Motoko23 avatar Jan 25 '24 07:01 Motoko23
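
One possible reading of the pattern reported above, offered as an assumption and not confirmed anywhere in this thread: "á" (U+00E1) and "ú" (U+00FA) fit in Latin-1, so Perl can store them internally as single bytes, while "ž" (U+017E) and "š" (U+0161) cannot and are always stored internally as UTF-8. The sketch below only inspects Perl's internal flag for each character; it does not touch DBD::mysql.

```perl
use strict;
use warnings;

# Code points from the report: a-acute, u-acute, z-caron, s-caron.
# chr() returns a one-byte (non-UTF8-flagged) string for values below 256
# and a UTF8-flagged string for anything above 255.
for my $cp (0xE1, 0xFA, 0x17E, 0x161) {
    my $s = chr($cp);
    printf "U+%04X fits_latin1=%s utf8_flag=%s\n",
        $cp,
        ($cp < 256            ? "yes" : "no"),
        (utf8::is_utf8($s)    ? "on"  : "off");
}
```

To Perl both kinds of string are ordinary character strings; the flag is an internal storage detail that correct code should not need to inspect.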

Ever tried doing

SET NAMES utf8mb4;

at the beginning of your session?

jafd avatar Jan 29 '24 15:01 jafd

> Ever tried doing
>
> SET NAMES utf8mb4;
>
> at the beginning of your session?

Yes. I have also tried:

SET NAMES utf8mb4;
set character set utf8mb4;

or

SET NAMES utf8;
set character set utf8;

I must repeat that it should work for ALL characters or NONE. It's working for SOME.

Motoko23 avatar Jan 29 '24 15:01 Motoko23

Is it, by chance, working with characters that are historically in latin-1 but not in whatever 8-bit codepage used to serve your language (looks like Slovak or Czech, so iso8859-2 or cp1250)? Or vice versa? The thing is that your code should be UTF-8 across the board, and if the Unicode is in your Perl source, you need to put use utf8; somewhere at the top too.

jafd avatar Jan 29 '24 15:01 jafd

Everything is in UTF-8 on input or output. Everything is in UNICODE inside script.

I am using:

use utf8;                                          # this script text is in UTF-8 (auto decode)
use feature 'unicode_strings';      # this script use unicode in regexp (auto decode)

in every file.

I am using utf8::decode($query); for every input.

I am using utf8::encode($page); just before the final print.

Also, everything is working fine with DBD::MariaDB.

I am mostly using this source: https://perldoc.perl.org/perlunifaq

And thank you for your help, Yaroslav. I will be using only MariaDB in the future, so I will not use MySQL, but I have reported this bug to warn and help others.

Motoko23 avatar Jan 29 '24 15:01 Motoko23
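
The decode-on-input, encode-on-output pattern described in the comment above can be sketched as follows. The variable names $query and $page come from the comment; the sample bytes are an assumption chosen to match the word „Informátor“ from the report.

```perl
use strict;
use warnings;

# UTF-8 bytes as they might arrive from outside the script (assumed sample).
my $input_bytes = "Inform\xC3\xA1tor";

# Input boundary: turn UTF-8 bytes into a Perl character string.
my $query = $input_bytes;
utf8::decode($query) or die "input was not valid UTF-8";

print length($input_bytes), " bytes in\n";      # 11 bytes
print length($query), " characters inside\n";   # 10 characters

# Output boundary: turn the character string back into UTF-8 bytes.
my $page = $query;
utf8::encode($page);

print $page eq $input_bytes ? "round trip ok\n" : "round trip failed\n";
```

Between the two boundaries the script works with characters only, which is why the driver is expected to do its own encoding when a string is passed as a parameter.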

It's interesting because I haven't run into this and I'm using Unicode a lot. Granted, I'm still using DBD::mysql 4.050, so there might have been a regression or other, because I can see that barring some system setting on the way, you haven't left a stone unturned here (system locale in both client and server? The character set declared on the table columns themselves maybe?).

Sorry that I had to ask all of this stuff, but since I've had my own problems with Unicode, I just know how hard it can be to make sure all ducks are really in a row and how one stupid setting can ruin a week.

jafd avatar Jan 29 '24 18:01 jafd

This is likely an instance of the Unicode bug present in this distribution and why you should use DBD::MariaDB, see https://github.com/perl5-dbi/DBD-mysql/issues?q=is%3Aissue+is%3Aopen+label%3Autf8 and https://blogs.perl.org/users/grinnz/2023/12/migrating-from-dbdmysql-to-dbdmariadb.html. The workaround if so would be to call utf8::upgrade on any unicode strings immediately before passing them as parameters (it operates on the string in-place, and leaves the string unchanged to Perl but different to broken interpretations like DBD::mysql).

Grinnz avatar Jan 29 '24 18:01 Grinnz
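
A minimal sketch of the utf8::upgrade workaround suggested in the comment above. The helper upgrade_params is hypothetical (it is not part of DBI or DBD::mysql), and the example only shows the effect on Perl's internal representation; no database connection is made.

```perl
use strict;
use warnings;

# Hypothetical helper: return copies of the bind values, each upgraded
# in place to Perl's internal UTF-8 representation.
sub upgrade_params {
    my @params = @_;    # copy, so the caller's values stay untouched
    for my $p (@params) {
        utf8::upgrade($p) if defined $p && !ref $p;
    }
    return @params;
}

# "\xE1" is U+00E1 (a-acute); this literal is stored one byte per character.
my $word = "Inform\xE1tor";
my ($bound) = upgrade_params($word);

print "original: ", (utf8::is_utf8($word)  ? "upgraded" : "not upgraded"), "\n";
print "bound:    ", (utf8::is_utf8($bound) ? "upgraded" : "not upgraded"), "\n";
print "same string to Perl: ", ($word eq $bound ? "yes" : "no"), "\n";
```

With a real statement handle the call would look something like $sth->execute(upgrade_params(@values)); whether this resolves the behaviour reported in this issue is not confirmed in the thread.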