DBD-mysql
mysql_enable_utf8 => 1 does not encode to utf8 when using some special characters like 'á' or 'ú'
DBD::mysql version
5.1.0
MySQL client version
8.0.32
Server version
8.0.34
Operating system version
Linux Gentoo (kernel 6.6.8)
What happened?
DBD::mysql should encode text as UTF-8 when the mysql_enable_utf8 => 1 flag is set.
It works only:
- When using DBD::mysql and the text contains no diacritics at all.
- When using DBD::mysql and the text contains more complex special characters such as „ž“ or „š“ (characters with a háček, or a combination of more complex characters).
- When using DBD::mysql and manually encoding every SQL statement and every execute @parameters value to UTF-8 just before passing it to DBD::mysql. But that is wrong.
- Always when using DBD::MariaDB. DBD::MariaDB is compatible and works 100% without this bug, so I am now using DBD::MariaDB with the MySQL database.
It does not work:
- When using DBD::mysql and the text contains only certain characters, for example "á" in the word „Informátor“. The "á" is simply not encoded. In the database, SQL „show processlist“ shows „Inform?tor“, so an INSERT saves the word „Inform?tor“, and a SELECT with a WHERE clause searches for „Inform?tor“. That is wrong.
Other information
No response
Ever tried doing
SET NAMES utf8mb4;
at the beginning of your session?
Yes. I have also tried:
SET NAMES utf8mb4;
set character set utf8mb4;
or
SET NAMES utf8;
set character set utf8;
I must repeat that it should work for ALL characters or NONE. It is working for SOME.
Is it, by chance, working with characters that are historically in latin-1 but not in whatever 8-bit codepage used to serve your language (looks like Slovak or Czech, so iso8859-2 or cp1252)? Or vice versa? The thing is that your code should be UTF8 across the board, and if the unicode is in your perl source, you need to use utf8; somewhere at the top too.
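The Latin-1 question above can be checked with a small sketch that prints the code point of each character named in the report and whether it fits in Latin-1 (code points below U+0100). The helper name fits_latin1 is hypothetical, and the sketch says nothing about what DBD::mysql does internally; it only classifies the characters.

```perl
use strict;
use warnings;
use utf8;                              # this source text is UTF-8
binmode(STDOUT, ':encoding(UTF-8)');   # print characters, not raw bytes

# Hypothetical helper: true if the character's code point fits in Latin-1.
sub fits_latin1 {
    my ($char) = @_;
    return ord($char) < 0x100;
}

# The characters mentioned in the report.
for my $char ('á', 'ú', 'ž', 'š') {
    printf "%s U+%04X %s\n", $char, ord($char),
        fits_latin1($char) ? 'inside Latin-1' : 'outside Latin-1';
}
```

The characters reported as failing ("á", "ú") are inside Latin-1, while the ones reported as working („ž“, „š“) are outside it.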
Everything is in UTF-8 on input and output. Everything is Unicode inside the script.
I am using:
use utf8; # this script text is in UTF-8 (auto decode)
use feature 'unicode_strings'; # this script use unicode in regexp (auto decode)
in every file.
I am using
utf8::decode($query);
for every input.
I am using
utf8::encode($page);
just before final print.
Also, everything is working fine with DBD::MariaDB.
I am mostly using this source: https://perldoc.perl.org/perlunifaq
And thank you for your help, Yaroslav. I will be using only MariaDB in the future, so I will not use MySQL, but I have reported this bug to warn and help others.
It's interesting because I haven't run into this and I'm using Unicode a lot. Granted, I'm still using DBD::mysql 4.050, so there might have been a regression or other, because I can see that barring some system setting on the way, you haven't left a stone unturned here (system locale in both client and server? The character set declared on the table columns themselves maybe?).
Sorry that I had to ask all of this stuff, but since I've had my own problems with Unicode, I just know how hard it can be to make sure all ducks are really in a row and how one stupid setting can ruin a week.
This is likely an instance of the Unicode bug present in this distribution and why you should use DBD::MariaDB, see https://github.com/perl5-dbi/DBD-mysql/issues?q=is%3Aissue+is%3Aopen+label%3Autf8 and https://blogs.perl.org/users/grinnz/2023/12/migrating-from-dbdmysql-to-dbdmariadb.html. The workaround if so would be to call utf8::upgrade on any unicode strings immediately before passing them as parameters (it operates on the string in-place, and leaves the string unchanged to Perl but different to broken interpretations like DBD::mysql).
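A minimal sketch of the utf8::upgrade workaround described above, assuming the problem really is the internal string representation. The helper upgraded_params is hypothetical (it is not part of DBI or DBD::mysql); utf8::downgrade is used only to force the one-byte-per-character form for the demonstration.

```perl
use strict;
use warnings;
use utf8;   # this source text is UTF-8

# Hypothetical helper: returns copies of the bind values with Perl's
# internal UTF8 flag switched on. The caller's values are left untouched.
sub upgraded_params {
    my @params = @_;
    for my $param (@params) {
        utf8::upgrade($param) if defined $param;
    }
    return @params;
}

my $word = "Informátor";
utf8::downgrade($word);   # illustration only: force the one-byte form

my ($fixed) = upgraded_params($word);

print "original flagged: ", (utf8::is_utf8($word)  ? "yes" : "no"), "\n";
print "upgraded flagged: ", (utf8::is_utf8($fixed) ? "yes" : "no"), "\n";
print "equal to Perl:    ", ($fixed eq $word       ? "yes" : "no"), "\n";

# Assumed usage with an existing statement handle $sth:
#   $sth->execute(upgraded_params(@bind_values));
```

The two strings compare equal in Perl and differ only in internal storage. Values meant as raw bytes (for example BLOB data) probably should not be passed through such a helper.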