mysql-convert-latin1-to-utf8
mysql-convert-latin1-to-utf8 copied to clipboard
utf8 string cut off
After runnig the script words like: Américo, año, agüita, and other spanish words become: Am, a, ag (the words got sliced. I tryed changing ther encoding to utf8mb4, but got the same result.
@Keno-Chile can you share a partial dump of your input data you're trying to convert?
I have the same issue. For example "DO BI¯UTERII" is changed to "DO BI"
@dszyfelb a shared partial data dump might be helpful to debug!
Thanks again for sharing your work! I've encountered the same issue.
I tested thoroughly on a staging server loaded with a copy of production data, first executing on the structure only (no errors), and then on the full database. All seemed to go well. Knowing many of the records in our 10 year old MySQL database are non-critical, I executed on the production database after confirming all current/sensitive records were intact. Here's the collation map I used:
// TODO: The collation you want to convert the overall database to
$defaultCollation = 'utf8mb4_0900_ai_ci';
// TODO Convert column collations and table defaults using this mapping
// latin1_swedish_ci is included since that's the MySQL default
$collationMap =
array(
'latin1_bin' => 'utf8_bin',
'latin1_general_ci' => 'utf8mb4_0900_ai_ci',
'latin1_swedish_ci' => 'utf8mb4_0900_ai_ci',
);
- All instances of
FedEx International Economy®becameFedEx International Economy - Most German was dropped:
SchönfliessbecomeSch. The same happened to words containing charactersüandÃ, etc. DziałdowskaremainedDziałdowska
PHP version: 7.4.3 MySQL: 8.0.22-0ubuntu0.20.04.3 (Ubuntu)
Although the lost (old) data is unimportant to our needs, I have full backups of the original data in latin1_swedish_ci and have pulled a few of the problematic records. I'll be glad to experiment further if you have any ideas.
Edit: Looking through my SQL backups, all backed up tables were dumped without explicit declaration of the collation, i.e.:
ENGINE=InnoDB DEFAULT CHARSET=latin1;
After running the script, all tables now show an explicit collation, even though this matches the MySQL 8.0 server default:
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Is it possible the script was unable to read the collation?