mysql-convert-latin1-to-utf8 utf8 string cut off

After runnig the script words like: Américo, año, agüita, and other spanish words become: Am, a, ag (the words got sliced. I tryed changing ther encoding to utf8mb4, but got the same result.

Aug 07 '18 16:08 Keno-Chile

@Keno-Chile can you share a partial dump of your input data you're trying to convert?

Sep 29 '18 23:09 nicjansma

I have the same issue. For example "DO BI¯UTERII" is changed to "DO BI"

Oct 19 '19 20:10 dszyfelb

@dszyfelb a shared partial data dump might be helpful to debug!

Dec 07 '19 14:12 nicjansma

Thanks again for sharing your work! I've encountered the same issue.

I tested thoroughly on a staging server loaded with a copy of production data, first executing on the structure only (no errors), and then on the full database. All seemed to go well. Knowing many of the records in our 10 year old MySQL database are non-critical, I executed on the production database after confirming all current/sensitive records were intact. Here's the collation map I used:

// TODO: The collation you want to convert the overall database to
$defaultCollation = 'utf8mb4_0900_ai_ci';

// TODO Convert column collations and table defaults using this mapping
// latin1_swedish_ci is included since that's the MySQL default
$collationMap =
 array(
  'latin1_bin'        => 'utf8_bin',
  'latin1_general_ci' => 'utf8mb4_0900_ai_ci',
  'latin1_swedish_ci' => 'utf8mb4_0900_ai_ci',
 );

All instances of FedEx International EconomyÂ® became FedEx International Economy
Most German was dropped: SchÃ¶nfliess become Sch. The same happened to words containing characters Ã¼ and Ã, etc.
Działdowska remained Działdowska

PHP version: 7.4.3 MySQL: 8.0.22-0ubuntu0.20.04.3 (Ubuntu)

Although the lost (old) data is unimportant to our needs, I have full backups of the original data in latin1_swedish_ci and have pulled a few of the problematic records. I'll be glad to experiment further if you have any ideas.

Edit: Looking through my SQL backups, all backed up tables were dumped without explicit declaration of the collation, i.e.: ENGINE=InnoDB DEFAULT CHARSET=latin1;

After running the script, all tables now show an explicit collation, even though this matches the MySQL 8.0 server default: ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Is it possible the script was unable to read the collation?

Jan 27 '21 03:01 jseaber

mysql-convert-latin1-to-utf8 mysql-convert-latin1-to-utf8 copied to clipboard

utf8 string cut off

mysql-convert-latin1-to-utf8
mysql-convert-latin1-to-utf8 copied to clipboard