hashover-next

Add utf8mb4 charset hint to database documentation

Open: da2x opened this issue 3 years ago • 2 comments

utf8 is an alias for utf8mb3 in MySQL and MariaDB. Some emojis use 4 bytes, so the documentation should recommend utf8mb4.
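To see which characters would not fit in a 3-byte `utf8` (`utf8mb3`) column, you can encode each character as UTF-8 and flag the ones that take 4 bytes. This is a minimal Python sketch; the function name `needs_utf8mb4` is hypothetical and not part of hashover-next.

```python
def needs_utf8mb4(text: str) -> list[str]:
    """Return the characters in `text` whose UTF-8 encoding is 4 bytes long.

    utf8mb3 stores at most 3 bytes per character, so any character
    returned here cannot be stored in a utf8 (utf8mb3) column.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


if __name__ == "__main__":
    comment = "Price: 5 € 😀"
    # "€" (U+20AC) needs 3 bytes and fits; "😀" (U+1F600) needs 4 and does not.
    print(needs_utf8mb4(comment))
```

Characters up to U+FFFF, such as `€`, fit in 3 bytes. Most emoji sit above U+FFFF and need 4.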

da2x, Dec 02 '21 10:12

Is there any reason not to also use utf8mb4 as the default in secrets.php? I would like to support all emoji by default, unless there's a good reason not to.

jacobwb, Dec 02 '21 22:12

SQLite, PostgreSQL, and others handle UTF-8 as the Unicode standard defines it, with 1 to 4 bytes per character. MySQL wanted to save RAM back in the day and defined its utf8 as at most 3 bytes per character instead, which is why you need to specify utf8mb4 to get full Unicode support. MariaDB inherited this legacy from MySQL. The other database defaults in the secrets file are for SQLite.
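In UTF-8, the number of bytes a character takes depends only on its code point, following the ranges in RFC 3629. The minimal Python sketch below writes that rule out and compares it with Python's own encoder. The helper name `utf8_length` is hypothetical.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one code point (RFC 3629 ranges)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")
    if code_point <= 0x7F:
        return 1  # ASCII
    if code_point <= 0x7FF:
        return 2  # e.g. accented Latin, Greek, Cyrillic
    if code_point <= 0xFFFF:
        return 3  # rest of the Basic Multilingual Plane; utf8mb3 stops here
    return 4      # supplementary planes, including most emoji; needs utf8mb4


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        # Compare the rule with Python's built-in UTF-8 encoder.
        print(ch, hex(ord(ch)), utf8_length(ord(ch)), len(ch.encode("utf-8")))
```

The last branch is the case that MySQL's 3-byte utf8 cannot store.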

So … yeah. Do you want the default to follow the MySQL legacy workaround, or the databases that have followed the Unicode standard without introducing issues for their users? The ambiguity is why I put it in the documentation. It’s a common issue, and you might end up with broken 4-byte emojis. But that’s kind of what you get when choosing MySQL/MariaDB.

da2x, Dec 02 '21 23:12