anchor-cms

Support full Unicode in database.

Open • Zegnat opened this issue 10 years ago • 7 comments

See "How to support full Unicode in MySQL databases" by Mathias Bynens:

> Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

Currently Anchor only lets me choose a utf8-based collation, but it would be better to offer utf8mb4-based collations.
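A quick way to see which characters the legacy utf8 charset rejects is to measure their UTF-8 encoded length. The following is a minimal Python sketch (not part of Anchor; the function names are made up for illustration) that flags every character needing four bytes, i.e. code points above U+FFFF such as most emoji:

```python
from typing import List


def utf8_length(ch: str) -> int:
    """Number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def chars_rejected_by_mysql_utf8(text: str) -> List[str]:
    """Return the characters of `text` that need four UTF-8 bytes.

    MySQL's legacy `utf8` charset stores at most three bytes per
    character, so these characters (code points above U+FFFF)
    cannot be stored without switching to `utf8mb4`.
    """
    return [ch for ch in text if utf8_length(ch) == 4]


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😎"]:
        print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
    print(chars_rejected_by_mysql_utf8("Hello 😎 café"))
```

Here "é" and "€" take two and three bytes respectively, so only the emoji (U+1F60E, four bytes) is reported as needing utf8mb4.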

Zegnat, Jan 31 '15

Thanks for your input. As far as I'm aware, we haven't encountered this as a problem yet.

Because this would require changing the database configuration, we'll have to look into a way to migrate utf8-collated databases to utf8mb4 (or whatever collation we end up choosing in the future). I'm pretty sure it would cause problems if we didn't.

CraigChilds94, Mar 30 '15

True, I didn’t think about migrating existing installations. I simply edited $vars['collations'] (s/utf8_/utf8mb4_/g) and DB::factory’s charset setting before installing Anchor, and the installer set up the clean database without any apparent problems.
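The edit described above amounts to a plain text substitution on the collation names. As a hedged Python sketch, assuming a hypothetical list of collation names (Anchor's actual $vars['collations'] may contain different entries), the substitution and the resulting charset look like this:

```python
import re
from typing import List

# Hypothetical stand-in for the collations offered by the installer;
# the real contents of Anchor's $vars['collations'] may differ.
UTF8_COLLATIONS: List[str] = [
    "utf8_bin",
    "utf8_general_ci",
    "utf8_unicode_ci",
]


def to_utf8mb4(collation: str) -> str:
    """Apply the s/utf8_/utf8mb4_/g substitution to one collation name.

    Already-converted names are left alone, because "utf8mb4_"
    does not contain the substring "utf8_".
    """
    return re.sub(r"utf8_", "utf8mb4_", collation)


def charset_of(collation: str) -> str:
    """By MySQL naming convention, a collation name starts with its charset."""
    return collation.split("_", 1)[0]


if __name__ == "__main__":
    for name in UTF8_COLLATIONS:
        new = to_utf8mb4(name)
        print(f"{name} -> {new} (charset {charset_of(new)})")
```

Because every converted collation belongs to the utf8mb4 charset, the connection charset has to be switched to utf8mb4 as well, which is why both settings were edited.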

Zegnat, Apr 03 '15

This is pretty important. Have you heard about this? 😔😚😅😊😆😐😅😈😐😓😠😉😈😋😔😠 Unicode emoji are removed from posts.

profi248, Aug 06 '15

Try making a span element with a class such as theme-emoji, and set its content to the emoji's hex code: <span class="theme-emoji" content="\1F60E"></span>

Why the class? So you can fix spacing issues in CSS. I'm only guessing that this may work; I saw a theme implement a similar idea for right arrows: instead of the usual &rarr; entity, the theme used the Unicode hex code.

TheBrenny, Aug 07 '15

Thanks for the response, but nothing shows up 😠. I think you should really use proper Unicode (utf8mb4) in the database:

ALTER TABLE anchor_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

Running this for every table will convert it transparently on upgrade.
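To apply that statement to every table, a migration could generate one ALTER TABLE per table. This Python sketch only builds the SQL strings; the table names in the example are hypothetical apart from anchor_posts, and a real migration would read the list from the database (for instance with SHOW TABLES) and execute the statements through its own connection:

```python
from typing import Iterable, List


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(
    tables: Iterable[str],
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_general_ci",
) -> List[str]:
    """Build one CONVERT TO CHARACTER SET statement per table.

    CONVERT TO changes the table default and converts every
    character column, not just the default for new columns.
    """
    return [
        f"ALTER TABLE {quote_identifier(t)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for t in tables
    ]


if __name__ == "__main__":
    # Hypothetical table list; only anchor_posts appears in this thread.
    for stmt in conversion_statements(["anchor_posts", "anchor_pages", "anchor_comments"]):
        print(stmt)
```

Generating the statements separately from executing them makes it easy to review the migration before it touches a live database.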

profi248, Aug 07 '15

Do you have a working test for this?

TheBrenny, Aug 07 '15

> Running this for every table will convert it transparently on upgrade

According to Mathias’ article, you need to do a little more than that, such as also changing the database’s default character set and the connection charset. But yes, that is the main gist of it.
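One of the extra steps the article discusses is index key length. With the older COMPACT/REDUNDANT row formats, InnoDB limits index keys to 767 bytes, and MySQL reserves the maximum bytes per character when sizing a key. A fully indexed VARCHAR(255) therefore fits in utf8 (255 × 3 = 765 bytes) but not in utf8mb4 (255 × 4 = 1020 bytes). The following Python sketch shows that arithmetic, treating the 767-byte limit as an assumption (newer MySQL versions using the DYNAMIC row format allow 3072 bytes):

```python
from typing import Dict

# Assumed InnoDB index key limit for the COMPACT/REDUNDANT row formats.
INNODB_MAX_KEY_BYTES = 767

# Maximum bytes MySQL reserves per character in each charset.
BYTES_PER_CHAR: Dict[str, int] = {"utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset: str, limit: int = INNODB_MAX_KEY_BYTES) -> int:
    """Longest VARCHAR(n) that can be indexed in full within `limit` bytes."""
    return limit // BYTES_PER_CHAR[charset]


def index_fits(varchar_len: int, charset: str, limit: int = INNODB_MAX_KEY_BYTES) -> bool:
    """True if a full index on VARCHAR(varchar_len) stays within `limit` bytes."""
    return varchar_len * BYTES_PER_CHAR[charset] <= limit


if __name__ == "__main__":
    for cs in ("utf8", "utf8mb4"):
        print(cs, max_indexable_chars(cs), index_fits(255, cs))
```

This arithmetic is where the commonly seen VARCHAR(191) comes from: 191 × 4 = 764 bytes, the largest multiple of four that stays under 767.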

> Do you have a working test for this?

I can only say that I haven’t seen anything weird happening with a database that was set to utf8mb4 from day one using my slightly modified Anchor installer. That is probably also because utf8mb4 is a superset of utf8, and therefore backwards-compatible with it.

Big parts of Anchor are already UTF-8 aware; for example, the slug function calls both mb_strtolower and htmlentities with their optional encoding parameter set to UTF-8. This issue is about bringing the storage database in line with that.

Zegnat, Aug 08 '15