
Use utf8mb4 by default in mysql, not utf8

Open ghost opened this issue 5 years ago • 3 comments

@teo1978 commented on Jul 29, 2018, 3:42 PM UTC:

MySQL's "utf8" charset, which shouldn't even be called that since it is not actual UTF-8, was a disgrace that happened years ago because someone in the MySQL team was stupid. The real UTF-8-conforming character set in MySQL is utf8mb4. In 2018, we should be using it and encouraging its use as the default for MySQL. I don't think any hypothetical compatibility issue with old MySQL versions is more pressing, or more worth caring about, than the existing incompatibilities with obsolete PHP versions, for example. And the disastrous effects of using MySQL's "utf8" charset are much worse.
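To make the difference concrete, here is a minimal PHP sketch (not part of the original comment) showing that UTF-8 needs up to 4 bytes per character, whereas MySQL's "utf8" stores at most 3. The helper name `fitsInMysqlUtf8` is hypothetical. It assumes valid UTF-8 input and only checks for code points above U+FFFF, which are the ones that take 4 bytes.

```php
<?php
// strlen() counts bytes, so it shows how many bytes UTF-8 uses per character.
$samples = [
    'letter a'  => "a",          // U+0061
    'e-acute'   => "\u{E9}",     // U+00E9
    'euro sign' => "\u{20AC}",   // U+20AC
    'emoji'     => "\u{1F600}",  // U+1F600
];

foreach ($samples as $label => $char) {
    printf("%-10s %d byte(s)\n", $label, strlen($char));
}

// Hypothetical helper: true when no character in $text needs 4 bytes,
// i.e. when the text could be stored in a 3-byte "utf8" column.
function fitsInMysqlUtf8(string $text): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 0;
}

var_dump(fitsInMysqlUtf8("caf\u{E9}"));     // bool(true)
var_dump(fitsInMysqlUtf8("hi \u{1F600}"));  // bool(false)
```

The emoji is the only sample that takes 4 bytes, so it is the only one a "utf8" column cannot hold.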

  1. I installed the Advanced Application Project template and created a database for it with utf8mb4 as the default charset and utf8mb4_unicode_ci as the default collation. Then I configured main_local.php with:
            'dsn' => 'mysql:host=localhost;dbname=iframe2',
            //...
            'charset' => 'utf8mb4',

I followed the rest of the instructions (including running the migration) and got the application up and running. I then noticed that the user table that was created has utf8_unicode_ci (i.e. the "utf8" charset) as its default collation and for all its fields.

I'm not sure whether this is a specific choice for the User model table based on the belief that allowing 4-byte characters such as smilies in a username wouldn't be a good idea (which could be questionable but arguable) or if this comes from a general "we use utf8 everywhere by default" policy. If it's the latter, then I'm seriously worried. I wouldn't like to see any more tables created by the framework with utf8 specifically forced as the charset despite the database default being utf8mb4. I can understand forcing a specific charset rather than using the database default for some things, but then that specific charset should be utf8mb4 (which again is true utf-8) for anything that doesn't have a specific reason to be something else.

  2. In the Getting Started guides for both the basic and advanced apps, I see you suggest utf8 as the charset in the configuration for a MySQL database. Again, it should be utf8mb4.

  3. The actual config file provided with the basic app also has utf8.

Hopefully it's just a matter of the default configuration files provided with the template application projects and the examples in the docs (and apparently something else in the Advanced app, because utf8 was assigned to the user model table despite the database default and the config), and not much more than that.
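The thread does not say where the utf8 on the user table comes from. One plausible cause, stated here purely as an assumption, is that the template's migration passes explicit MySQL table options instead of relying on the database default. A minimal sketch of what utf8mb4 table options could look like follows; the function name `tableOptionsFor` is hypothetical.

```php
<?php
// Hypothetical helper: build table options for a CREATE TABLE statement.
// Only MySQL gets an explicit charset, collation and engine here;
// other drivers fall back to their own defaults (null).
function tableOptionsFor(string $driverName): ?string
{
    if ($driverName === 'mysql') {
        return 'CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ENGINE=InnoDB';
    }

    return null;
}

var_dump(tableOptionsFor('mysql'));
var_dump(tableOptionsFor('pgsql')); // NULL
```

In a Yii2 migration, a string like this would typically be passed as the third argument of `$this->createTable()`, with `$this->db->driverName` supplying the driver name.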

This issue was moved by samdark from yiisoft/yii2#16576.

ghost avatar Oct 09 '18 17:10 ghost

@samdark commented on Jul 29, 2018, 6:54 PM UTC:

Related to #16489, #5119 (comment), #16545.

ghost avatar Oct 09 '18 17:10 ghost

@cebe commented on Jul 29, 2018, 7:59 PM UTC:

Note that switching from 3-byte utf8 to 4-byte utf8mb4 causes some problems with maximum index sizes in MySQL, e.g. luyadev/luya-module-admin#109.

I agree that the default should be utf8mb4 though.
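The index-size problem mentioned in this comment comes down to arithmetic. The sketch below assumes InnoDB's index key prefix limits of 767 bytes (older COMPACT/REDUNDANT row formats) and 3072 bytes (DYNAMIC/COMPRESSED row formats). These numbers are not taken from the thread, and the function name `maxIndexableChars` is hypothetical.

```php
<?php
// Hypothetical helper: how many characters of a string column fit into
// an index, given the byte limit and the charset's worst-case bytes per char.
function maxIndexableChars(int $limitBytes, int $bytesPerChar): int
{
    return intdiv($limitBytes, $bytesPerChar);
}

// Assumed limit for older InnoDB row formats: 767 bytes.
printf("utf8    (3 bytes/char): %d chars\n", maxIndexableChars(767, 3));  // 255
printf("utf8mb4 (4 bytes/char): %d chars\n", maxIndexableChars(767, 4));  // 191

// Assumed limit for DYNAMIC/COMPRESSED row formats: 3072 bytes.
printf("utf8mb4 (4 bytes/char): %d chars\n", maxIndexableChars(3072, 4)); // 768
```

Under the 767-byte limit, an indexed VARCHAR(255) fits with utf8 but not with utf8mb4, which is why such columns are often shortened to 191 characters or indexed by prefix.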

ghost avatar Oct 09 '18 17:10 ghost

DB and AR

terabytesoftw avatar Mar 31 '20 13:03 terabytesoftw