WIP Default to standards-compliant utf8mb4 character set

Open rolandwalker opened this issue 4 years ago • 7 comments

Description

xref #915

The utf8 character set in current MySQL versions is not actually standards-compliant. The standards-compliant UTF-8 character set is spelled utf8mb4, and that should be mycli's default.
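
To make the gap concrete: MySQL's utf8 (utf8mb3) stores at most three bytes per character, so it covers only the Basic Multilingual Plane, while standard UTF-8 needs up to four bytes for characters such as emoji. A minimal Python sketch of that check follows; the helper name `fits_in_utf8mb3` is hypothetical and is not part of mycli.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character encodes to at most 3 UTF-8 bytes.

    Characters needing 4 bytes (code points above U+FFFF) cannot be
    stored in MySQL's 3-byte utf8/utf8mb3 character set.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


if __name__ == "__main__":
    for sample in ("héllo €", "😀"):
        print(repr(sample), fits_in_utf8mb3(sample))
```

Text that fails this check is the kind of data that needs utf8mb4 on the connection.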

~WIP because this should be researched/tested for MariaDB and Percona.~ Edit: tested on MariaDB. Researched for Percona: pages such as https://www.percona.com/blog/2018/04/10/migrating-database-charsets-to-utf8mb4/ have no suggestion of incompatibility.

Checklist

  • [x] I've added this contribution to the changelog.md.
  • [x] I've added my name to the AUTHORS file (or it's already there).

rolandwalker avatar Jan 07 '21 13:01 rolandwalker

I think this default should depend on the server version: MySQL older than 5.5 cannot handle utf8mb4.

gfrlv avatar Jan 10 '21 14:01 gfrlv

@pasenor excellent point.

rolandwalker avatar Jan 11 '21 12:01 rolandwalker

Wait, how would that work? We set the charset before we make the connection.

rolandwalker avatar Jan 11 '21 12:01 rolandwalker

Oops, indeed, that's messy. But we should be able to call set_charset() on the PyMySQL connection object inside our SQLExecute once we know the version.
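
A rough sketch of that idea, not the actual mycli code: connect with the older utf8 charset, read the server version, then switch to utf8mb4 when the server is new enough. The names `parse_server_version`, `pick_charset`, and `upgrade_charset` are hypothetical, the 5.5 threshold is the one suggested in this thread, and the connection object is only assumed to expose the `set_charset()` method mentioned above.

```python
import re

# Assumption from this thread: servers from 5.5 onward can use utf8mb4.
UTF8MB4_MIN_VERSION = (5, 5)


def parse_server_version(version: str) -> tuple:
    """Extract (major, minor) from strings like '5.1.73-log' or '10.5.8-MariaDB'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized server version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pick_charset(version: str) -> str:
    """Choose utf8mb4 for new-enough servers, else fall back to utf8."""
    if parse_server_version(version) >= UTF8MB4_MIN_VERSION:
        return "utf8mb4"
    return "utf8"


def upgrade_charset(conn, version: str) -> str:
    """Switch an already-open connection (opened with utf8) to the better charset.

    `conn` is any object with a set_charset(name) method.
    """
    charset = pick_charset(version)
    if charset != "utf8":
        conn.set_charset(charset)
    return charset
```

The version string would come from the server after connecting, for example from `SELECT VERSION()`. Keeping the decision in a pure function makes it testable without a live database.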

gfrlv avatar Jan 11 '21 20:01 gfrlv

Note that utf8 is considered an alias for utf8mb3 and both MySQL and MariaDB are actively doing work to eventually change the alias to map to utf8mb4.

  • https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html
  • https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-30.html

Important Change: A previous change renamed character sets having deprecated names prefixed with utf8_ to use utf8mb3_ instead. In this release, we rename the utf8_ collations as well, using the utf8mb3_ prefix; this is to make the collation names consistent with those of the character sets, not to rely any longer on the deprecated collation names, and to clarify the distinction between utf8mb3 and utf8mb4. The names using the utf8mb3_ prefix are now used exclusively for these collations in the output of SHOW statements such as SHOW CREATE TABLE, as well as in the values displayed in the columns of Information Schema tables including the COLLATIONS and COLUMNS tables. (Bug #33787300)

  • https://jira.mariadb.org/browse/MDEV-8334
  • https://jira.mariadb.org/browse/MDEV-22217

The utf8mb4 character set was introduced in MySQL 5.5.4, which was not a GA release (5.5.8 was the first GA release). So utf8mb4 should be used for MySQL 5.5 and newer.

  • https://downloads.mysql.com/docs/mysql-5.5-relnotes-en.pdf

dveeden avatar Oct 27 '22 08:10 dveeden

@rolandwalker Is this PR still valid? Should this be merged?

amjith avatar Apr 20 '23 20:04 amjith

@amjith yes we should do something about it. Will review.

rolandwalker avatar Apr 20 '23 20:04 rolandwalker