dbal Fix: #6146 - utf8/utf8mb3 alias causes redundant ALTER TABLE changes

Q A

Type bug

Fixed issues #6146

Q	A
Type	bug
Fixed issues	#6146

Summary

In MySQL (and MariaDB), utf8 and utf8mb3 are aliases. Depending on the database server version, the information schema may contain one or the other. Similarly, column definitions may contain one or the other.

When performing schema-comparisons, a mismatch between utf8 and utf8mb3 causes an ALTER TABLE statement to be generated - even when columns are identical.

The code already contains a function to normalize the charset/collation options: https://github.com/doctrine/dbal/blob/61d79c6e379a39dc1fea6b4e50a23dfc3cd2076a/src/Platforms/MySQL/Comparator.php#L83

My solution is to extend this function so that it also replaces utf8 with utf8mb3 (or vice-versa), to match the current database connection.

To detect the prefered character set for the connection, I've added: AbstractMySQLPlatform::informationSchemaUsesUtf8mb3() MariaDBPlatform::informationSchemaUsesUtf8mb3()

The obvious place for the new normalization logic is the CharsetMetadataProvider and CollationMetadataProvider classes. These are interfaces, so this is technically a BC break, however, they are marked @internal.

I've added tests which should cover all the new/modified code. I've run it against MySQL. I don't have MariaDB to hand.

Apr 29 '24 22:04 fisharebest

Just pushed a change to fix the psalm issues

Apr 30 '24 08:04 fisharebest

It seems there are CI jobs failing. Please take a look at this guide for more on how to handle those.

Apr 30 '24 08:04 greg0ire

ಠ_ಠ

Apr 30 '24 08:04 greg0ire

I've added tests which should cover all the new/modified code. I've run it against MySQL. I don't have MariaDB to hand.

Codecov says that some of the new code is untested, most notably the parts using version_compare. Please add functional tests for this change. As you can see, our test suite features, among other platforms MariaDB 10.5 and MariaDB 10.6, so you should be able to test that schema comparisons on either version of MariaDB indeed doesn't cause ALTER TABLE statements to be generated. Same goes for MySQL.

You can learn more about such tests at https://www.doctrine-project.org/projects/doctrine-dbal/en/stable/reference/testing.html

May 01 '24 08:05 greg0ire

Codecov says that some of the new code is untested, most notably the parts using version_compare. Please add functional tests for this change.

Running the tests locally says the code is covered. (Ad hoc testing confirms this).

XDEBUG_MODE=coverage vendor/bin/phpunit -c ci/github/phpunit/pdo_mysql.xml --coverage-html=coverage

However, Codecov says it isn't.

May 01 '24 13:05 fisharebest

I think I found the reason: [2024-05-01T09:48:28.648Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 429 - {'detail': ErrorDetail(string='Rate limit reached. Please upload with the Codecov repository upload token to resolve issue. Expected available in 1935 seconds.', code='throttled')}

Let me retry this.

May 01 '24 14:05 greg0ire

From the Codecov UI:

Coverage data is based on head 9c87c57(2 uploads) compared to base 7fb00fd(5 uploads)

… we currently send 69 uploads, so there is no way we are going to be below the upload limit. I will have to look into this separately.

May 01 '24 14:05 greg0ire

Ah there is one issue though: since this is a fix, you should target 3.8.x

May 01 '24 14:05 greg0ire

Ah there is one issue though: since this is a fix, you should target 3.8.x

I can create another PR for this branch.

May 01 '24 15:05 fisharebest

Please don't, we usually do merges up. Instead, let me retarget your PR, and please rebase your commits on 3.8.x and force push.

May 01 '24 16:05 greg0ire

Please don't, we usually do merges up. Instead, let me retarget your PR, and please rebase your commits on 3.8.x and force push.

The PR uses Connection::getServerVersion() to select the prefered alias for utf8/utf8mb3. In 3.8.x, this method isn't available, so I'll need to find another approach.

I'm going to be offline for the next week or so, and will look at this on my return.

May 02 '24 15:05 fisharebest

Enjoy your time off 👍

May 02 '24 17:05 greg0ire

@fisharebest do you need help with the rebase?

Jun 04 '24 11:06 greg0ire

I have rewritten my patch to use the information_schema instead of server version numbers. This seems more robust, and it should also continue to work if MySQL remove utf8mb3 in version 9.

However, I have been unable to back-port this PR to the 3.8.x branch. (I can, but it doesn't fix the bug...).

The handling of charset/collation is different. I'd probably need to copy lots of other changes from the 4.0.x branch to make it work.

Options are:

apply this patch to 4.0.x only
attempt to backport the changes in charset/collation handling from 4.0.x to 3.8.x and then try again
????

Jun 04 '24 20:06 fisharebest

I'd be surprised if there were lots of changes you could backport that are not breaking changes, so I'd say we should consider applying this to 4.0.x although given the usage statistics of 4.0.x, it won't have much effect at first.

Jun 04 '24 21:06 greg0ire

I have rewritten my patch to use the information_schema instead of server version numbers. This seems more robust, and it should also continue to work if MySQL remove utf8mb3 in version 9.

As far as I understand this, utf8 would stay utf8 once utf8mb3 isn't supported anymore. I didn't find information about when this alias handling was introduced on a short search though.

Jul 27 '24 20:07 SenseException

it should also continue to work if MySQL remove utf8mb3 in version 9.

Now that MySQL 9.0 is released, we can see that the logic still works, and the tests pass.

Aug 14 '24 13:08 fisharebest

Can you please rebase onto 4.1? That should trigger a CI pipeline that includes MySQL 9.

Aug 14 '24 13:08 derrabus

All the tests are passing. Looks like a missing token in the CI tools.

Sep 04 '24 11:09 fisharebest

You can ignore the Codecov failure, that issue is tracked here: https://github.com/codecov/codecov-action/issues/1548

Sep 04 '24 12:09 greg0ire

Yes sorry, we lost track of this PR, apparently. I'll do a review of the PR.

Sep 04 '24 12:09 derrabus

@fisharebest Will you have time to pick up the work on this PR? From what I gathered, we don't need the detection logic that you've built at all.

The charsets utf8 and utf8mb3 are the same charsets in MySQL and most likely will be for eternity. MySQL considers one of the identifiers as canonical and the other one as alias. No matter which of the both identifiers we use when creating a table, MySQL will always report the canonical name.

In MySQL 5.7 and below, utf8 is the canonical name and utf8mb3 the alias.
In MySQL 8.0 and above, utf8mb3 is the canonical name and utf8 the alias.

This means, that if we normalize both schemas to let's say the long version utf8mb3, the comparison should always succeed.

Note that MySQL 5.7 support is deprecated by now, so I would really like to keep logic that works around behavioral changes of MySQL 8.0 as simple as possible.

Nov 24 '24 16:11 derrabus

In MySQL 5.7 and below, utf8 is the canonical name and utf8mb3 the alias.

In MySQL 8.0 and above, utf8mb3 is the canonical name and utf8 the alias.

Not quite. The change in canonical/alias happened between 8.0.29 and 8.0.30. See https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html

For MariaDB it is a little more complicated. The same change was made in 10.6.1, but a run-time variable (old_mode) allows the previous canonical/alias to be used. This setting still exists on the latest version of MariaDB. See https://mariadb.com/kb/en/old-mode/#utf8_is_utf8mb3

This means, that if we normalize both schemas to let's say the long version utf8mb3, the comparison should always succeed.

Yes, but there is still the small issue of DBAL looking up default collations for charsets. See the second part of my comment at https://github.com/doctrine/dbal/issues/6146#issuecomment-2067784678

We'd need to normalise the result of these queries too.

Nov 25 '24 10:11 fisharebest

Not quite. The change in canonical/alias happened between 8.0.29 and 8.0.30. See https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html

Oh right. MySQL's versioning policy during 8.0.x times was wild.

For MariaDB it is a little more complicated. The same change was made in 10.6.1, but a run-time variable (old_mode) allows the previous canonical/alias to be used. This setting still exists on the latest version of MariaDB. See https://mariadb.com/kb/en/old-mode/#utf8_is_utf8mb3

WTF. Thank you very much for your thorough research. So, we could set that variable before we run our queries and reset it afterwards, right?

This means, that if we normalize both schemas to let's say the long version utf8mb3, the comparison should always succeed.

Yes, but there is still the small issue of DBAL looking up default collations for charsets. See the second part of my comment at #6146 (comment)

We'd need to normalise the result of these queries too.

Fine by me. But again, if we do all this normalization, we don't need run any detection logic on the server because we enable DBAL to work with MySQL/MariaDB responsing with old and new character set names (and corresponsing collation names), correct?

Nov 27 '24 08:11 derrabus

There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. If you want to continue working on it, please leave a comment.

Feb 26 '25 03:02 github-actions[bot]

This pull request was closed due to inactivity.

Mar 05 '25 03:03 github-actions[bot]

dbal dbal copied to clipboard

Fix: #6146 - utf8/utf8mb3 alias causes redundant ALTER TABLE changes

Summary

dbal
dbal copied to clipboard