phplist3 icon indicating copy to clipboard operation
phplist3 copied to clipboard

Use utf8mb4 for the connection etc

Open bramley opened this issue 1 year ago • 8 comments

Description

This change sets the character set used for the client-server exchanges to utf8mb4 instead of the current utf8, which is an alias for utf8mb3.

Using utf8mb4 will allow a wider range of characters, supplementary characters, to be sent, specifically emoji characters. If once-off database table changes are made then subject lines can include emoji characters.

This change is backward compatible, it has no effect when supplementary characters are not used.

Related Issue

https://github.com/phpList/phplist3/issues/1000

Screenshots (if appropriate):

bramley avatar Nov 05 '23 09:11 bramley

Does this affect the version of Mysql/Mariadb that is a minimum requirement?

michield avatar Nov 05 '23 12:11 michield

It seems to have been introduced in MySQL 5.5 from 2010 https://dev.mysql.com/blog-archive/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/ and specifically in 5.5.3 from this PDF file of release notes

The utf8mb4 character set has been added. This is similar to utf8, but its encoding allows up to four bytes per character to enable support for supplementary characters

bramley avatar Nov 05 '23 15:11 bramley

It needs to change here

  • https://github.com/phpList/phplist3/blob/main/public_html/lists/admin/initialise.php#L148
  • https://github.com/phpList/phplist3/blob/main/public_html/lists/admin/mysqli.inc#L405

as well/ In fact, "initialise" should use the Sql_Create_Table from mysqli.inc but we can fix that some other time

After that, I was able to put mb4 characters in the subject and content and it worked

michield avatar Dec 29 '23 17:12 michield

additionally, we may want to add something to the upgrade script to convert all tables to utf8mb4, although we can also do it as a seperate step like we did before https://github.com/michield/phplist3/blob/master/public_html/lists/admin/converttoutf8.php

michield avatar Dec 30 '23 11:12 michield

These suggestions are going beyond the original scope of the change, which was simply to avoid undoing a manual change made to a phplist file when upgrading. The conversion of an existing database to utf8mb4 would have been made manually.

I don't know how "safe" it would be to try to automate the conversion to utf8mb4 and any reasons why that might fail. It seems safer to leave that to each installation to do so that it is entirely under their control.

Using utf8mb4 for new installations seems sensible though but might need more investigation. This mysql documentation page lists some possible consequences

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-conversion.html

bramley avatar Dec 30 '23 12:12 bramley

Ok, let's leave the scope for this small, and we can tackle conversion and other mb4 related work in a different PR.

michield avatar Feb 10 '24 09:02 michield

I have been looking at this recently because a system I support had a problem with a campaign body including a 4-byte UTF-8 character, so I applied the changes manually.

Looking at the conversion to utf8 page, and the same code in the upgrade page, then trying to use that to upgrade to utf8mb4 showed an issue with the SQL used to change each column

            Sql_Query(sprintf('alter table %s change column %s %s %s default character set utf8',

This syntax requires to repeat the full column definition. Any columns that have been defined as not null or with a default value are losing those on this upgrade. Further, the upgrade is applied on the first upgrade of a new installation, even though that is already utf8. So it looks like some rework is needed.

What is simple and probably quite safe is to upgrade only the columns that are used for a campaign where an admin is likely to want to include a 4-byte UTF-8 character. These are the subject, message and textmessage fields on the messages table, and the data field on the message_data table.

bramley avatar Feb 10 '24 10:02 bramley

I have added another commit that sets the character set for the message subject and content. These seem to be the fields to which people want to add 4-byte UTF-8 characters such as emojis.

I have tested this change by creating a new database, and upgrading an existing one. In both cases I could copy/paste 😀 into the campaign content, save the campaign, then send a test email successfully.

bramley avatar Feb 11 '24 11:02 bramley

This pull request has been mentioned on phpList Discuss. There might be relevant details there:

https://discuss.phplist.org/t/3-6-15-release-candidate-is-available-for-testing/9473/1

phpListDockerBot avatar Apr 16 '24 09:04 phpListDockerBot

This pull request has been mentioned on phpList Discuss. There might be relevant details there:

https://discuss.phplist.org/t/phplist-3-6-15-has-been-released/9495/1

phpListDockerBot avatar Apr 25 '24 15:04 phpListDockerBot