
Use UTF8MB4 everywhere

Open • Sesquipedalian opened this issue 1 year ago • 7 comments

  • [ ] Upgrade all MySQL tables to use utf8mb4.
  • [ ] Get rid of Utils::$context['utf8']
  • [ ] Change Utils::fixUtf8mb4() to encode or decode as appropriate based on database. (This will allow us to update old data on the fly.)
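The third checklist item can be illustrated with a minimal Python sketch. It assumes that old data stores each 4-byte character as a numeric HTML entity (decimal such as `&#128512;`, or hex such as `&#x1F600;`). `fix_utf8mb4()` and its `db_supports_mb4` flag are hypothetical stand-ins for `Utils::fixUtf8mb4()`, not SMF's actual code:

```python
import re

# Decimal (&#128512;) or hexadecimal (&#x1F600;) numeric character references.
_NUMERIC_ENTITY = re.compile(r"&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));")


def encode_4byte_chars(text: str) -> str:
    """Replace each supplementary-plane character (4 bytes in UTF-8) with a decimal entity."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch for ch in text)


def decode_4byte_entities(text: str) -> str:
    """Turn entities for supplementary-plane code points back into real characters.

    Entities below U+10000 (e.g. &#38;) are left untouched, because the
    encoding step in this sketch never produces them.
    """
    def replace(match):
        dec, hexa = match.groups()
        code_point = int(dec) if dec is not None else int(hexa, 16)
        if 0xFFFF < code_point <= 0x10FFFF:
            return chr(code_point)
        return match.group(0)

    return _NUMERIC_ENTITY.sub(replace, text)


def fix_utf8mb4(text: str, db_supports_mb4: bool) -> str:
    """Hypothetical stand-in for Utils::fixUtf8mb4(): decode old entities when the
    database can store 4-byte characters natively, otherwise encode them."""
    if db_supports_mb4:
        return decode_4byte_entities(text)
    return encode_4byte_chars(text)
```

Because decoding only touches code points above U+FFFF, running it on every read would convert old rows on the fly without disturbing ordinary entities.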

Sesquipedalian, Dec 04 '23 20:12

ref https://github.com/SimpleMachines/SMF/pull/6409

live627, Dec 04 '23 20:12

The work on this will depend on the upgrader/installer logic. We have plans for that overhaul, so we need to get that in place before we can even get a PR for the upgrade logic.

jdarwood007, Dec 04 '23 22:12

That's a good point. I'll adjust the milestones in our internal roadmap. There isn't a specific issue for the upgrader and installer improvements here on GitHub yet, but basically I'm going to move the "Installer and upgrader improvements" item from Alpha 3 to Alpha 2.

Sesquipedalian, Dec 04 '23 22:12

Note that some of the DB changes in #6409 aren't required if the DB meets some minimum requirements, e.g., tables must be InnoDB & must have a row_format that is not COMPACT or REDUNDANT.
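A pre-upgrade check for these minimum requirements might look like the Python sketch below. It encodes only the rule of thumb stated here (InnoDB, row format not COMPACT or REDUNDANT). On older servers, settings such as `innodb_large_prefix` also matter, and this sketch ignores them. The `smf_` table prefix, `TableStatus`, and the helper names are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Iterable, List

# Query an upgrader might run to collect the inputs below
# (assumption: SMF tables share the "smf_" prefix in the current schema).
TABLE_STATUS_SQL = (
    "SELECT TABLE_NAME, ENGINE, ROW_FORMAT "
    "FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME LIKE 'smf\\_%'"
)


@dataclass
class TableStatus:
    name: str
    engine: str      # e.g. "InnoDB" or "MyISAM"
    row_format: str  # e.g. "Dynamic", "Compact", "Redundant", "Compressed"


def meets_mb4_requirements(table: TableStatus) -> bool:
    """Rule of thumb from the thread: InnoDB and a row format other than COMPACT/REDUNDANT."""
    return (
        table.engine.lower() == "innodb"
        and table.row_format.lower() not in {"compact", "redundant"}
    )


def tables_needing_attention(tables: Iterable[TableStatus]) -> List[str]:
    """Names of tables that would need index changes or an engine/row-format step first."""
    return [t.name for t in tables if not meets_mb4_requirements(t)]
```

The engine is checked first because MyISAM tables also report a row format (for example "Dynamic"), and that value says nothing about InnoDB's index limits.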

These DB requirements are a problem for DBs that were created before MySQL 5.7 and simply migrated forward. Note that the 2.1 upgrader did not change or address them.

So... it is quite likely that a lot of SMF 2.1 DBs out there are still on MyISAM. And even if they are InnoDB, COMPACT rows will create a problem. This is explained in the writeup for #6409.

Note also that if a table's row format is COMPACT or REDUNDANT (e.g., because it was created under an older default or innodb_default_row_format setting), the table would need to be rebuilt before converting to MB4. (EDIT: An ALTER TABLE to change the row format should be sufficient...)
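The ALTER TABLE route could be generated per table roughly as in this Python sketch. The function name and the `utf8mb4_general_ci` default collation are assumptions for illustration, not a decision made in this thread:

```python
from typing import List


def mb4_conversion_statements(
    table: str, engine: str, row_format: str, collation: str = "utf8mb4_general_ci"
) -> List[str]:
    """Build the ALTER TABLE statements to move one table to utf8mb4.

    If the table is not InnoDB, or uses COMPACT/REDUNDANT rows, it is first
    switched to InnoDB with ROW_FORMAT=DYNAMIC (this ALTER TABLE rebuilds it).
    """
    quoted = "`" + table.replace("`", "``") + "`"  # escape backticks in the identifier
    statements = []
    if engine.lower() != "innodb" or row_format.lower() in {"compact", "redundant"}:
        statements.append(f"ALTER TABLE {quoted} ENGINE=InnoDB ROW_FORMAT=DYNAMIC")
    statements.append(
        f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation}"
    )
    return statements
```

Keeping the row-format change as a separate statement that runs before the charset conversion means the conversion only ever runs against a table that already has the larger index limit.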

The approach in #6409 was to modify some of the indexes, to sidestep these constraints, so the conversion would be successful no matter what the engine & row format.

sbulen, Dec 06 '23 04:12

Bottom line: either DB changes are needed (191-character lengths on the indexes), or the upgrader needs steps to switch the engine (InnoDB) and row format (DYNAMIC).
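The arithmetic behind 191 is shown below. InnoDB's COMPACT and REDUNDANT row formats cap an index key prefix at 767 bytes, and utf8mb4 reserves up to 4 bytes per character. DYNAMIC and COMPRESSED raise the cap to 3072 bytes when large prefixes are enabled (the default from MySQL 5.7.7, and always the case in 8.0). MyISAM caps a whole key at 1000 bytes. The helper names are illustrative:

```python
# Index key length limits, in bytes.
INNODB_COMPACT_REDUNDANT = 767    # InnoDB, COMPACT or REDUNDANT row format
INNODB_DYNAMIC_COMPRESSED = 3072  # InnoDB, DYNAMIC/COMPRESSED with large prefixes
MYISAM_KEY = 1000                 # MyISAM, whole key

# Worst-case bytes per character for each character set.
BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def max_indexable_chars(limit_bytes: int, charset: str) -> int:
    """Longest column (in characters) whose full value fits in the index limit."""
    return limit_bytes // BYTES_PER_CHAR[charset]


def index_fits(length_chars: int, charset: str, limit_bytes: int) -> bool:
    """True if an index over length_chars characters stays within limit_bytes."""
    return length_chars * BYTES_PER_CHAR[charset] <= limit_bytes
```

For example, `max_indexable_chars(767, "utf8mb4")` is 191 (764 bytes), while the same limit allows 255 characters in 3-byte utf8. That is why a VARCHAR(255) index that worked before the conversion breaks afterwards on COMPACT rows.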

At the very least, there should be a check that stops with an error telling folks to make those changes manually first.

sbulen, Dec 07 '23 00:12

Thanks, @sbulen. That's helpful.

Sesquipedalian, Dec 07 '23 05:12

FWIW, I'd suggest going all InnoDB & DYNAMIC. Greater consistency & more modern across the board.

One other thing to think about... entity conversion. Today, 4-byte (utf8mb4) characters are stored entity-encoded.

The whole point here is to not have to do that...

Do we convert them at upgrade time? Under database maintenance as well?

Search is affected here, too, since search words may have been built from the entity-encoded text. See #6405.

sbulen, Dec 08 '23 02:12