Ignore locale when sorting for shortcode "camptix_attendees"

Open iworks opened this issue 9 months ago • 6 comments

Describe the bug

The camptix_attendees shortcode ignores locale-specific sort order when listing attendees.

To reproduce

Steps to reproduce the behavior:

  1. Go to https://krakow.wordcamp.org/2024/czym-w-ogole-jest-wordcamp/uczestnicy/
  2. Scroll down to the last attendees.
  3. See the error: at the end you can see "Łukasz Jasiński" and "Łukasz Mastalski". They should be shown after the letter "L" and before the letter "M".

Expected behavior

Proper sorting by name according to the site locale.

Screenshots / Screencasts

Screenshot from 2024-05-02 12-38-27

WordCamp

If this is a problem on a specific WordCamp's site, list the site or page URL here.

iworks avatar May 02 '24 10:05 iworks

I think this is actually due to the data storage type, as we don't do anything special for locale-aware sorting; we just sort on post_title, which in this case is the attendee name (https://github.com/WordPress/wordcamp.org/blob/f79c65ea40e3ba7ae54c07bdd932d9ff72796a15/public_html/wp-content/plugins/camptix/addons/shortcodes.php#L207-L219).

Unsure how we've done this before, but I think we'd have to change that in the database (which might require some assistance to do so) and there is a chance of data loss.

@dd32 have you known us to do this before?

pkevan avatar May 07 '24 10:05 pkevan

Actually - if we sort by the Polish collation (utf8mb4_polish_ci) then the results look correct, but I'm unsure if we can push this through WP_Query.

pkevan avatar May 07 '24 11:05 pkevan

Ah - this isn't possible through WP Query, due to the sanitization of the orderby parameter: https://github.com/WordPress/wordpress-develop/blob/6.5/src/wp-includes/class-wp-query.php#L1663-L1685

pkevan avatar May 07 '24 16:05 pkevan

I think this is actually due to the data storage type

Yup, because it's stored as utf8mb4_unicode_ci, it's going to be sorted by that collation, which apparently uses an older set of weight keys.

I think we'd have to change that in the database

My initial response is "noooo". However, it turns out that core does use the utf8mb4_unicode_520_ci collation when possible: https://github.com/WordPress/wordpress-develop/blob/473e2554db8e547a07a16f73080ca49c0c30b89f/src/wp-includes/class-wpdb.php#L894-L897

(utf8mb4_unicode_ci sorts using Unicode Collation Algorithm (UCA) 4.0.0 weight keys; utf8mb4_unicode_520_ci uses UCA 5.2.0 weight keys.)

AFAIK we don't use that, because of a combination of HyperDB and how utf8mb4 is forced on.

Upon looking into it, $wpdb->has_cap( 'utf8mb4_520' ) returns false on WordCamp. HyperDB appears to support it, but it turns out not to, because $wpdb->db_version() returns 5.5.5, as https://github.com/WordPress/wordpress-develop/commit/fed98bd9ef9a232d102c41e74944d3c21cd6183e is not applied to HyperDB.

The above won't "fix" this though; AFAIK it'll just let new tables be created using 520. We could adjust existing tables though.

isn't possible through WP Query

This would be possible via the posts_request SQL filter, for example, the following query sorts as you'd expect:

SELECT post_title FROM wc_posts WHERE post_type = 'tix_attendee' ORDER BY post_title COLLATE 'utf8mb4_unicode_520_ci' ASC

dd32 avatar May 07 '24 23:05 dd32

Thanks @dd32 - I think I prefer the non-update options via the posts_request filter.

We'd need to adjust the query here: https://github.com/WordPress/wordcamp.org/blob/f79c65ea40e3ba7ae54c07bdd932d9ff72796a15/public_html/wp-content/plugins/camptix/addons/shortcodes.php#L275-L292 to include suppress_filters set to false, since the filter wouldn't run otherwise, but apart from that this could work.
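
For illustration only, here is a minimal sketch of what the posts_request approach might look like. It is not the code that was merged, and the helper name example_collate_title_orderby is hypothetical. It assumes the generated SQL orders by post_title (optionally prefixed with the table name, as WP_Query does for orderby=title) and that the attendee query is run with suppress_filters set to false, since get_posts() suppresses filters by default.

```php
<?php
/**
 * Hypothetical helper: add a COLLATE clause to "ORDER BY [table.]post_title".
 * The collation name is an assumption taken from the query discussed above.
 */
function example_collate_title_orderby( string $sql, string $collation = 'utf8mb4_unicode_520_ci' ): string {
	// The collation is interpolated into SQL, so accept identifier-like names only.
	if ( ! preg_match( '/^[A-Za-z0-9_]+$/', $collation ) ) {
		return $sql;
	}

	// Match the first "ORDER BY [table.]post_title" that has no COLLATE clause yet.
	$result = preg_replace(
		'/(ORDER\s+BY\s+(?:\w+\.)?post_title\b)(?!\s+COLLATE)/i',
		'$1 COLLATE ' . $collation,
		$sql,
		1
	);

	// preg_replace() returns null on error; fall back to the untouched query.
	return $result ?? $sql;
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter(
		'posts_request',
		function ( $request, $query ) {
			// Assumption: only attendee queries should be re-collated.
			if ( 'tix_attendee' !== $query->get( 'post_type' ) ) {
				return $request;
			}
			return example_collate_title_orderby( $request );
		},
		10,
		2
	);
}
```

Outside WordPress the helper can be exercised on its own, for example by passing it "SELECT post_title FROM wc_posts WHERE post_type = 'tix_attendee' ORDER BY post_title ASC". Queries that already carry a COLLATE clause, or that do not order by post_title, are returned unchanged.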

pkevan avatar May 08 '24 11:05 pkevan

to include a suppress_filters to false

I forgot get_posts sets that by default. As cray-cray as it sounds, if that introduces problems, I would then say use the query filter.

dd32 avatar May 09 '24 00:05 dd32