
Sorting of names with special characters is not correct

Open ro-la opened this issue 3 years ago • 7 comments

Surname sorting in individual lists is not correct for surnames that contain non-English characters. For example, the surname KÖNIG is sorted at the end of the list instead of being sorted as KONIG.

On my installation I solved it by changing this line https://github.com/fisharebest/webtrees/blob/df016312e101eddafe2bcaf502b9644930c667f2/app/Module/IndividualListModule.php#L711 to new Expression('n_surname /*! COLLATE ' . $collation . ' */ AS n_surname'),

This bug report is based on this discussion: https://www.webtrees.net/index.php/en/forum/help-for-release-2-1-x/37047-sorting-individual-list-is-different-in-table-list-list-style

ro-la avatar Jun 13 '22 12:06 ro-la

The existing line (utf8_bin) is there so that we can identify/count surnames that differ only by accents.

For example, our query might return

KONIG - 70 individuals
KÖNIG - 25 individuals

If we changed the collation in the SQL statement, we'd get 95 individuals with whichever variation was found first.

Using collations only works on MySQL. Now that we also support Postgres and SQLite, we must do all these calculations in PHP. See #3459
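The counting-versus-sorting distinction described above can be sketched in application code. This is a minimal illustration in Python, not the webtrees PHP implementation; the helper names (`fold_accents`, `surname_counts`, `merged_counts`) and the extra sample surnames are assumptions made for the example. Counting uses exact comparison (as a binary collation would), and accent folding is applied only when ordering the result.

```python
from collections import Counter
import unicodedata


def fold_accents(text):
    """Case-fold text and drop combining accents (English-style comparison)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def surname_counts(surnames):
    """Count every exact spelling separately, then order with accents folded."""
    counts = Counter(surnames)  # exact match, comparable to a binary collation
    # The original spelling is the tie-breaker, so KONIG precedes KÖNIG.
    return sorted(counts.items(), key=lambda item: (fold_accents(item[0]), item[0]))


def merged_counts(surnames):
    """What grouping under an accent-insensitive collation would produce."""
    return Counter(fold_accents(name) for name in surnames)


# Hypothetical sample data; only the KONIG/KÖNIG counts come from the thread.
sample = ["KONIG"] * 70 + ["KÖNIG"] * 25 + ["KRAUS"] * 3 + ["KOLAR"] * 2

for surname, count in surname_counts(sample):
    print(surname, count)
```

With this split, the two spellings keep their separate counts but still appear next to each other in the list, whereas `merged_counts` collapses them into a single entry of 95.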

fisharebest avatar Jun 13 '22 14:06 fisharebest

@ro-la

In this specific example the logic is correct, as KONIG does not equal KÖNIG. These are two different names. In German, KÖNIG equals KOENIG.

Sorting of names in German is described by DIN 5007 for name lists (e.g. phone books) in chapter 6.1.1.4.2.

ghost avatar Jun 13 '22 17:06 ghost

@Lars1963 At the moment König is sorted at the end of the list, together with other names that have an accented letter after "K". I want this name to be sorted somewhere near the other surnames that start with "Ko".

ro-la avatar Jun 13 '22 18:06 ro-la

Okay, I misunderstood you. The correct alphabetical sorting should be:

Göbel
Goethe
Göthe
Götz
Goldmann
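The order listed above can be reproduced with a simple sort key that expands umlauts before comparing (Ö treated as OE). This is a minimal Python sketch for illustration, not webtrees code and not a full implementation of DIN 5007; the function name `german_name_key` is an assumption.

```python
import unicodedata

# Assumption: only the three umlauts are expanded here. str.casefold()
# already turns "ß" into "ss", so it needs no separate entry.
UMLAUT_EXPANSIONS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def german_name_key(name):
    """Sort key that treats ä/ö/ü as ae/oe/ue, as in German name lists."""
    folded = unicodedata.normalize("NFC", name).casefold()
    # The original spelling breaks ties such as Goethe versus Göthe.
    return (folded.translate(UMLAUT_EXPANSIONS), name)


names = ["Goldmann", "Götz", "Göthe", "Goethe", "Göbel"]
print(sorted(names, key=german_name_key))
```

Goethe and Göthe receive the same expanded key, so the tie-breaker decides their relative order; here the unaccented spelling comes first.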

ghost avatar Jun 13 '22 18:06 ghost

> For example, our query might return

This, of course, depends on the current language.

In English Ö will sort the same as O. In German, Ö will sort the same as OE. In Swedish, Ö will sort as a separate letter, after Z.

I was thinking in English - sorry for the confusion.
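To make the language dependence concrete, here is a small Python sketch with one hand-written sort key per language. These keys are simplified assumptions for illustration only (a real application would normally use a collation library such as ICU), and the sample surnames are made up.

```python
import unicodedata

# Assumption: simplified, hand-written rules; not complete collation tables.
GERMAN_EXPANSIONS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def english_key(name):
    """Ö sorts the same as O: drop all combining accents."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def german_key(name):
    """Ö sorts the same as OE: expand umlauts before comparing."""
    folded = unicodedata.normalize("NFC", name).casefold()
    return folded.translate(GERMAN_EXPANSIONS)


def swedish_key(name):
    """Ö is a separate letter after Z: rank letters by alphabet position."""
    folded = unicodedata.normalize("NFC", name).casefold()
    unknown = len(SWEDISH_ALPHABET)  # characters outside the alphabet sort last
    return [SWEDISH_ALPHABET.index(c) if c in SWEDISH_ALPHABET else unknown
            for c in folded]


names = ["Zander", "Östman", "Olsen", "Otto"]  # hypothetical sample
for label, key in (("en", english_key), ("de", german_key), ("sv", swedish_key)):
    print(label, sorted(names, key=key))
```

The same four names come out in three different orders, which is why the sort key has to be chosen from the current language rather than fixed in the query.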

fisharebest avatar Jun 13 '22 19:06 fisharebest

> This, of course, depends on the current language.

And that makes sense.

@ro-la What language do you use? Which official sorting rules are correct for your language?

ghost avatar Jun 13 '22 20:06 ghost

> What language do you use? Which official sorting rules are correct for your language?

My primary language is Slovak. My users use Czech, German (Austria), and Hungarian. A typical family from the former Austro-Hungarian Empire.

In Slovak we do not have such strict norms for sorting as in German. OK, I think we do have them, but we do not follow them strictly. According to our norm, accented letters should be sorted after the unaccented ones. In most cases it is accepted when accented letters have the same value as unaccented ones. The digraph "Ch" always comes after "H". But we know that very often we have to look under the letter "C" :-) The digraphs "Dz" and "Dž" are 1. not so common and 2. they automatically sort into the correct position.
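The relaxed rules described above could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the official Slovak norm and not webtrees code: accented letters get the same primary value as their unaccented form and only sort later on a tie, "Ch" is treated as a single letter after "H", and "Dz"/"Dž" receive no special handling. The names `slovak_key`, `LETTER_ORDER` and the sample surnames are hypothetical.

```python
import unicodedata

# Assumption: base alphabet with "ch" inserted as one letter after "h".
LETTER_ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: position for position, letter in enumerate(LETTER_ORDER)}


def strip_accents(text):
    """Remove combining accents, e.g. 'hudák' becomes 'hudak'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def slovak_key(name):
    """Primary key ignores accents and treats 'ch' as one letter after 'h'."""
    folded = unicodedata.normalize("NFC", name).casefold()
    base = strip_accents(folded)
    primary = []
    i = 0
    while i < len(base):
        if base.startswith("ch", i):
            primary.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the base alphabet sort after all letters.
            primary.append(RANK.get(base[i], len(LETTER_ORDER)))
            i += 1
    # The accented spelling is the tie-breaker, so Hudak precedes Hudák.
    return (primary, folded)


names = ["Chovanec", "Hudák", "Cibula", "Ivan", "Hudak"]
print(sorted(names, key=slovak_key))
```

In this sketch Chovanec lands between the H and I names rather than under C, and the accented spelling follows the unaccented one only when everything else is equal.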

@fisharebest As far as I understand the webtrees code, it is also a problem of the DataTables component we use. I found these two articles: https://datatables.net/blog/2017-02-28 https://datatables.net/plug-ins/sorting/intl

The first article is a little bit outdated, but perhaps the second plugin could be a help.

On the other hand it is not such a big issue, and it seems to be quite a complex issue.

ro-la avatar Jun 20 '22 10:06 ro-la