Index page doesn't apply collation for Latin diacritics
Original report by fabrice salvaire (Bitbucket: fabricesalvaire, GitHub: fabricesalvaire).
Pages starting with È are placed at the end of the index instead of under E.
Original comment by RogerHaase (Bitbucket: RogerHaase, GitHub: RogerHaase).
Maybe some useful ideas here:
From: Paul Boddie
On Monday 4. September 2017 11.27.42 Lars Kruse wrote:
On Mon, 04 Sep 2017 09:22:01 +0200, Volker Wysk <post@volker-wysk.de> wrote:
I mean, could one locale be fitting for a different one too, as far as sorting is concerned?
As far as I understand locales: no. (if someone knows better: please correct me)
I'm not sure whether I really understand locales better, but here are a few things that might help. Firstly, you can get the default locale as follows:
import locale
locale.setlocale(locale.LC_ALL, "") # returns the locale string
This has to be done to make the process's locale information available. It is possible that something does this already in Moin, but as mentioned before, it is questionable that the process's locale is relevant for a user of a Web application. Now you can get the locale details more conveniently.
For example, to ask for the collation:
language, charset = locale.getlocale(locale.LC_COLLATE)
I would think that the collation is the most pertinent locale setting when it comes to sorting things. So, it might be more interesting to set this based on any details about the user provided by Moin. The MoinMoin.user.User object has a language attribute that could work in principle, but I'm not convinced that this is enough by itself. More on that in a moment.
Anyway, you can set the collation as follows:
locale.setlocale(locale.LC_COLLATE, "no_NO") # something I just tested
And you can apply the locale sorting as follows:
names.sort(cmp=locale.strcoll)
This will correctly sort a sequence of names where Norwegian letters are used. It seems that Unicode will work, too.
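The `cmp=` argument exists only in Python 2. On Python 3, a roughly equivalent sketch passes `locale.strxfrm` as the sort key, or wraps `locale.strcoll` with `functools.cmp_to_key`. The helper names below are made up for illustration, and the resulting order depends on which collation locale is currently active and installed.

```python
import functools
import locale


def sort_by_collation(names):
    """Return names ordered by the process's current LC_COLLATE setting."""
    # strxfrm turns each string into a key whose plain comparison
    # matches what strcoll would report for the original strings.
    return sorted(names, key=locale.strxfrm)


def sort_by_collation_cmp(names):
    """Same idea, adapting strcoll for Python 3's key-only sorting."""
    return sorted(names, key=functools.cmp_to_key(locale.strcoll))
```

Both helpers only read the active locale; they do not call `setlocale` themselves.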
Why I don't think the Moin language code is enough is that the locale system is rather particular about what you ask it for. However, it seems that you can get a proper locale from the Moin language as follows:
language = request.user.language # will probably work given a request
localename = locale.normalize(language)
For me, this yielded "no_NO.ISO8859-1" from "no".
A few problems emerge when using locale support for sorting. Firstly, you need to have the necessary locales installed for the functions to work. Secondly, switching locales affects the entire program, so you have to be careful not to cause side-effects, although this is less of a problem in a plain CGI environment.
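One way to limit the side effects might be to switch the collation only for the duration of a sort and restore it afterwards. The following is a minimal sketch, not Moin code: `temporary_collation` and `sort_names` are hypothetical names, and the lock only serializes callers of this helper, so other code that reads or changes the locale concurrently is not protected.

```python
import contextlib
import locale
import threading

# Assumption: every caller that changes LC_COLLATE goes through this lock.
_collate_lock = threading.Lock()


@contextlib.contextmanager
def temporary_collation(locale_name):
    """Switch LC_COLLATE for the duration of the block, then restore it."""
    with _collate_lock:
        # Calling setlocale without a new value only queries the setting.
        previous = locale.setlocale(locale.LC_COLLATE)
        try:
            # Raises locale.Error if the locale is not installed.
            locale.setlocale(locale.LC_COLLATE, locale_name)
            yield
        finally:
            locale.setlocale(locale.LC_COLLATE, previous)


def sort_names(names, locale_name):
    """Sort names with the given collation locale (hypothetical helper)."""
    with temporary_collation(locale_name):
        return sorted(names, key=locale.strxfrm)
```

A caller would have to catch `locale.Error` and fall back to a plain sort when the requested locale is missing on the server.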
Another thing noted earlier is that locales are language specific, so if your list of page names contains both German names and names using non-German characters, the sorting of those other characters may not be as desired. Libraries like ICU might try and reconcile different collations, but it is probably an open-ended problem. Bindings for Python are available here:
https://pypi.python.org/pypi/PyICU/
The documentation for the locale functionality is found here:
https://docs.python.org/2.7/library/locale.html
Paul
Setting the locale per request is likely not advisable, because locale.setlocale() is not thread-safe.
But maybe we don't need to; see this:
>>> locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
'en_US.UTF-8'
>>> sorted(l, cmp=locale.strcoll)
[u'a', u'\xe0', u'b', u'B']
>>> sorted(l)
[u'B', u'a', u'b', u'\xe0']
As you can see, even with an en_US locale, the result sorted with locale.strcoll is far more acceptable than the plain sorted result. The \xe0 character is a lowercase a with a grave accent (à).
@fabricesalvaire what do you think, would that be good enough?
Hmm, I tried setting LC_ALL and LANG to en_US.UTF-8, then started moin (with the builtin server).
I tried a modified PagenamesList macro, using sort(cmp=locale.strcoll), but it did not change the sort order in the expected way.
We use flask-babel, maybe there is some interference from that, but I didn't find anything about sorting in babel docs.
https://stackoverflow.com/questions/11121636/sorting-list-of-string-with-specific-locale-in-python
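Since the strcoll attempt did not change the order, a locale-independent fallback may be worth considering. This is not something proposed in the thread: the sketch below swaps collation for a simpler technique, sorting on a key with the combining marks stripped via the standard `unicodedata` module. The function name `diacritic_insensitive_key` is made up, and this is not real language-aware collation (for example, it will not place Norwegian letters after z).

```python
import unicodedata


def diacritic_insensitive_key(name):
    """Sort key that files accented letters under their base letter."""
    # NFD splits "È" into "E" plus a combining grave accent.
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original name is a tie-breaker so the order stays deterministic.
    return (base.casefold(), name)


pages = ["Zebra", "Èze", "Echo", "apple", "Étude"]
print(sorted(pages, key=diacritic_insensitive_key))
# prints ['apple', 'Echo', 'Étude', 'Èze', 'Zebra']
```

With this key, pages starting with È land among the E entries instead of after Z, independent of the server's installed locales.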