pyuca

How do I use a different locale?

Open Hultner opened this issue 7 years ago • 7 comments

The introduction to the UCA standard presents the following example:

Language   Swedish: z < ö
           German:  ö < z

@jtauber How would I achieve this with PyUCA?

Using PyICU, the Python bindings for the C++ ICU library, I can reproduce the example above like this:

>>> from icu import Collator, Locale
>>> def collate_compare(collator, a, b): return collator.getSortKey(a) > collator.getSortKey(b)
...
>>> col_de = Collator.createInstance(Locale("de_DE"))
>>> col_de.setStrength(Collator.PRIMARY)
>>> col_se = Collator.createInstance(Locale("sv_SE"))
>>> col_se.setStrength(Collator.PRIMARY)
>>> collate_compare(col_de, "ö", "z") # German: ö is more than z => False
False
>>> collate_compare(col_se, "ö", "z") # Swedish: ö is more than z => True
True

However, I would much prefer a pure-Python solution without external C++ dependencies, as these make installation and usage cumbersome. I could not find any guidance in the documentation on how to achieve this with PyUCA, so I would love any help or pointers.

Hultner avatar Sep 13 '18 12:09 Hultner

The way to achieve different sort orders is to use a different collation element table. PyICU must ship with different CETs for different locales.

jtauber avatar Oct 20 '18 01:10 jtauber
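
To illustrate jtauber's point above that the sort order comes from the table rather than from the algorithm, here is a toy sketch in plain Python. The two tables and their weights are invented for the example (they are not DUCET or CLDR data), and `sort_key` here is a simplified stand-in, not pyuca's implementation.

```python
# Toy illustration only: invented weights, not real DUCET/CLDR data.
# Each table maps a character to a (primary, secondary) weight pair.
GERMAN_LIKE = {"o": (15, 0), "ö": (15, 1), "z": (26, 0)}   # ö is a variant of o
SWEDISH_LIKE = {"o": (15, 0), "z": (26, 0), "ö": (29, 0)}  # ö is a separate letter after z


def sort_key(table, text):
    """Build a two-level key: all primary weights first, then all secondary weights."""
    weights = [table[ch] for ch in text]
    primary = tuple(w[0] for w in weights)
    secondary = tuple(w[1] for w in weights)
    return (primary, secondary)


words = ["z", "ö", "o"]
german_order = sorted(words, key=lambda w: sort_key(GERMAN_LIKE, w))
swedish_order = sorted(words, key=lambda w: sort_key(SWEDISH_LIKE, w))
print(german_order)   # ['o', 'ö', 'z']
print(swedish_order)  # ['o', 'z', 'ö']
```

The same key-building function yields ö < z with one table and z < ö with the other, which is the Swedish/German difference from the original question.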

Hm, so pyuca is only configured for uca-en, and any other locale would require downloading the raw uca-xy files? That's not convenient (e.g. when installing pyuca from PyPI).

Perhaps pyuca could have some sort of switch to select a different locale and get all the files instead of only the English one?

dvorapa avatar May 19 '19 16:05 dvorapa

It is not configured for English only. By default it uses the DUCET (Default Unicode Collation Element Table), which is suitable for many things beyond English. You can supply an alternative CET in the constructor.

If there are alternative CETs we could ship with, I'm happy to do that.

jtauber avatar May 20 '19 15:05 jtauber
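
A minimal sketch of supplying an alternative CET in the constructor, as jtauber describes above. It assumes that `pyuca.Collator` accepts the path to a table file as its first argument and that the file uses the same allkeys.txt format as the DUCET. The helper name `sort_with_cet` and the `cet_path` parameter are made up for this example.

```python
def sort_with_cet(words, cet_path=None):
    """Sort words with pyuca, optionally using a custom collation element table.

    cet_path is a hypothetical path to a file in the same allkeys.txt format
    as the DUCET; when it is None, pyuca's bundled default table is used.
    """
    # Imported lazily so this sketch can be loaded without pyuca installed.
    from pyuca import Collator  # third-party: pip install pyuca

    collator = Collator(cet_path) if cet_path else Collator()
    return sorted(words, key=collator.sort_key)
```

A call such as `sort_with_cet(["z", "ö", "o"], "allkeys-sv.txt")` would then sort with the tailored table, where `allkeys-sv.txt` is a placeholder for a file you provide yourself; pyuca does not ship one.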

Well, I specifically need uca-cs (based on CLDR: https://github.com/unicode-org/cldr/tree/master/common/collation; it can be tested here: http://demo.icu-project.org/icu-bin/collation.html)

I thought pyuca could help me with that.

dvorapa avatar May 21 '19 11:05 dvorapa

pyuca is just an implementation of the UCA. CLDR is an extension of UCA which I haven't (yet) implemented. CLDR support could be added to pyuca or a new library could be created.

jtauber avatar May 25 '19 23:05 jtauber

However, pyuca can still be used for locale-specific collation; you just need to manually create the appropriate CET for your locale.

jtauber avatar May 25 '19 23:05 jtauber

I see. Well, that approach is not convenient for automated testing.

dvorapa avatar May 26 '19 12:05 dvorapa