Use locale.nl_langinfo in `_strptime.py`

Open brettcannon opened this issue 15 years ago • 9 comments

BPO 8915
Nosy @abalkin

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/abalkin'
closed_at = None
created_at = <Date 2010-06-06.03:35:08.311>
labels = ['3.7', 'type-feature', 'library']
title = 'Use locale.nl_langinfo in _strptime'
updated_at = <Date 2016-09-10.18:34:48.582>
user = 'https://github.com/brettcannon'

bugs.python.org fields:

activity = <Date 2016-09-10.18:34:48.582>
actor = 'belopolsky'
assignee = 'belopolsky'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2010-06-06.03:35:08.311>
creator = 'brett.cannon'
dependencies = []
files = []
hgrepos = []
issue_num = 8915
keywords = []
message_count = 3.0
messages = ['107181', '107445', '125958']
nosy_count = 1.0
nosy_names = ['belopolsky']
pr_nums = []
priority = 'low'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue8915'
versions = ['Python 3.7']

brettcannon avatar Jun 06 '10 03:06 brettcannon

It might perform better to use locale.nl_langinfo to get the current locale's datetime information instead of reverse-engineering from strftime (need to benchmark to see if this is true). This would need to be conditional as the datetime info might not be exposed through nl_langinfo.
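A minimal sketch of the conditional lookup described above, assuming the `MON_1`..`MON_12` constants are what "datetime information" refers to here. The helper name `month_names` is hypothetical and is not part of `_strptime`; the fallback mirrors the idea of recovering names from `strftime` output.

```python
import locale
import time


def month_names():
    """Return the 12 full month names for the current LC_TIME locale."""
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "MON_1"):
        # Ask the locale database directly when the platform exposes it.
        return [
            locale.nl_langinfo(getattr(locale, f"MON_{month}"))
            for month in range(1, 13)
        ]
    # Fallback: format one date per month and take the name from strftime.
    return [
        time.strftime("%B", (1999, month, 1, 0, 0, 0, 0, 1, 0))
        for month in range(1, 13)
    ]
```

Whether the first branch is actually faster would still need the benchmark mentioned above.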

brettcannon avatar Jun 06 '10 03:06 brettcannon

See also bpo-8957. If this happens, I would like to add a pure Python implementation of strftime. See also bpo-7989.

abalkin avatar Jun 10 '10 04:06 abalkin

I would also like to consider using OS strptime on platforms with a decent implementation.

abalkin avatar Jan 10 '11 23:01 abalkin

Not fully possible, nl_langinfo doesn't support LC_TIME.

We can partially use it, ~33% faster.

```python
def _getlang():
    lang = locale.setlocale(locale.LC_TIME, None)
    encoding = locale.nl_langinfo(locale.CODESET)

    return lang, encoding
```

```
$ python3.14 -m timeit -s "from _strptime import _getlang" "_getlang()"
1000000 loops, best of 5: 298 nsec per loop
$ ./python -m timeit -s "from _strptime import _getlang" "_getlang()"
1000000 loops, best of 5: 203 nsec per loop                    # ~33% faster
```

Caching the information will result in a decent performance gain.

StanFromIreland avatar Mar 16 '25 10:03 StanFromIreland

> I would also like to consider using OS strptime on platforms with a decent implementation.

This should be a separate issue.

StanFromIreland avatar Mar 16 '25 10:03 StanFromIreland

@picnixz _strptime is currently pure Python. This does not need the extension-modules label. It does, however, need the performance label :-)

StanFromIreland avatar Mar 16 '25 16:03 StanFromIreland

I never know which parts of the date/time API are duplicated in C and which are not, so thanks.

picnixz avatar Mar 16 '25 16:03 picnixz

nl_langinfo() supports LC_TIME.

But there is another issue. Month and weekday names returned by nl_langinfo() and strftime() can be different.

  • In br_FR locale: "'" vs "ʼ" (U+02BC).
  • In ast_ES, ca_AD, ca_ES, ca_FR, ca_IT, oc_FR and wa_BE locales: "'" vs "’" (U+2019).
  • In yi_US locale: "אַ" (U+FB2E) vs "אַ" (U+05D0 U+05B7) and many others.

I suppose this is because strftime() is implemented using wcsftime() if it is available, but the result of nl_langinfo() needs decoding from the current locale encoding. 8-bit encodings (ISO8859-1 for br_FR and ca_FR, CP1255 for yi_US, etc.) can replace some Unicode characters with other similar-looking characters.

This issue also exists in the current code: strptime() is not always able to parse a string formatted in C or another language, or in Python on another platform, even if they support the same locale. strptime() should be more lenient and accept different forms of apostrophes and different normalization forms. This is a different issue, but we cannot just use nl_langinfo() without breaking existing tests until it is fixed.
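To illustrate what "more lenient" could mean, here is a small sketch that folds the apostrophe variants listed above into ASCII `'` and applies canonical decomposition (NFD) before comparing names. The helper `normalize_name` is hypothetical and is not something `_strptime` does today; NFD is chosen here simply so that precomposed and decomposed spellings compare equal.

```python
import unicodedata

# Map the apostrophe-like characters mentioned above to ASCII "'".
_APOSTROPHES = str.maketrans({"\u02bc": "'", "\u2019": "'"})


def normalize_name(name):
    """Return a comparison key for a month or weekday name."""
    decomposed = unicodedata.normalize("NFD", name)
    return decomposed.translate(_APOSTROPHES).lower()
```

With this key, `normalize_name("\ufb2e")` and `normalize_name("\u05d0\u05b7")` compare equal, as do names that differ only in which apostrophe they use.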

On glibc platforms we can also use a private API to get the Unicode result of nl_langinfo() without going through the intermediate locale encoding. This may be faster and avoids temporarily switching the current locale. It would also help a Python implementation of strftime(). This is also a different issue, but it would make it easier to use nl_langinfo().

serhiy-storchaka avatar Jun 28 '25 06:06 serhiy-storchaka

There is another issue. In some locales the month and weekday names are entirely different, and this is not a matter of normalization:

  • be_BY.utf8: "чэрвеня" vs "červienia".
  • tt_RU.utf8: "июнь" vs "yün".
  • nan_TW@latin: "6goe̍h" vs "六月".
  • sr_RS@latin: "jun" vs "јун".
  • ug_CN@latin: "Seper" vs "ئىيۇن".
  • uz_UZ@cyrillic: "Июн" vs "Iyun".
  • sd_IN@devanagari: "जूनि" vs "جون".

It seems that the modifier (@latin, @cyrillic, @devanagari) is just ignored in some cases (this is a bug in the locale module). But I do not know why there is a difference for be_BY.utf8 and tt_RU.utf8.
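Differences like these can be checked with a short script along the following lines. This is a sketch only: the function name `compare_month_names` is hypothetical, it needs a platform that provides `locale.nl_langinfo`, and the locale passed in must be installed on the system. It switches only LC_TIME, so decoding of the `nl_langinfo()` result may depend on the Python version and on LC_CTYPE.

```python
import locale
import time


def compare_month_names(loc):
    """Return (month, nl_langinfo name, strftime name) for names that differ."""
    saved = locale.setlocale(locale.LC_TIME)  # remember the current setting
    try:
        locale.setlocale(locale.LC_TIME, loc)
        diffs = []
        for month in range(1, 13):
            from_langinfo = locale.nl_langinfo(getattr(locale, f"MON_{month}"))
            from_strftime = time.strftime(
                "%B", (1999, month, 1, 0, 0, 0, 0, 1, 0)
            )
            if from_langinfo != from_strftime:
                diffs.append((month, from_langinfo, from_strftime))
        return diffs
    finally:
        # Always restore LC_TIME, even if the locale is not available.
        locale.setlocale(locale.LC_TIME, saved)
```

Calling it as `compare_month_names("be_BY.utf8")` would list the mismatching months for that locale, if it is installed.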

serhiy-storchaka avatar Jun 28 '25 09:06 serhiy-storchaka