
Inconsistent localization in country_holidays due to LANG dependency

Open pmarkoo opened this issue 1 year ago • 6 comments

Bug Report

Expected Behavior

When using the country_holidays function without specifying the language parameter (i.e., setting it to None or omitting it), the holiday names should consistently be returned in the country's original language as per the documentation.

For example, executing the following code:

import holidays
de_holidays = holidays.country_holidays("DE")
print(de_holidays.get("2024-12-25"))

Should consistently output: Erster Weihnachtstag

Actual Behavior

When the language parameter is not set, the country_holidays function behaves inconsistently depending on the LANG environment variable:

Local Environment:

LANGUAGE: None
LC_ALL: None
LC_MESSAGES: None
LANG: None

Output: Erster Weihnachtstag

Remote Server Environment:

LANGUAGE: None
LC_ALL: None
LC_MESSAGES: None
LANG: C.UTF-8

Output: Christmas Day

Steps to Reproduce the Problem

Easy to reproduce:

import os

import holidays

# LANG empty: the German name is returned.
os.environ["LANG"] = ""
de_holidays = holidays.country_holidays("DE")
print(de_holidays.get("2024-12-25"))

# LANG set to C.UTF-8: the English name is returned.
os.environ["LANG"] = "C.UTF-8"
de_holidays = holidays.country_holidays("DE")
print(de_holidays.get("2024-12-25"))

Output:

Erster Weihnachtstag
Christmas Day
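To compare the two environments in one process without leaving os.environ modified, the locale variables can be overridden temporarily. This is only a sketch: `locale_env` is a hypothetical helper, not part of holidays, and it assumes the library reads these variables at the moment the holidays object is created.

```python
import os
from contextlib import contextmanager

# The four variables listed in the report above.
LOCALE_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


@contextmanager
def locale_env(**overrides):
    """Temporarily unset all locale variables, then apply the overrides."""
    names = set(LOCALE_VARS) | set(overrides)
    saved = {name: os.environ.get(name) for name in names}
    try:
        for name in LOCALE_VARS:
            os.environ.pop(name, None)
        os.environ.update(overrides)
        yield
    finally:
        # Restore the previous state, removing variables that were not set before.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


# Usage (requires holidays to be installed):
# import holidays
# with locale_env(LANG="C.UTF-8"):
#     print(holidays.country_holidays("DE").get("2024-12-25"))
```

Creating the object and reading the name inside the same `with` block avoids depending on when exactly the translation is resolved.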

Environment

  • I suppose any OS or Python version will have the same behaviour
  • holidays version: 0.62


pmarkoo avatar Dec 12 '24 14:12 pmarkoo

Hi @pmarkoo thanks for filing this!

As far as I remember, it was our decision back in 2022 to have English as a fallback.

Although there is no technical difficulty in changing the behavior, I doubt we'll do it for v0. However, in my opinion it makes total sense to revisit the implementation for v1.

> When using the country_holidays function without specifying the language parameter (i.e., setting it to None or omitting it), the holiday names should consistently be returned in the country's original language as per the documentation.

Could you add a link to the documentation you mentioned in your post?

Thank you!

arkid15r avatar Dec 18 '24 00:12 arkid15r

Hello @arkid15r, thank you for considering this!

When I mentioned the documentation, I was specifically referring to the docstring of the country_holidays function in the code itself: https://github.com/vacanza/holidays/blob/dev/holidays/utils.py#L72

English as a fallback is not really a problem. My main concern is that reliance on the LANG environment variable when the language parameter is unset is largely unknown unless one digs into the code. This implicit behavior leads to inconsistencies across environments and may confuse users. Sorry if this is just my own ignorance or lack of experience with locale-related engineering.
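For anyone else hitting this, here is a rough sketch of the lookup I think is involved. It assumes that holidays hands the language choice to Python's standard gettext module when language is unset (I have not confirmed every detail of that). `gettext_language_candidates` is a hypothetical helper that mimics how `gettext.find()` picks its language list when none is passed.

```python
import os


def gettext_language_candidates(environ=None):
    """Mimic the language list gettext.find() builds when languages is None."""
    environ = os.environ if environ is None else environ
    languages = []
    # gettext checks these variables in order and uses the first non-empty one.
    for name in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = environ.get(name)
        if value:
            languages = value.split(":")
            break
    # "C" is always appended as the last candidate.
    if "C" not in languages:
        languages.append("C")
    return languages


print(gettext_language_candidates({"LANG": "C.UTF-8"}))
print(gettext_language_candidates({}))
```

With nothing set, only "C" is left as a candidate, so the lookup differs between the two environments above even though the calling code is identical.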

pmarkoo avatar Dec 19 '24 01:12 pmarkoo

No, this is a valid point. I believe we need to update the docs while keeping the English translation as a fallback. I'm open to considering alternative opinions for v1.

arkid15r avatar Dec 20 '24 05:12 arkid15r

In version 0.60, the language parameter does not work as expected in Jupyter.

For example, running the following code in Jupyter:

from holidays import country_holidays

min_year = 2021
max_year = 2022
country = "PE"

country_code = country
years = [min_year, max_year]
country_holidays_dict = country_holidays(country_code, years=years, language="en")
country_holidays_dict

holidays_data = [(str(date), name) for date, name in country_holidays_dict.items()]

holiday_names = {holiday_name: holiday_name for _, holiday_name in holidays_data}

print(holiday_names)

returns:

{'Año Nuevo': 'Año Nuevo', 'Jueves Santo': 'Jueves Santo', 'Viernes Santo': 'Viernes Santo', 'Domingo de Resurrección': 'Domingo de Resurrección', 'Día del Trabajo': 'Día del Trabajo', 'San Pedro y San Pablo': 'San Pedro y San Pablo', 'Día de la Independencia': 'Día de la Independencia', 'Día de la Gran Parada Militar': 'Día de la Gran Parada Militar', 'Santa Rosa de Lima': 'Santa Rosa de Lima', 'Combate de Angamos': 'Combate de Angamos', 'Todos Los Santos': 'Todos Los Santos', 'Inmaculada Concepción': 'Inmaculada Concepción', 'Navidad del Señor': 'Navidad del Señor', 'Batalla de Junín': 'Batalla de Junín', 'Batalla de Ayacucho': 'Batalla de Ayacucho'}

However, if I run the same code in a script from the terminal, I get:

{"New Year's Day": "New Year's Day", 'Maundy Thursday': 'Maundy Thursday', 'Good Friday': 'Good Friday', 'Easter Sunday': 'Easter Sunday', 'Labor Day': 'Labor Day', 'Saint Peter and Saint Paul': 'Saint Peter and Saint Paul', 'Independence Day': 'Independence Day', 'Great Military Parade Day': 'Great Military Parade Day', 'Rose of Lima Day': 'Rose of Lima Day', 'Battle of Angamos Day': 'Battle of Angamos Day', "All Saints' Day": "All Saints' Day", 'Immaculate Conception Day': 'Immaculate Conception Day', 'Christmas Day': 'Christmas Day', 'Battle of Junín Day': 'Battle of Junín Day', 'Battle of Ayacucho Day': 'Battle of Ayacucho Day'}

Changing the language to Spanish in the same script, I get:

{'Año Nuevo': 'Año Nuevo', 'Jueves Santo': 'Jueves Santo', 'Viernes Santo': 'Viernes Santo', 'Domingo de Resurrección': 'Domingo de Resurrección', 'Día del Trabajo': 'Día del Trabajo', 'San Pedro y San Pablo': 'San Pedro y San Pablo', 'Día de la Independencia': 'Día de la Independencia', 'Día de la Gran Parada Militar': 'Día de la Gran Parada Militar', 'Santa Rosa de Lima': 'Santa Rosa de Lima', 'Combate de Angamos': 'Combate de Angamos', 'Todos Los Santos': 'Todos Los Santos', 'Inmaculada Concepción': 'Inmaculada Concepción', 'Navidad del Señor': 'Navidad del Señor', 'Batalla de Junín': 'Batalla de Junín', 'Batalla de Ayacucho': 'Batalla de Ayacucho'}

So it looks like the problem arises only in Jupyter.

Running locale in the terminal, I get:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"

Running locale in Jupyter, I get:

LANG=""
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
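The same check can be done from inside the Python process itself (notebook kernel or script), which is the environment the library actually sees. A minimal sketch, where `locale_snapshot` is just a hypothetical helper name and the variable list is the one from the original report:

```python
import os


def locale_snapshot(environ=None):
    """Return the locale-related variables as seen by this process."""
    environ = os.environ if environ is None else environ
    names = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")
    # Unset variables are reported as None.
    return {name: environ.get(name) for name in names}


for name, value in locale_snapshot().items():
    print(f"{name}: {value}")
```

Running this in both places should make any difference between the Jupyter kernel and the terminal visible without leaving Python.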

fedemolina avatar Jan 10 '25 02:01 fedemolina

> country_holidays_dict = country_holidays(country_code, years=years, language="en")

Correct language value is en_US.

KJhellico avatar Jan 10 '25 12:01 KJhellico

> country_holidays_dict = country_holidays(country_code, years=years, language="en")
>
> Correct language value is en_US.

Now it works as expected.

But the documentation says:

:param language: The language which the returned holiday names will be translated into. It must be an ISO 639-1 (2-letter) language code. If the language translation is not supported the original holiday names will be used.
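Until the docstring and the accepted values agree, a caller-side check can catch the mismatch instead of silently getting the original names. This is only a sketch: `resolve_language` is a hypothetical helper, not part of holidays, and `PE_SUPPORTED` is an illustrative list that I am assuming for Peru; the real list should be taken from the library.

```python
def resolve_language(requested, supported):
    """Return the supported code matching `requested`, or None if there is none."""
    wanted = requested.lower()
    # Exact match first (case-insensitive).
    for code in supported:
        if code.lower() == wanted:
            return code
    # Then match on the part before the underscore, e.g. "en" -> "en_US".
    # If several regional variants exist, the first one listed wins.
    for code in supported:
        if code.lower().split("_")[0] == wanted:
            return code
    return None


# Illustrative values only (assumed, not read from the library).
PE_SUPPORTED = ("en_US", "es", "uk")

print(resolve_language("en", PE_SUPPORTED))
print(resolve_language("fr", PE_SUPPORTED))
```

A None result can then be turned into an explicit error or warning before calling country_holidays, so an unsupported 2-letter code does not go unnoticed.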

fedemolina avatar Jan 10 '25 13:01 fedemolina