Inconsistent localization in country_holidays due to LANG dependency
Bug Report
Expected Behavior
When using the country_holidays function without specifying the language parameter (i.e., setting it to None or omitting it), the holiday names should consistently be returned in the country's original language as per the documentation.
For example, executing the following code:
import holidays
de_holidays = holidays.country_holidays("DE")
print(de_holidays.get("2024-12-25"))
Should consistently output:
Erster Weihnachtstag
Actual Behavior
When the language parameter is not set, the country_holidays function behaves inconsistently depending on the LANG environment variable:
Local Environment:
LANGUAGE: None
LC_ALL: None
LC_MESSAGES: None
LANG: None
Output:
Erster Weihnachtstag
Remote Server Environment:
LANGUAGE: None
LC_ALL: None
LC_MESSAGES: None
LANG: C.UTF-8
Output:
Christmas Day
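To compare two environments the way the lists above do, the four variables can be dumped with a small stdlib-only helper. This is only a sketch: `locale_env_snapshot` and `LOCALE_VARS` are hypothetical names for illustration and are not part of the holidays package.

```python
import os

# The four variables compared in the report above.
LOCALE_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


def locale_env_snapshot(environ=None):
    """Return the locale-related variables, with None for unset ones."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in LOCALE_VARS}


if __name__ == "__main__":
    # Print in the same "NAME: value" layout used in the report.
    for name, value in locale_env_snapshot().items():
        print(f"{name}: {value}")
```

Running this on both machines gives output that can be pasted side by side.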
Steps to Reproduce the Problem
Easy to reproduce:
import os
import holidays
os.environ['LANG'] = ''
de_holidays = holidays.country_holidays("DE")
print(de_holidays.get("2024-12-25"))
os.environ['LANG'] = 'C.UTF-8'
de_holidays = holidays.country_holidays("DE")
print(de_holidays.get("2024-12-25"))
Output
Erster Weihnachtstag
Christmas Day
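The reproduction above mutates `os.environ` for the whole process. A minimal sketch of a way to scope the change, assuming a hypothetical helper named `temporary_env` (not part of holidays), is a context manager that restores the previous value on exit:

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set (or, with value=None, unset) an environment variable, then restore it."""
    previous = os.environ.get(name)
    try:
        if value is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = value
        yield
    finally:
        # Put back whatever was there before, including "not set at all".
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


# Usage with the reproduction snippet (requires the holidays package):
#
#     import holidays
#     for lang in ("", "C.UTF-8"):
#         with temporary_env("LANG", lang):
#             print(holidays.country_holidays("DE").get("2024-12-25"))
```

This keeps each case isolated, so a leftover LANG value from one run cannot leak into the next.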
Environment
- I suppose any OS or Python version will have the same behaviour
- holidays version: 0.62
Hi @pmarkoo thanks for filing this!
As far as I remember, it was our decision back in 2022 to have English as a fallback.
Even though there is no technical difficulty in changing the behavior, I doubt we'll do it for v0.
However, in my opinion it makes total sense to revisit the implementation for v1.
> When using the country_holidays function without specifying the language parameter (i.e., setting it to None or omitting it), the holiday names should consistently be returned in the country's original language as per the documentation.
Could you add a link to the documentation you mentioned in your post?
Thank you!
Hello @arkid15r, thank you for considering this!
When I mentioned the documentation, I was specifically referring to the docstring of the country_holidays function in the code itself: https://github.com/vacanza/holidays/blob/dev/holidays/utils.py#L72
English as a fallback is not really a problem. My main concern is that reliance on the LANG environment variable when the language parameter is unset is largely unknown unless one digs into the code. This implicit behavior leads to inconsistencies across environments and may confuse users. Sorry if this is just my own ignorance or lack of experience with locale-related engineering.
No, this is a valid point. I believe we need to update the docs while keeping English translation as a fallback.
I'm open to considering alternative opinions for v1.
In version 0.60, the language parameter does not work as expected in Jupyter.
For example, running the following code in Jupyter, I get:
`from holidays import country_holidays
min_year = 2021
max_year = 2022
country = "PE"
country_code = country
years = [min_year, max_year]
country_holidays_dict = country_holidays(country_code, years=years, language="en")
country_holidays_dict
holidays_data = [(str(date), name) for date, name in country_holidays_dict.items()]
holiday_names = {holiday_name: holiday_name for _, holiday_name in holidays_data}
print(holiday_names)`
returns:
{'Año Nuevo': 'Año Nuevo', 'Jueves Santo': 'Jueves Santo', 'Viernes Santo': 'Viernes Santo', 'Domingo de Resurrección': 'Domingo de Resurrección', 'Día del Trabajo': 'Día del Trabajo', 'San Pedro y San Pablo': 'San Pedro y San Pablo', 'Día de la Independencia': 'Día de la Independencia', 'Día de la Gran Parada Militar': 'Día de la Gran Parada Militar', 'Santa Rosa de Lima': 'Santa Rosa de Lima', 'Combate de Angamos': 'Combate de Angamos', 'Todos Los Santos': 'Todos Los Santos', 'Inmaculada Concepción': 'Inmaculada Concepción', 'Navidad del Señor': 'Navidad del Señor', 'Batalla de Junín': 'Batalla de Junín', 'Batalla de Ayacucho': 'Batalla de Ayacucho'}
However, if I run the same code as a script from the terminal, I get:
{"New Year's Day": "New Year's Day", 'Maundy Thursday': 'Maundy Thursday', 'Good Friday': 'Good Friday', 'Easter Sunday': 'Easter Sunday', 'Labor Day': 'Labor Day', 'Saint Peter and Saint Paul': 'Saint Peter and Saint Paul', 'Independence Day': 'Independence Day', 'Great Military Parade Day': 'Great Military Parade Day', 'Rose of Lima Day': 'Rose of Lima Day', 'Battle of Angamos Day': 'Battle of Angamos Day', "All Saints' Day": "All Saints' Day", 'Immaculate Conception Day': 'Immaculate Conception Day', 'Christmas Day': 'Christmas Day', 'Battle of Junín Day': 'Battle of Junín Day', 'Battle of Ayacucho Day': 'Battle of Ayacucho Day'}
Changing the language to Spanish in the same script, I get:
{'Año Nuevo': 'Año Nuevo', 'Jueves Santo': 'Jueves Santo', 'Viernes Santo': 'Viernes Santo', 'Domingo de Resurrección': 'Domingo de Resurrección', 'Día del Trabajo': 'Día del Trabajo', 'San Pedro y San Pablo': 'San Pedro y San Pablo', 'Día de la Independencia': 'Día de la Independencia', 'Día de la Gran Parada Militar': 'Día de la Gran Parada Militar', 'Santa Rosa de Lima': 'Santa Rosa de Lima', 'Combate de Angamos': 'Combate de Angamos', 'Todos Los Santos': 'Todos Los Santos', 'Inmaculada Concepción': 'Inmaculada Concepción', 'Navidad del Señor': 'Navidad del Señor', 'Batalla de Junín': 'Batalla de Junín', 'Batalla de Ayacucho': 'Batalla de Ayacucho'}
So it looks like the problem arises only in Jupyter.
Running locale in the terminal, I get:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
Running locale in Jupyter, I get:
LANG=""
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
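The two `locale` dumps above are easier to compare programmatically. The following is a stdlib-only sketch; `parse_locale_output` and `locale_diff` are hypothetical helper names, and the input strings are copied from the outputs quoted in this comment.

```python
import shlex


def parse_locale_output(text):
    """Parse `locale` output (NAME=value pairs, optionally quoted) into a dict."""
    result = {}
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        if sep:  # skip anything that is not a NAME=value pair
            result[name] = value
    return result


def locale_diff(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


terminal = parse_locale_output(
    'LANG="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_CTYPE="en_US.UTF-8" '
    'LC_MESSAGES="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" '
    'LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8"'
)
jupyter = parse_locale_output(
    'LANG="" LC_COLLATE="C" LC_CTYPE="UTF-8" LC_MESSAGES="C" '
    'LC_MONETARY="C" LC_NUMERIC="C" LC_TIME="C" LC_ALL='
)

for name, (term_value, jup_value) in locale_diff(terminal, jupyter).items():
    print(f"{name}: terminal={term_value!r} jupyter={jup_value!r}")
```

Every variable differs between the two sessions, including LANG, which is empty in Jupyter and `en_US.UTF-8` in the terminal.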
country_holidays_dict = country_holidays(country_code, years=years, language="en")
The correct language value is en_US.
Now it works as expected.
But the documentation says:
:param language: The language which the returned holiday names will be translated into. It must be an ISO 639-1 (2-letter) language code. If the language translation is not supported the original holiday names will be used.