yFinance download API works differently on Linux vs Windows system
Describe bug
Issue Description
I encountered an issue where the yf.download API behaves differently on Linux and Windows systems. Specifically, when using the yf.download API with the period set to max, the API returns a proper DataFrame with the expected output on Windows without any errors, as shown below:
[*********************100%**********************] 1 of 1 completed
                  Open        High         Low       Close   Adj Close  Volume
Date
2024-08-05  174.445999  181.694901  174.353607  178.951599  178.951599       0
The response is similar even with a random start date.
However, when I run the exact same code on an Amazon EC2 Linux instance, the download API throws the following error:
YFInvalidPeriodError("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")
This discrepancy suggests that the yf.download API behaves differently on Linux compared to Windows.
Steps to Reproduce
- Use the yf.download API with the period set to max on a Windows machine.
- Observe that the API returns the expected DataFrame without errors.
- Run the same code on an Amazon EC2 Linux instance.
- Observe the YFInvalidPeriodError being thrown.
Example ticker: ^XND
Expected Behavior
The yf.download API should return a DataFrame with the expected data without throwing errors, regardless of the operating system.
Actual Behavior
- Windows: API works as expected, returns a DataFrame with the data.
- Linux: API throws a YFInvalidPeriodError when the period is set to max.
Environment Details
- Windows Machine:
  - OS: Windows 11
  - yFinance Version: 0.2.41
  - Python Version: 3.11
- Amazon EC2 Linux Instance:
  - OS: Amazon Linux 2023
  - yFinance Version: 0.2.41
  - Python Version: 3.11
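To rule out a hidden difference between the two machines, it may help to print the environment from inside the same interpreter that runs the failing call. The following is a minimal sketch using only the standard library. The helper name `describe_environment` and the package list (pandas and requests are not mentioned in this report) are assumptions for illustration.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("yfinance", "pandas", "pytz", "requests")):
    """Return OS, Python, and installed package versions as a dict."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            # Version of the distribution as seen by this interpreter.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this on both Windows and the EC2 instance and comparing the output line by line would show whether the installed versions really match.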
Additional Information
The issue persists even when a random start date is provided.
This behavior suggests a potential discrepancy in the yFinance API implementation or configuration for different operating systems.
Request for Insight
Any insight into why this discrepancy occurs and how to resolve it would be very helpful. Is there a known issue with yFinance on Linux systems, or is there a workaround to make the behavior consistent across different operating systems?
Thank you for your assistance.
Simple code that reproduces your problem
Code
from yfinance import download
download(tickers=['^XND'], period='max')
Result on Windows
[*********************100%**********************] 1 of 1 completed
                  Open        High         Low       Close   Adj Close  Volume
Date
2024-08-05  174.445999  181.694901  174.353607  178.951599  178.951599       0
Result on Linux
[*********************100%%**********************] 1 of 1 completed
1 Failed download:
['^XND']: YFInvalidPeriodError("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []
However, the same query works as expected on Linux when the period is 1d or 5d.
Debug log
[*********************100%%**********************] 1 of 1 completed
1 Failed download:
['^XND']: YFInvalidPeriodError("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []
Bad data proof
No response
yfinance version
0.2.41
Python version
3.11
Operating system
Windows 11, Amazon Linux 2023
That's not the debug log.
One thing you can do is try running the code in WSL (Windows Subsystem for Linux) and see if the error is still there.
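For the debug log itself, one possible approach is to raise the log level of the library's logger before re-running the failing call. This is a rough sketch, not the project's documented procedure: it assumes yfinance logs through a standard `logging` logger named "yfinance", and the helper names `enable_yfinance_debug_logging` and `reproduce` are made up for illustration. Recent yfinance releases also expose `yf.enable_debug_mode()`, which I believe does much the same thing.

```python
import logging


def enable_yfinance_debug_logging():
    """Send DEBUG-level records from the 'yfinance' logger to stderr."""
    logger = logging.getLogger("yfinance")  # assumed logger name
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        # Attach a handler only once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)-8s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def reproduce():
    """Re-run the failing call from the report with debug logging enabled."""
    # Imported here so the logging helper above works without yfinance installed.
    from yfinance import download

    enable_yfinance_debug_logging()
    return download(tickers=["^XND"], period="max")
```

Calling `reproduce()` on the EC2 instance and pasting everything printed to the console should give a log that shows the requests made before the YFInvalidPeriodError is reported.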
I can confirm that the error also occurs on Windows Subsystem for Linux (Windows 10).
This has to do with pytz not being able to handle dates past the year 2038. When you use max it adds 99 years to the current date, which goes past 2038. https://github.com/stub42/pytz/issues/31
if start or period is None or period.lower() == "max":
    # Check can get TZ. Fail => probably delisted
    tz = self.tz
    if tz is None:
        # Every valid ticker has a timezone. A missing timezone is a problem.
        _exception = YFTzMissingError(self.ticker)
        err_msg = str(_exception)
        shared._DFS[self.ticker] = utils.empty_df()
        shared._ERRORS[self.ticker] = err_msg.split(': ', 1)[1]
        if raise_errors:
            raise _exception
        else:
            logger.error(err_msg)
        return utils.empty_df()
    if end is None:
        end = int(_time.time())
    else:
        end = utils._parse_user_dt(end, tz)
    if start is None:
        if interval == "1m":
            start = end - 604800  # 7 days
        elif interval in ("5m", "15m", "30m", "90m"):
            start = end - 5184000  # 60 days
        elif interval in ("1h", '60m'):
            start = end - 63072000  # 730 days
        else:
            start = end - 3122064000  # 99 years
    else:
        start = utils._parse_user_dt(start, tz)
    params = {"period1": start, "period2": end}
else:
    period = period.lower()
    params = {"range": period}
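To see where the default bounds in the snippet above actually land, the arithmetic can be checked in isolation. This is only a sketch under stated assumptions: the lookback constants are copied from the snippet, while `default_start`, `to_utc`, and the fixed `end` value (midnight UTC on 2024-08-05, the date in the Windows output) are chosen for illustration and are not part of yfinance.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold

# Lookback windows in seconds, copied from the snippet above.
LOOKBACK_SECONDS = {
    "1m": 604800,      # 7 days
    "5m": 5184000,     # 60 days
    "15m": 5184000,
    "30m": 5184000,
    "90m": 5184000,
    "1h": 63072000,    # 730 days
    "60m": 63072000,
}
DEFAULT_LOOKBACK = 3122064000  # 99 years of 365 days


def default_start(end, interval):
    """Mirror the 'start is None' branch: subtract a lookback from end."""
    return end - LOOKBACK_SECONDS.get(interval, DEFAULT_LOOKBACK)


def to_utc(epoch_seconds):
    """Convert epoch seconds to a UTC datetime, including negative values."""
    return EPOCH + timedelta(seconds=epoch_seconds)


end = 1722816000  # illustrative: 2024-08-05 00:00:00 UTC
start = default_start(end, "1d")

print("start:", start, to_utc(start))
print("end:  ", end, to_utc(end))
print("32-bit limit:", INT32_MAX, to_utc(INT32_MAX))
```

With these inputs `start` comes out negative, meaning it falls before 1970, while `end` stays below the 32-bit limit. That may be useful when judging whether the 2038 limit or pre-1970 timestamps are involved on a given platform.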
I found that when running the following code on EC2:
import yfinance as yf

stockNames = ['A', 'AAA', 'AAPL', 'NVDA', 'CNQ', 'SNA', 'META']
for stockName in stockNames:
    Ticker = yf.Ticker(stockName)
    # Get dividend and split information
    actions_data = Ticker.actions
I only get data from 2022 and earlier, and cannot obtain the latest data. However, the code works fine and retrieves the latest data when run on a local Windows machine.