yfinance
Known Yahoo rate limiter?
Hello,
I am trying to scan Ticker.quarterly_income_stmt income statements using multithreading. After a few minutes I start receiving empty DataFrames, and the Yahoo website shows the following issue (see the screenshot below).
Is there any known IP-based Yahoo rate limiter?
Yes https://github.com/ranaroussi/yfinance/issues?q=is%3Aissue+is%3Aopen+label%3A%22Yahoo+spam%22
I guess a proxy could help?
No idea. I respect Yahoo's limit, with caching and self-rate-limiting (requests_cache and requests_ratelimiter).
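As an illustration of what self-rate-limiting means here, the following is a minimal stdlib-only sketch. It is not requests_ratelimiter and not part of yfinance; the class name Throttle and its parameters are hypothetical. It spaces calls at least min_interval seconds apart and is safe to share between threads.

```python
import threading
import time


class Throttle:
    """Hypothetical helper: allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous permitted call
        self._lock = threading.Lock()

    def wait(self):
        """Block until enough time has passed since the previous call."""
        with self._lock:         # callers from several threads are serialized here
            now = self._clock()
            if self._last is not None:
                remaining = self.min_interval - (now - self._last)
                if remaining > 0:
                    self._sleep(remaining)
                    now = self._clock()
            self._last = now
```

Calling throttle.wait() before each request (for example before each Ticker.quarterly_income_stmt access) would keep a multithreaded scan below a chosen request rate. The right interval for Yahoo is not stated anywhere in this thread.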
Ok, will try proxy tomorrow. Can update the issue later
Just double-checked: I was rate limited after about 40 calls this time. I think they lower the rate-limiter threshold dynamically, depending on the number and intensity of spam cases.
Proxy is currently broken. A fix is proposed; it just requires someone to test and confirm. It's in branch hotfix/proxy, and #1080 explains how to test. Report your verdict in #1371.
hotfix/proxy seems to work. However, how can I use a proxy to query the following?
ticker.income_stmt
ticker.quarterly_income_stmt
As you use a proxy, I have a question (it's very relevant to yours).
First, a session recap: pass one into the Ticker constructor and then all methods will use the session automatically (e.g. requests_cache).
Question: does it make sense that proxy is NOT handled like this? Instead, each get() method has a proxy argument, e.g. Ticker.get_cash_flow(proxy=...). Is there any good reason for a user to want this behaviour?
Thank you for the session input. However, Yahoo is still rate limiting me after about 50 calls, so I think the session doesn't work properly:
session = requests_cache.CachedSession('./.tmp/yfinance.cache')
session.headers = headers
session.proxies = {
    "http": proxy
}
data = yf.Ticker(stock, session=session)
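One detail worth checking in a snippet like the one above: requests chooses a proxy by the scheme of the URL being fetched, so a mapping with only an "http" key does not apply to https:// URLs. Whether that explains the rate limiting reported here is not established in this thread. A minimal sketch, where the helper name proxy_mapping and the proxy URL are hypothetical:

```python
def proxy_mapping(proxy_url):
    """Return a requests-style proxies dict covering both URL schemes."""
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical usage with the session from the snippet above:
#   session.proxies = proxy_mapping("http://user:pass@proxy.example:8080")
```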
In terms of your question, I think both options work:
- set a session with the proxy list when initializing Ticker(). Rotating proxies removes the need to repeat the proxy for each particular get() call
- allow the user to provide a proxy for each get call separately
I think the session doesn't work properly
The requests_cache session works perfectly well and does what it says: it caches requests. There is no mention of rate-limiting; for that you need a rate-limiter. I only mentioned session to frame my question.
allow customer to provide proxy for each get call separately
Why? This was my question. session isn't handled like this.
requests_cache works perfectly; this is probably a proxy issue with the library. When I create a new session for the Ticker() call, it seems that yfinance doesn't use the session IP as desired, because Yahoo is still rate limiting me.
In terms of your question, I agree that the proxy can be specified at the session level. However, when someone wants to make a number of data calls within the same session it can trigger the rate limiter again. Probably, that’s why some people want to use a new IP for each call. I manage it by using a rotating proxy.
So, I suspect that there is still an issue related to proxy use (proxy provided within the CachedSession()).
requests does not rotate proxies; you have to do this yourself manually. That is probably why you are triggering rate-limiting. You can create a simple requests wrapper class in 5-10 minutes that rotates proxies on each get().
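A minimal sketch of such a wrapper follows. The class name RotatingProxySession and the proxy URLs are hypothetical, and whether yfinance accepts a wrapper object in place of a real requests session is an untested assumption. It wraps any object that has a requests-style get() and passes the next proxy from a fixed list on each call.

```python
import itertools


class RotatingProxySession:
    """Hypothetical wrapper: use the next proxy from a list on every get()."""

    def __init__(self, session, proxy_urls):
        self._session = session                      # e.g. a requests.Session
        self._proxies = itertools.cycle(proxy_urls)  # endless round-robin iterator

    def get(self, url, **kwargs):
        proxy = next(self._proxies)
        # Override any caller-supplied value so every call goes through
        # the rotated proxy, for both URL schemes.
        kwargs["proxies"] = {"http": proxy, "https": proxy}
        return self._session.get(url, **kwargs)
```

With a proxy service that already rotates the outgoing IP by itself, a wrapper like this is unnecessary; a single proxy set on the session is enough.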
my proxy is rotating by itself. each call made via this proxy has a unique IP
one sec. let me test something.
Never mind your rate-limiting, let's return to this question:
it seems to work, however how can I use proxy to query the following?
ticker.income_stmt
ticker.quarterly_income_stmt
If you are using a rotating proxy then you don't have to provide proxy via get(); instead, pass it via the session. I think we can close the issue.
OK, I can finally confirm that the proxy works! The only issue is that the library uses a huge amount of traffic, so if you want to use a paid proxy, be ready to pay a lot.
I think the issue can be closed. hotfix/proxy works well.
mmmmm maybe they use JA3 or something
No idea. I respect Yahoo's limit, with caching and self-rate-limiting (requests_cache and requests_ratelimiter)
I'm getting my data filled with NaNs and would like to comply with their rate limiting. Is there a way to do this with the yf.download call or do I have to create tickers for each symbol I am pulling down? I don't see a way to pass the session to the download call.
I don't see a way to pass the session to the download call.
That should be fixed, please create a new Issue.
@ValueRaider Done! https://github.com/ranaroussi/yfinance/issues/1534
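For the question above about download and sessions, one possible workaround is to create a Ticker per symbol and share one session between them, as the earlier comments describe for Ticker. This is a sketch only: the helper name fetch_histories is hypothetical, and the ticker factory is passed in as a parameter (yf.Ticker in real use) so the code does not assume anything about yf.download.

```python
def fetch_histories(symbols, session, ticker_factory, period="1mo"):
    """Fetch price history per symbol, reusing one session for every Ticker.

    `ticker_factory` is expected to behave like yf.Ticker, i.e. to accept
    (symbol, session=...) and to return an object with a history() method.
    """
    results = {}
    for symbol in symbols:
        ticker = ticker_factory(symbol, session=session)
        results[symbol] = ticker.history(period=period)
    return results


# Hypothetical usage, assuming `session` is a cached and rate-limited session:
#   import yfinance as yf
#   data = fetch_histories(["MSFT", "AAPL"], session, yf.Ticker)
```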