
Known Yahoo rate limiter?

Open ganymedenet opened this issue 2 years ago • 6 comments

Hello,

I am trying to scan income statements via Ticker.quarterly_income_stmt using multithreading. After a few minutes I start receiving empty DataFrames, and the Yahoo website shows the following issue (see the screenshot below).

Is there any known IP-based Yahoo rate limiter?

image

ganymedenet avatar Jan 28 '23 22:01 ganymedenet

Yes https://github.com/ranaroussi/yfinance/issues?q=is%3Aissue+is%3Aopen+label%3A%22Yahoo+spam%22

ValueRaider avatar Jan 28 '23 22:01 ValueRaider

I guess that proxy can help?

ganymedenet avatar Jan 28 '23 22:01 ganymedenet

No idea. I respect Yahoo's limit, with caching and self-rate-limiting (requests_cache and requests_ratelimiter)
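
As a rough illustration of what "caching and self-rate-limiting" means in practice, here is a minimal standard-library sketch. It is not the requests_cache / requests_ratelimiter setup mentioned above; the class name `ThrottledCachedFetcher`, the `fetch` callable and the 2-second interval are all hypothetical, chosen only to show the idea: repeated keys are answered from a cache, and uncached calls are spaced out by a minimum interval.

```python
import threading
import time


class ThrottledCachedFetcher:
    """Hypothetical helper: cache results by key and space out uncached calls."""

    def __init__(self, fetch, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # callable doing the real (slow) lookup
        self._min_interval = min_interval  # assumed value; Yahoo's real limit is unknown
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None
        self._lock = threading.Lock()      # serialises calls coming from several threads

    def get(self, key):
        with self._lock:
            # Cached keys never reach the remote side again.
            if key in self._cache:
                return self._cache[key]
            # Wait until at least min_interval has passed since the last real call.
            if self._last_call is not None:
                wait = self._min_interval - (self._clock() - self._last_call)
                if wait > 0:
                    self._sleep(wait)
            self._last_call = self._clock()
            result = self._fetch(key)
            self._cache[key] = result
            return result
```

In real use `fetch` would be whatever function downloads one ticker's data. The clock and sleep functions are injectable only so the behaviour can be checked without actually waiting.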

ValueRaider avatar Jan 28 '23 22:01 ValueRaider

Ok, will try proxy tomorrow. Can update the issue later

ganymedenet avatar Jan 28 '23 22:01 ganymedenet

Just double-checked: I was rate limited after about 40 calls this time. I think they lower the rate-limiter threshold dynamically, depending on the number and intensity of spam cases.

ganymedenet avatar Jan 28 '23 23:01 ganymedenet

Proxy support is currently broken. A fix has been proposed; it just requires someone to test and confirm. It's in branch hotfix/proxy, and #1080 explains how to test. Report the verdict in #1371.

ValueRaider avatar Jan 28 '23 23:01 ValueRaider

hotfix/proxy

it seems to work, however how can I use proxy to query the following?

ticker.income_stmt
ticker.quarterly_income_stmt

ganymedenet avatar Feb 02 '23 13:02 ganymedenet

As you use a proxy, I have a question (it's very relevant to yours).

First, session recap - pass one into Ticker constructor and then all methods will use the session automatically (e.g. requests_cache).

Question: does it make sense that proxy is NOT handled like this? Instead, each get() method has a proxy argument e.g. Ticker.get_cash_flow(proxy=...). Is there any good reason for a user to want this behaviour?

ValueRaider avatar Feb 02 '23 13:02 ValueRaider

Thank you for the session input. However, Yahoo is still rate-limiting me after about 50 calls. I think the session doesn't work properly

        session = requests_cache.CachedSession('./.tmp/yfinance.cache')
        session.headers = headers
        session.proxies = {
            "http": proxy
        }

        data = yf.Ticker(stock, session=session)
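
One thing that may be worth checking in the snippet above: a requests-style proxies mapping is keyed by URL scheme, and only the "http" key is set, so a request to an https:// URL would not be matched by it. Whether that explains the rate limiting reported here is only a guess. The sketch below uses two hypothetical helpers (`build_proxies`, `proxy_for`) and a placeholder proxy address; `proxy_for` is a simplified stand-in for the scheme lookup, not the actual selection logic in requests, which also considers host-specific and "all" keys.

```python
from urllib.parse import urlsplit


def build_proxies(proxy_url):
    """Return a requests-style proxies mapping that covers both URL schemes."""
    return {"http": proxy_url, "https": proxy_url}


def proxy_for(url, proxies):
    """Simplified lookup: the proxy a URL would use, based on its scheme only."""
    return proxies.get(urlsplit(url).scheme)


# Placeholder proxy address, for illustration only.
http_only = {"http": "http://proxy.example:8080"}
both = build_proxies("http://proxy.example:8080")

print(proxy_for("https://finance.yahoo.com/", http_only))  # no match for https
print(proxy_for("https://finance.yahoo.com/", both))
```

Under that assumption, assigning `build_proxies(proxy)` to `session.proxies` would cover both schemes.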

In terms of your question, I think the both options work:

  • set a session with the proxy list when initializing Ticker(). Rotating proxies can solve the issue of repeating the proxy for each particular get() call
  • allow customer to provide proxy for each get call separately

ganymedenet avatar Feb 02 '23 14:02 ganymedenet

I think the session doesn't work properly

The requests_cache session works perfectly well, does what it says - cache requests. No mention of rate-limiting, for that you need a rate-limiter. I only mentioned session to frame my question.

allow customer to provide proxy for each get call separately

Why? This was my question. session isn't handled like this.

ValueRaider avatar Feb 02 '23 14:02 ValueRaider

requests_cache works perfectly, so this is probably a proxy issue in the library. When I create a new session for the Ticker() call, it seems that yfinance doesn't use the session's proxy IP as desired, because Yahoo is still rate limiting me.

In terms of your question, I agree that the proxy can be specified at the session level. However, when someone wants to make a number of data calls within the same session it can trigger the rate limiter again. Probably, that’s why some people want to use a new IP for each call. I manage it by using a rotating proxy.

So, I suspect that there is still an issue related to proxy use (proxy provided within the CachedSession()).

ganymedenet avatar Feb 02 '23 15:02 ganymedenet

requests does not rotate proxies, you have to do this yourself manually. Probably why you are triggering rate-limiting. You can create a simple requests wrapper class in 5-10 minutes that rotates proxies on each get().
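
A minimal sketch of such a wrapper, under stated assumptions: the class name `RotatingProxySession` is hypothetical, it wraps any object with a requests-style `get(url, **kwargs)` method, and the proxy URLs are placeholders. Whether yfinance accepts a wrapper like this as its `session` argument has not been tested here.

```python
import itertools


class RotatingProxySession:
    """Hypothetical wrapper: use the next proxy from a list on every get()."""

    def __init__(self, session, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._session = session
        self._proxies = itertools.cycle(proxy_urls)  # endless round-robin

    def get(self, url, **kwargs):
        proxy = next(self._proxies)
        # Override any caller-supplied proxies so each call uses the next one.
        kwargs["proxies"] = {"http": proxy, "https": proxy}
        return self._session.get(url, **kwargs)

    def __getattr__(self, name):
        # Delegate everything else (headers, close, ...) to the wrapped session.
        if name == "_session":
            raise AttributeError(name)
        return getattr(self._session, name)
```

With a real `requests.Session()` as the wrapped object, each `get()` would go out through the next proxy in the list. A provider-side rotating proxy makes this unnecessary.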

ValueRaider avatar Feb 02 '23 15:02 ValueRaider

my proxy is rotating by itself. each call made via this proxy has a unique IP

ganymedenet avatar Feb 02 '23 16:02 ganymedenet

one sec. let me test something.

ganymedenet avatar Feb 02 '23 16:02 ganymedenet

Never mind your rate-limiting, let's return to this question:

it seems to work, however how can I use proxy to query the following?

ticker.income_stmt
ticker.quarterly_income_stmt

If you are using a rotating proxy then you don't have to provide the proxy via get(); pass it via the session instead. I think we can close the issue.

ValueRaider avatar Feb 02 '23 16:02 ValueRaider

ok, eventually I can confirm that proxy works! the only issue is that the library uses a huge amount of traffic, so if you want to use paid proxy be ready to pay a lot.

I think the issue can be closed; hotfix/proxy works well.

ganymedenet avatar Feb 02 '23 20:02 ganymedenet

mmmmm maybe they use JA3 or something

iukea1 avatar Mar 10 '23 13:03 iukea1

No idea. I respect Yahoo's limit, with caching and self-rate-limiting (requests_cache and requests_ratelimiter)

I'm getting my data filled with NaNs and would like to comply with their rate limiting. Is there a way to do this with the yf.download call or do I have to create tickers for each symbol I am pulling down? I don't see a way to pass the session to the download call.

jkant avatar May 21 '23 07:05 jkant

I don't see a way to pass the session to the download call.

That should be fixed, please create a new Issue.

ValueRaider avatar May 21 '23 16:05 ValueRaider

@ValueRaider Done! https://github.com/ranaroussi/yfinance/issues/1534

jkant avatar May 21 '23 19:05 jkant