Let the user choose how to handle rate limits

Open catdevnull opened this issue 1 year ago • 3 comments

Hi! I've been liking this library a lot as you can see :)

Something I would like for my project is the ability to choose how rate limits are handled. Specifically, I want to implement behavior similar to twscrape's: if one account hits a rate limit, the request is retried with another account that hasn't. With how twitter-scraper currently works, there is no way to choose; it just waits until the rate limit resets, which can take a long time.

I'm probably going to do the easiest patch in my fork, which is throwing a special error (something like RateLimitError) from requestApi so I can handle it in my own code, but I would like to find a solution that can be applied upstream.
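A minimal sketch of what such an error could look like, for illustration only. `RateLimitError` is just the name floated above; `throwIfRateLimited`, `ResponseLike`, and the use of HTTP 429 plus an `x-rate-limit-reset` header (epoch seconds) are assumptions for the sketch, not the library's confirmed internals.

```typescript
// Hypothetical error type; not part of twitter-scraper's public API.
class RateLimitError extends Error {
  // When the limit is expected to reset, if the server said so.
  readonly resetAt: Date | null;

  constructor(resetAt: Date | null) {
    super("Rate limit reached");
    this.name = "RateLimitError";
    this.resetAt = resetAt;
    // Keeps `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, RateLimitError.prototype);
  }
}

// Minimal structural view of a fetch Response (assumed shape for this sketch).
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// Throws instead of sleeping, so the caller decides what to do next.
function throwIfRateLimited(res: ResponseLike): void {
  if (res.status !== 429) return;
  // Assumption: the reset time arrives as epoch seconds in this header.
  const raw = res.headers.get("x-rate-limit-reset");
  const seconds = raw === null ? NaN : Number(raw);
  throw new RateLimitError(
    Number.isFinite(seconds) ? new Date(seconds * 1000) : null,
  );
}
```

Calling code could then catch `RateLimitError` and either wait until `resetAt` or switch to another account.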

Thanks <3

catdevnull avatar Apr 08 '24 19:04 catdevnull

Definitely open to this in general, though I'm a little less sure about how to actually go about it (still need to think it over).

I'm leaning towards making it part of the TwitterAuth interface and generalizing it a bit more, since that's already responsible for providing the fetch implementation, along with request/response transforms. From an implementation standpoint, it could just be a wrapper around the fetch implementation itself. On the other hand, there's not a clear way of communicating that this is where the functionality would live, so it could use a bit more of an API around it, maybe.

At any rate, the fetch provider API could actually work today for your use case, so that might be worth a try in the meantime 👀
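As a rough illustration of the "wrapper around the fetch implementation" idea, here is a sketch that retries a rate-limited request with the next account. Everything named here (`withAccountRotation`, `Account`, `FetchLike`) is hypothetical, and it assumes an account can be represented by a single cookie header string and that a rate limit shows up as HTTP 429; the real library manages cookies through a cookie jar, so treat this as the shape of the idea rather than a drop-in.

```typescript
// Assumed minimal shapes; a real fetch accepts and returns much more.
interface RequestInitLike {
  headers?: Record<string, string>;
}
interface StatusResponse {
  status: number;
}
type FetchLike<R extends StatusResponse> = (
  input: string,
  init?: RequestInitLike,
) => Promise<R>;

// Hypothetical account record: one cookie header string per account.
interface Account {
  name: string;
  cookie: string;
}

// Wraps a fetch function so that a 429 response is retried with the
// next account, trying each account at most once per request.
function withAccountRotation<R extends StatusResponse>(
  baseFetch: FetchLike<R>,
  accounts: Account[],
): FetchLike<R> {
  if (accounts.length === 0) {
    throw new Error("at least one account is required");
  }
  let current = 0; // index of the account used for the next request

  return async (input, init) => {
    for (let attempt = 0; attempt < accounts.length; attempt++) {
      const account = accounts[current];
      const res = await baseFetch(input, {
        ...init,
        headers: { ...(init?.headers ?? {}), cookie: account.cookie },
      });
      const isLastAttempt = attempt === accounts.length - 1;
      if (res.status !== 429 || isLastAttempt) return res;
      // Rate limited: move on to the next account and retry.
      current = (current + 1) % accounts.length;
    }
    throw new Error("unreachable: the loop always returns");
  };
}
```

If every account is rate limited, the last 429 response is returned unchanged, so the caller can still fall back to waiting. The wrapped function could be supplied wherever a custom fetch implementation is accepted.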

karashiiro avatar Apr 09 '24 03:04 karashiiro

Okay, I've implemented a PoC of account login as a request interceptor in this branch: https://github.com/catdevnull/milei-twitter/blob/scraper-resiliente-cuentas-interceptar/scraper-manzana/scraper.ts

It's a bit hacky, and it originally didn't switch accounts on errors, though that was easy to add (EDIT: now implemented :). I'll probably deploy this in my staging environment soon.

It would probably be cleaner to use if we could access the cookie jar from outside the library.

catdevnull avatar Apr 17 '24 17:04 catdevnull

(ignore) referencing #84 for discoverability

karashiiro avatar Apr 28 '24 00:04 karashiiro