bulk-downloader-for-reddit

[BUG] Redgifs 429 Rate Limiting on Temporary Token Auth API

Open altdc opened this issue 5 months ago • 29 comments

  • [x] I am reporting a bug.
  • [x] I am running the latest version of BDfR
  • [x] I have read the Opening an issue

Description

Redgifs appears to have clamped down on the rate limits for their auth API.

I used to be able to download dozens of profiles back to back with no issues. Sometime in the past few weeks/months, the rate limit for this API appears to have changed. And now I hit 429s before finishing a single profile.

Looking at the code: it appears bdfr is fetching a new token before every request.

https://github.com/aliparlakci/bulk-downloader-for-reddit/blob/8c293a46843c818bea2c2013db38191867993a14/bdfr/site_downloaders/redgifs.py

I think this could benefit from one of the following solutions:

  1. Simple throttling mechanism. Wait a certain amount of time before requesting a new one.
  2. Cache token and re-use it until it expires or you encounter a 401/403 error.
  3. Allow for redgifs authentication to support higher rate limits.
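Option 2 could look roughly like the sketch below. This is not bdfr's actual code: `TokenCache`, `fetch_token`, and `ttl_seconds` are hypothetical names, and the token lifetime is passed in as an assumption because the real expiry is unclear at this point.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse one token until it expires or the server rejects it."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token  # hypothetical: performs the auth request
        self._ttl = ttl_seconds          # assumed lifetime, not a documented value
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one only when needed."""
        if self._token is None or self._clock() >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = self._clock() + self._ttl
        return self._token

    def invalidate(self) -> None:
        """Call after a 401/403 so the next get() fetches a fresh token."""
        self._token = None
```

With something like this, the downloader would call `get()` before each request and `invalidate()` only when the API rejects the token, so the auth endpoint is hit once per token lifetime instead of once per link.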

The docs for redgifs say these temp tokens have a short expiration. But even after requesting a token myself, it was hard to tell exactly how short. The decoded token showed an exp time that seemed to match the current time. Maybe that means they are only 1 time use?
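For anyone who wants to check the expiry themselves, here is a minimal sketch of reading the `exp` claim. It assumes the temporary token is a standard three-part JWT (which the decoded `exp` field suggests, but I have not confirmed), and `jwt_expiry` is a hypothetical helper name. It does not verify the signature.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token: str) -> datetime:
    """Return the `exp` claim of a JWT as a UTC datetime (signature is NOT verified)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore the padding first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)
```

Comparing `jwt_expiry(token)` against `datetime.now(timezone.utc)` would show how much lifetime the token has left.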

https://github.com/Redgifs/api/wiki/Temporary-tokens

Which of these solutions seem viable or would most likely be accepted as a PR? I possibly could help with a fix for this. But I can't commit to how soon.

Command

cat unsaved.txt | xargs -L 1 | xargs -I {}  bdfr download ./users/{} --user {} --log ./run.log --opts bdfr.yaml
sort: top
config: "config.cfg"
authenticate: true
no_dupes: true
search_existing: true
verbose: true
file_scheme: "{UPVOTES}_{REDDITOR}_{TITLE}_{POSTID}"
filename_restriction_scheme: "windows"
submitted: true
log: "latest_run.log"
disable_module:
  - SelfPost
skip_domain:
  - api.imgur.com
  - imgur.com
  - i.imgur.com

Environment (please complete the following information)

  • OS: MacOS 14.3
  • Python version: 3.9.6

Logs

[2024-01-29 03:58:01,564 - bdfr.downloader - DEBUG] - Attempting to download submission 17qbc5d
[2024-01-29 03:58:01,564 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/silverhoarsecygnet
[2024-01-29 03:58:04,267 - bdfr.file_name_formatter - DEBUG] - Forcing Windows-compatible filenames
[2024-01-29 03:58:04,267 - bdfr.file_name_formatter - DEBUG] - Forcing Windows-compatible filenames
[2024-01-29 03:58:06,198 - bdfr.downloader - DEBUG] - Written file to /Volumes/vault/bdfr/users/angrytoban/amateurgirlsbigcocks/185_angrytoban_Asian tinder date said she never rode a big cock before_17qbc5d.mp4
[2024-01-29 03:58:06,201 - bdfr.downloader - DEBUG] - Hash added to master list: 419fa74540840633943f606016e890aa
[2024-01-29 03:58:06,201 - bdfr.downloader - INFO] - Downloaded submission 17qbc5d from amateurgirlsbigcocks
[2024-01-29 03:58:06,204 - bdfr.downloader - DEBUG] - Attempting to download submission 19be5yy
[2024-01-29 03:58:06,204 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/lavenderjumboafricangoldencat
[2024-01-29 03:58:06,256 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 19be5yy: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary
[2024-01-29 03:58:06,256 - bdfr.downloader - DEBUG] - Attempting to download submission 18ejhit
[2024-01-29 03:58:06,256 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/gianttepidnarwhal
[2024-01-29 03:58:06,307 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 18ejhit: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary
[2024-01-29 03:58:06,307 - bdfr.downloader - DEBUG] - Attempting to download submission 17rbfkn
[2024-01-29 03:58:06,307 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/unluckyhoneydewsnowmonkey
[2024-01-29 03:58:06,359 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 17rbfkn: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary
[2024-01-29 03:58:06,359 - bdfr.downloader - DEBUG] - Attempting to download submission 17fbf49
[2024-01-29 03:58:06,359 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/lightskybluesquarechafer
[2024-01-29 03:58:06,411 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 17fbf49: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary

altdc avatar Jan 29 '24 09:01 altdc

Thanks for calling this out. I am seeing the same issue.

twentyonerooms87 avatar Jan 31 '24 18:01 twentyonerooms87

same issue

jameswebb07 avatar Feb 02 '24 10:02 jameswebb07

Any solution anyone ?

jameswebb07 avatar Feb 04 '24 17:02 jameswebb07

@jameswebb07

I have the temporary token caching working, I'm just working on the error handling now.

  • if the temp file does not exist
  • if the temp file exists and is not correct
  • if the temp file exists and is correct.

I'm throwing a 429 error handler in there too for good measure.
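The three cases above can be sketched like this. It is only an illustration, not the code from the pull request: `load_cached_token`, `save_token`, and the JSON layout with `token` and `expires_at` keys are all hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Optional


def load_cached_token(path: Path, now: Optional[float] = None) -> Optional[str]:
    """Return a usable cached token, or None if a new one must be requested."""
    now = time.time() if now is None else now
    # Case 1: the temp file does not exist.
    if not path.is_file():
        return None
    # Case 2: the file exists but is unreadable, malformed, or expired.
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        token, expires_at = data["token"], float(data["expires_at"])
    except (OSError, ValueError, KeyError, TypeError):
        return None
    if not isinstance(token, str) or not token or expires_at <= now:
        return None
    # Case 3: the file exists and is correct.
    return token


def save_token(path: Path, token: str, expires_at: float) -> None:
    """Write the token and its (assumed) expiry time to the cache file."""
    path.write_text(
        json.dumps({"token": token, "expires_at": expires_at}), encoding="utf-8"
    )
```

Returning `None` for both "missing" and "bad" keeps the caller simple: in either case it requests a new token and overwrites the file.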

Getting a bit hung up on some basic dorky python things because I've been up all night. Hopefully shouldn't take too much longer (famous last words).


Fun fact, according to the API documentation on temporary tokens, we should be using them for 24 hours. The current script attempts to get the token every time it tries to get a link. Can't blame them for adjusting the API rate limiting because of that. lol.

remghoost avatar Feb 04 '24 19:02 remghoost

Thanks @remghoost! The recent PRs have been merged into the development branch because the repo owner is still unreachable (@aliparlakci ). @Serene-Arc appears to be the one maintaining the development branch.

It looks like Redgifs updated that API documentation this week to clarify the token expiry was ~24 hours. It wasn't there when I checked before opening this issue. That makes me think this is a very recent change.

I messed up when attempting to test your changes and tested against master instead and got myself rate limited again. I will confirm this fixes the issue on my machine soon when my rate limit expires.

altdc avatar Feb 05 '24 05:02 altdc

@altdc Glad to help!

I did notice that the last changes to the master branch were from about a year ago. I could git clone the development branch and apply the changes to that instead if it would make merging easier.

And with regards to the original owner disappearing, would it be worth forking off entirely and making a bdfr2....? I'm not entirely sure how updating a pip package works. I've got my pip install -e . version though and that works fine for me.

It would be good to move the general userbase off of requesting tokens at such an alarming rate though... Larger companies don't need more reasons to lock down their API (looking at you, Reddit/Twitter).

Heck, now I'm curious if that's happening on all of the site_downloaders....

...updated that API documentation this week...

Ahh. That explains it. I'm curious why the sudden changes on their part. I mean, it could be the users of this script, but I doubt we'd be enough to really put a dent in their API usage... Though, some larger reddit accounts might have upwards of 300+ posts on redgifs. Maybe it was us. Who knows....

...and got myself rate limited again.

If you have your cached temporary token, sending a request using that token resets your rate limiting (even when testing requests for a new token via the API). I unfortunately found this out later than sooner. Would've saved me a heck of a lot of time early on. haha.

remghoost avatar Feb 05 '24 09:02 remghoost

Thanks @remghoost! The recent PRs have been merged into the development branch because the repo owner is still unreachable (@aliparlakci ). @Serene-Arc appears to be the one maintaining the development branch.

It looks like Redgifs updated that API documentation this week to clarify the token expiry was ~24 hours. It wasn't there when I checked before opening this issue. That makes me think this is a very recent change.

I messed up when attempting to test your changes and tested against master instead and got myself rate limited again. I will confirm this fixes the issue on my machine soon when my rate limit expires.

Anyone know how long the Redgifs rate limit lasts for? I am in the same boat. 403 for all my requests from all of my computers. Not sure if this is redgifs blocking or rate limiting me:

[2024-02-07 09:54:34,807 - bdfr.downloader - DEBUG] - Attempting to download submission 18og8v0
[2024-02-07 09:54:34,822 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/everyfrankstarfish
-=-=-=-=-=-=-=-=-=-=-=-
Redgifs API token file not found, retrieving new token
Attempting to retrieve new temporary Redgifs API token
[2024-02-07 09:54:34,919 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 18og8v0: Failed to retrieve Redgifs API token: Server responded with 403 to https://api.redgifs.com/v2/auth/temporary

twentyonerooms87 avatar Feb 07 '24 22:02 twentyonerooms87

@twentyonerooms87 I did not see this behavior. Maybe it has changed in the past few days? It's weird that they would give you a 403 response for an endpoint that does not support authentication. Maybe they have blocked your IP with something more severe than a temporary rate limit. I will re-check from my end soon.

altdc avatar Feb 08 '24 16:02 altdc

@twentyonerooms87 I did not see this behavior. Maybe it has changed in the past few days? It's weird that they would give you a 403 response for an endpoint that does not support authentication. Maybe they have blocked your IP with something more severe than a temporary rate limit. I will re-check from my end soon.

I think my IP is blocked. See below. Will submit a request and will hopefully be unblocked.

https://github.com/Redgifs/api/wiki/Blocked

twentyonerooms87 avatar Feb 08 '24 17:02 twentyonerooms87

Not sure if it is related, but I download gonewild and it worked fine until today. I did reformat with a new mobo, CPU, and RAM, and installed bdfr with pipx instead of pip.

I am no longer able to finish a run.

How do I reduce bdfr's speed?

madhatr avatar Feb 08 '24 18:02 madhatr

@twentyonerooms87

Hmm. Well, according to the temporary tokens wiki:

There is a strict limit on how many guest tokens you can request per hour. Violating that limit will result in your API access blocked for a long time.

You might fall into the "blocked for a long time" section (because of how the prior code requested auth tokens on every link). I believe an appeal is the only way to deal with that....?


@altdc

We might want to look into client tokens or even just full user tokens.

Heck, I even found a redgifs pip package, and it has its own API calling functions. Might be better to switch over to that entirely....?

Not sure. Will do some more research.


It also might be worth reaching out to redgifs and finding out what their preferred method of this whole thing is (to try and prevent people from being banned from API access).

It's a tricky situation, because someone using the base bdfr package from pip will get rate limited (or even banned) from the Redgifs API. 429 and 401 errors can be fixed (by using the change in my pull request) but we can't fix 403 errors.
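To make the distinction concrete, here is a rough sketch of how the three status codes could be treated differently. The names `Action`, `action_for_status`, and `backoff_delay` are hypothetical, and the backoff numbers are arbitrary placeholders, since nothing in this thread says how long Redgifs expects clients to wait.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    REFRESH_TOKEN = "refresh_token"  # 401: token rejected, request a new one
    BACK_OFF = "back_off"            # 429: rate limited, wait before retrying
    GIVE_UP = "give_up"              # 403: blocked, retrying will not help


def action_for_status(status_code: int) -> Action:
    """Map an HTTP status code to what the downloader should do next."""
    if status_code == 401:
        return Action.REFRESH_TOKEN
    if status_code == 429:
        return Action.BACK_OFF
    if status_code < 400:
        return Action.PROCEED
    # 403 and any other error are treated as fatal in this sketch.
    return Action.GIVE_UP


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff in seconds for a 0-based retry attempt (values are assumptions)."""
    return min(cap, base * (2 ** attempt))
```

The point of giving up on 403 is to stop sending requests that cannot succeed, which is exactly the traffic that might make a block worse.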

Should we publish our own pip package and readjust the README.MD to point to that....? Since the original repo owner went dark a while back. Though, @Serene-Arc seems to be a maintainer on the pip package, so perhaps we could update the package and prevent more people from being banned....?

My pull request merges fine with the master branch and shouldn't break any of the other functionality. Would prevent future 403 bans, but wouldn't quite help anyone running an outdated package.

Hmm. Not entirely sure on the process though. I've never published/updated a pip package.

remghoost avatar Feb 08 '24 22:02 remghoost

Hi all, thanks for the work on this. I'll review the PR for this issue in a couple hours, there are some things that need to be changed before it's merged.

The information on Ali's absence and the trouble it's caused is spread over a couple of issues so I'll recap it here. I am listed as the maintainer here and on pypi but I do not have administrative privileges on either the pypi package or this repository. That is the cause of the problem. With the advent of Reddit rate limiting, our tests have started to fail if done with the default client token and secret, because they are rate-limited. These tokens are repository secrets, which I cannot access or change.

The master branch for this repository is protected, which means that even I cannot merge anything to it with failing tests. I can't change the tokens, which means the tests will always fail. Thus, I can't merge anything to master.

The pypi package is automatically updated through a workflow when we make a new release, which is done from master. I don't have administrative access to that either, so I can't change anything regarding it. Again, that is done through an API secret I don't have access to.

I've reached out to Ali three times since last August when these issues started, most recently January 16, but obviously he hasn't responded to fix the issue. We also briefly discussed transferring the repository to me, since he doesn't actively develop it anymore. I wanted to transfer it to an organisation to stop these exact types of problems in the future, but it never happened.

That's the state of things. Going forward I'm not entirely sure what to do. If this issue is as severe as you say, then the nuclear option is to cheat the systems. I remove all tests from the codebase bar one that trivially succeeds, merge and release that, then reintroduce the tests for the development branch. That's the best option that succeeds with the authority I have.

Trying to migrate to another repo, such as mine, would be a huge discontinuity. I doubt most people would switch.

Serene-Arc avatar Feb 09 '24 01:02 Serene-Arc

@Serene-Arc

Thanks for the update and information!

I am listed as the maintainer here and on pypi but I do not have administrative privileges on either the pypi package or this repository.

The pypi package is automatically updated through a workflow...

Well, those two things definitely complicate matters a bit...

...then the nuclear option is to cheat the systems.

Haha. What a roundabout way to push changes. Remove all of the tests except one that would pass indefinitely. It's definitely an option, but I don't really see any other way to do it other than that....


More and more users will probably be coming over here to check why they're getting 401, 403, and 429 errors, so it's probably best to implement these changes sooner rather than later.

Not to mention that Redgifs might start getting peeved at all of the failing API requests coming their way. It could cause them to lock down their API even further. Or force more users to jump through the hoop of submitting a request to get their IP unbanned.


But this is definitely something that should be pondered a bit more before making a decision though.

Gaining ownership over the repo would be the best case scenario, but it seems like that's not quite possible. No flak intended toward the repo owner (heck, I get bored with projects too).

So, if I understand you correctly:

  • You would remove all of the tests on the master branch
  • Push the changes from the development branch
  • Then re-implement the tests from the development branch (which would pass future pull requests)...?

It would still limit us to only pushing changes that could be merged automatically though, correct?

But it would at least let us get the Redgifs issue fixed in the meantime....

Also, I can adjust my pull request's changes to work on the development branch if you'd like. I was working off of the master branch (before I knew about all of the ownership issues).

remghoost avatar Feb 09 '24 02:02 remghoost

@remghoost There are some things that your PR must change before I would merge it. It's up to you whether I make those changes myself or leave comments for you to change them.

The nuclear option I presented would forbid further changes to master once the tests are reimplemented, but the project development takes place exclusively on the development branch, so nothing would really change. The master branch is exclusively for releases, nothing else.

Serene-Arc avatar Feb 09 '24 03:02 Serene-Arc

I submitted a request via this Redgifs page to unblock my IP. I kept things very generic - used my web browser and some github applications to DL redgifs files:

https://github.com/Redgifs/api/wiki/Blocked

Received this response. I am holding off on submitting the form. Wanted to know folks' thoughts:

Thanks for your email. Upon reviewing the notes above, we have confirmed that this is an expected behavior; to continue the access/usage of our content, you may request a formal API access request through this link: https://docs.google.com/forms/d/e/1FAIpQLSf-jnx_BA_y_I3cASpSDrh8dxS18a0r56Bp1T5fTNj7JI4b2g/viewform?hl=en.

You can also find all the necessary information here: https://github.com/Redgifs/api/wiki/API-access

twentyonerooms87 avatar Feb 09 '24 16:02 twentyonerooms87

"I've reached out to Ali three times since last August when these issues started, most recently January 16, but obviously he hasn't responded to fix the issue."

You have your answer right there: bdfr needs someone who loves the project and wants to continue it.

Devicetron avatar Feb 14 '24 00:02 Devicetron

He decided that he didn't want to continue the project some time ago, which is why I am the maintainer now. Unfortunately we never transferred full control as he, understandably, still wanted some input and to have the repository under his name. We discussed transferring it to an organisational account but it hasn't happened. I wanted to do that in case of this exact scenario but I can't do it on my own authority.

Serene-Arc avatar Feb 14 '24 05:02 Serene-Arc

Yeah, that just happens with life sometimes. I've dropped projects out of the blue too. No harm/foul on their part.

If you can edit the README.MD, it might be worth:

  • forking the project
  • making a new python library for it on pypi
  • and mentioning in the README.MD that the project is sort of stalled because of those reasons.

Link to the fork and the new package for people running into the 429 errors.

Name it bdfr2 or something like that.


Development could still take place here on the development branch, but changes worth pushing could be pulled over to the fork (and updating the new pip package at the same time).

Not the most elegant solution, but it should work. It would allow all of the development to stay here as well, which would be good for longevity.

remghoost avatar Feb 14 '24 06:02 remghoost

I'll make new efforts to reach out to @aliparlakci too. My hope is that we can get it transferred to an organisation, lest we lose a ton of users, which we inevitably will if we have to switch that much.

Serene-Arc avatar Feb 14 '24 12:02 Serene-Arc

Reinstalled recently, and now for some reason pip install -U git+https://github.com/aliparlakci/bulk-downloader-for-reddit.git@development won't pull the development release anymore.

Git says the repo doesn't exist, but it obviously does.....

remote: Repository not found.
fatal: repository 'https://github.com/aliparlakci/bulk-downloader-for-reddit.git@development/' not found

Never mind, I found the problem.... I had an extra space in a bad place.

MachiavelliSeraphim avatar Feb 14 '24 17:02 MachiavelliSeraphim

And now it's giving me

[2024-02-14 12:57:16,597 - bdfr.connector - WARNING] - Using an unauthenticated app like this will result in Reddit limiting queries to 10 requests a minute

when I'm using valid credentials >.<

Out of curiosity, is there an error in my command?

python3.9 -m bdfr download /ZFS-Data/BDFR/SubReddit/**** --subreddit **** -S new --no-dupes --filename-restriction-scheme windows --config /ZFS-Data/BDFR/config1.cfg

Or is this RedGIFS warning me?

MachiavelliSeraphim avatar Feb 14 '24 17:02 MachiavelliSeraphim

No, you need to authenticate with the --authenticate option. Otherwise the tokens won't be loaded and Reddit will rate limit you.

Serene-Arc avatar Feb 15 '24 00:02 Serene-Arc

Tokens? I thought you just had to put the info in the config file? Guess I'm behind the times :(

So do I just add --authenticate to the line, or is there more to do?

MachiavelliSeraphim avatar Feb 15 '24 00:02 MachiavelliSeraphim

I tried adding --authenticate and it pops up with "authenticate at this URL".

The URL causes a bad request: "Invalid redirect_uri parameter" error.

So I guess I've got no idea WTF is going on here....

MachiavelliSeraphim avatar Feb 15 '24 02:02 MachiavelliSeraphim

Maybe add some provisional instructions on the front page for non-technical folk on how to avoid this 429 error, before Redgifs blocks it altogether after mistaking it for a scraper (e.g. AI companies). I also suggest just starting a fork; lots of creators don't want to maintain their projects for personal reasons, and other people bring their knowledge and improve them.

Devicetron avatar Feb 15 '24 03:02 Devicetron

I'd be game for a fork with actual instructions on how to do this. It's REAL hard to sign in to Reddit for a token on a system that's CLI only.... And apparently trying from something other than the CLI machine just fails??

MachiavelliSeraphim avatar Feb 15 '24 05:02 MachiavelliSeraphim

It's REAL hard to sign in to Reddit for a token on a system that's CLI only....

It shouldn't be? If there's no token, it should open up a web browser that authenticates you.

Serene-Arc avatar Feb 15 '24 07:02 Serene-Arc

It does not open a browser on a system without a GUI......

And when I try it on a system with a GUI, I get this. EVERY time >.< [screenshot: Untitled]

So I am 100% confused on what the ACTUAL problem is.

As far as I can tell, this part is what it's complaining about: "redirect_uri=http%3A%2F%2Flocalhost%3A7634"
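Decoding that parameter shows the plain URL being sent. This is just a small sketch using Python's standard library. My assumption (not confirmed in this thread) is that Reddit rejects the request when this value does not exactly match the redirect URI registered for the app.

```python
from urllib.parse import unquote

# The percent-encoded value copied from the failing authorization URL.
encoded = "http%3A%2F%2Flocalhost%3A7634"
decoded = unquote(encoded)
print(decoded)  # http://localhost:7634
```

If that assumption holds, the redirect URI configured in the Reddit app settings would need to be the decoded value, including the scheme and port.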

MachiavelliSeraphim avatar Feb 15 '24 14:02 MachiavelliSeraphim

So I am 100% confused on what the ACTUAL problem is.

As far as I can tell, this part is what it's complaining about: "redirect_uri=http%3A%2F%2Flocalhost%3A7634"

Pretty sure I used https://localhost/ for the redirect URI and it worked fine. I can't check because for some reason I changed it to reddit.com now. I haven't had to auth since the change though, so time will tell what this change does.

madhatr avatar Feb 29 '24 13:02 madhatr