bulk-downloader-for-reddit
[BUG] Redgifs 429 Rate Limiting on Temporary Token Auth API
- [x] I am reporting a bug.
- [x] I am running the latest version of BDfR
- [x] I have read the Opening an issue
Description
Redgifs appears to have clamped down their rate limits for their auth API.
I used to be able to download dozens of profiles back to back with no issues. Sometime in the past few weeks/months, the rate limit for this API appears to have changed. And now I hit 429s before finishing a single profile.
Looking at the code: it appears bdfr is fetching a new token before every request.
https://github.com/aliparlakci/bulk-downloader-for-reddit/blob/8c293a46843c818bea2c2013db38191867993a14/bdfr/site_downloaders/redgifs.py
I think this could benefit from one of the following solutions:
- Simple throttling mechanism. Wait a certain amount of time before requesting a new one.
- Cache token and re-use it until it expires or you encounter a 401/403 error.
- Allow for redgifs authentication to support higher rate limits.
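A minimal sketch of the second option (cache the token and refresh it only on expiry or a 401/403), not bdfr's actual code. The names `TokenCache`, `fetch_token`, `send`, and the one-hour default lifetime are hypothetical placeholders; the real token lifetime would have to come from Redgifs.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a token and re-uses it until it expires or is rejected."""

    def __init__(
        self,
        fetch_token: Callable[[], str],  # hypothetical: performs the real auth request
        ttl_seconds: float = 3600.0,     # assumed lifetime, not a documented value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one only when needed."""
        if self._token is None or self._clock() >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = self._clock() + self._ttl
        return self._token

    def invalidate(self) -> None:
        """Drop the cached token, e.g. after a 401/403 response."""
        self._token = None


def request_with_cached_token(cache: TokenCache, send: Callable[[str], int]) -> int:
    """Call send(token); on 401/403, refresh the token once and retry."""
    status = send(cache.get())
    if status in (401, 403):
        cache.invalidate()
        status = send(cache.get())
    return status
```

With this shape, a run that downloads many posts would hit the auth endpoint once per token lifetime instead of once per post.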
The docs for redgifs say these temp tokens have a short expiration. But even after requesting a token myself, it was hard to tell exactly how short. The decoded token showed an exp time that seemed to match the current time. Maybe that means they are only 1 time use?
https://github.com/Redgifs/api/wiki/Temporary-tokens
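For anyone who wants to check the expiry themselves, here is a small sketch that reads the exp claim. It assumes the temporary token is a JWT, which is what the "decoded token" above suggests but is not something I have confirmed. The function name `jwt_expiry` is made up for this example, and the signature is not verified.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token: str) -> datetime:
    """Return the `exp` claim of a JWT as a UTC datetime (signature not verified)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)
```

Comparing the result with `datetime.now(timezone.utc)` shows how much lifetime the token has left.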
Which of these solutions seem viable or would most likely be accepted as a PR? I possibly could help with a fix for this. But I can't commit to how soon.
Command
cat unsaved.txt | xargs -L 1 | xargs -I {} bdfr download ./users/{} --user {} --log ./run.log --opts bdfr.yaml
sort: top
config: "config.cfg"
authenticate: true
no_dupes: true
search_existing: true
verbose: true
file_scheme: "{UPVOTES}_{REDDITOR}_{TITLE}_{POSTID}"
filename_restriction_scheme: "windows"
submitted: true
log: "latest_run.log"
disable_module:
- SelfPost
skip_domain:
- api.imgur.com
- imgur.com
- i.imgur.com
Environment (please complete the following information)
- OS: MacOS 14.3
- Python version: 3.9.6
Logs
[2024-01-29 03:58:01,564 - bdfr.downloader - DEBUG] - Attempting to download submission 17qbc5d
[2024-01-29 03:58:01,564 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/silverhoarsecygnet
[2024-01-29 03:58:04,267 - bdfr.file_name_formatter - DEBUG] - Forcing Windows-compatible filenames
[2024-01-29 03:58:04,267 - bdfr.file_name_formatter - DEBUG] - Forcing Windows-compatible filenames
[2024-01-29 03:58:06,198 - bdfr.downloader - DEBUG] - Written file to /Volumes/vault/bdfr/users/angrytoban/amateurgirlsbigcocks/185_angrytoban_Asian tinder date said she never rode a big cock before_17qbc5d.mp4
[2024-01-29 03:58:06,201 - bdfr.downloader - DEBUG] - Hash added to master list: 419fa74540840633943f606016e890aa
[2024-01-29 03:58:06,201 - bdfr.downloader - INFO] - Downloaded submission 17qbc5d from amateurgirlsbigcocks
[2024-01-29 03:58:06,204 - bdfr.downloader - DEBUG] - Attempting to download submission 19be5yy
[2024-01-29 03:58:06,204 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/lavenderjumboafricangoldencat
[2024-01-29 03:58:06,256 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 19be5yy: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary
[2024-01-29 03:58:06,256 - bdfr.downloader - DEBUG] - Attempting to download submission 18ejhit
[2024-01-29 03:58:06,256 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/gianttepidnarwhal
[2024-01-29 03:58:06,307 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 18ejhit: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary
[2024-01-29 03:58:06,307 - bdfr.downloader - DEBUG] - Attempting to download submission 17rbfkn
[2024-01-29 03:58:06,307 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/unluckyhoneydewsnowmonkey
[2024-01-29 03:58:06,359 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 17rbfkn: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary
[2024-01-29 03:58:06,359 - bdfr.downloader - DEBUG] - Attempting to download submission 17fbf49
[2024-01-29 03:58:06,359 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/lightskybluesquarechafer
[2024-01-29 03:58:06,411 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 17fbf49: Server responded with 429 to https://api.redgifs.com/v2/auth/temporary
Thanks for calling this out. I am seeing the same issue.
same issue
Any solution anyone ?
@jameswebb07
I have the temporary token caching working, I'm just working on the error handling now.
- if the temp file does not exist
- if the temp file exists and is not correct
- if the temp file exists and is correct.
I'm throwing a 429 error handler in there too for good measure.
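A rough sketch of what a 429 handler could look like (this is an illustration, not the code from the pull request). `call_with_backoff` and `do_request` are hypothetical names, and whether Redgifs sends a Retry-After value at all is an assumption; when none is given, the sketch falls back to exponential delays.

```python
import time
from typing import Callable, Optional, Tuple

# (status code, Retry-After in seconds if the server provided one)
Response = Tuple[int, Optional[float]]


def call_with_backoff(
    do_request: Callable[[], Response],  # hypothetical: performs one HTTP request
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry do_request while it returns 429, waiting longer each time."""
    status = 0
    for attempt in range(max_attempts):
        status, retry_after = do_request()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # give up without sleeping after the final attempt
        # Prefer the server's hint; otherwise wait 1s, 2s, 4s, ...
        delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
        sleep(delay)
    return status
```

Keeping `max_attempts` small seems wise here, since repeated requests to the auth endpoint are what triggered the rate limit in the first place.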
Getting a bit hung up on some basic dorky python things because I've been up all night. Hopefully shouldn't take too much longer (famous last words).
Fun fact, according to the API documentation on temporary tokens, we should be using them for 24 hours. The current script attempts to get the token every time it tries to get a link. Can't blame them for adjusting the API rate limiting because of that. lol.
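The three temp-file cases listed above can be sketched roughly like this. It is an illustration under assumptions, not the code from the pull request: the JSON layout, `load_cached_token`, `get_token`, and `fetch_token` are all hypothetical, and the 24-hour lifetime is the approximate figure from the API documentation mentioned above.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 24 * 60 * 60  # ~24 hours; treat as approximate


def load_cached_token(path: Path, now: float) -> Optional[str]:
    """Return the cached token, or None if the file is missing, invalid, or expired."""
    try:
        data = json.loads(path.read_text())
        token, fetched_at = data["token"], float(data["fetched_at"])
    except (OSError, ValueError, KeyError, TypeError):
        # Covers "file does not exist" and "file exists but is not correct".
        return None
    if not isinstance(token, str) or not token:
        return None
    if now - fetched_at >= TOKEN_TTL_SECONDS:
        return None
    return token


def get_token(
    path: Path,
    fetch_token: Callable[[], str],  # hypothetical: performs the real auth request
    clock: Callable[[], float] = time.time,
) -> str:
    """Re-use the token on disk when it is valid; otherwise fetch and store a new one."""
    token = load_cached_token(path, clock())
    if token is None:
        token = fetch_token()
        path.write_text(json.dumps({"token": token, "fetched_at": clock()}))
    return token
```

Storing the fetch time next to the token means the cache survives across separate bdfr runs, which matters when the tool is invoked once per user as in the original command.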
Thanks @remghoost! The recent PRs have been merged into the development branch because the repo owner is still unreachable (@aliparlakci ). @Serene-Arc appears to be the one maintaining the development branch.
It looks like Redgifs updated that API documentation this week to clarify the token expiry was ~24 hours. It wasn't there when I checked before opening this issue. That makes me think this is a very recent change.
I messed up when attempting to test your changes and tested against master instead and got myself rate limited again. I will confirm this fixes the issue on my machine soon when my rate limit expires.
@altdc Glad to help!
I did notice that the last changes to the master branch were from about a year ago.
I could git clone the development branch and apply the changes to that instead if it would make merging easier.
And with regards to the original owner disappearing, would it be worth forking off entirely and making a bdfr2....?
I'm not entirely sure how updating a pip package works.
I've got my pip install -e . version though and that works fine for me.
It would be good to move the general userbase off of requesting tokens at such an alarming rate though... Larger companies don't need more reasons to lock down their API (looking at you, Reddit/Twitter).
Heck, now I'm curious if that's happening on all of the site_downloaders....
...updated that API documentation this week...
Ahh. That explains it. I'm curious why the sudden changes on their part. I mean, it could be the users of this script, but I doubt we'd be enough to really put a dent in their API usage... Though, some larger reddit accounts might have upwards of 300+ posts on redgifs. Maybe it was us. Who knows....
...and got myself rate limited again.
If you have your cached temporary token, sending a request using that token resets your rate limiting (even when testing requests for a new token via the API). I unfortunately found this out later than sooner. Would've saved me a heck of a lot of time early on. haha.
Anyone know how long the Redgifs rate limit lasts for? I am in the same boat. 403 for all my requests from all of my computers. Not sure if this is redgifs blocking or rate limiting me:
[2024-02-07 09:54:34,807 - bdfr.downloader - DEBUG] - Attempting to download submission 18og8v0
[2024-02-07 09:54:34,822 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/everyfrankstarfish
-=-=-=-=-=-=-=-=-=-=-=-
Redgifs API token file not found, retrieving new token
Attempting to retrieve new temporary Redgifs API token
[2024-02-07 09:54:34,919 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission 18og8v0: Failed to retrieve Redgifs API token: Server responded with 403 to https://api.redgifs.com/v2/auth/temporary
@twentyonerooms87 I did not see this behavior. Maybe it has changed in the past few days? It's weird that they would give you a 403 response for an endpoint that does not support authentication. Maybe they have blocked your IP with something more severe than a temporary rate limit. I will re-check from my end soon.
I think my IP is blocked. See below. Will submit a request and will hopefully be unblocked.
https://github.com/Redgifs/api/wiki/Blocked
Not sure if it is related, but I download gonewild and it worked fine until today. I did reformat after installing a new motherboard, CPU, and RAM, and I installed bdfr with pipx instead of pip.
I am no longer able to finish a run.
How do I reduce bdfr's speed?
@twentyonerooms87
Hmm. Well, according to the temporary tokens wiki:
There is a strict limit on how many guest tokens you can request per hour. Violating that limit will result in your API access blocked for a long time.
You might fall into the "blocked for a long time" section (because of how the prior code requested auth tokens on every link). I believe an appeal is the only way to deal with that....?
@altdc
We might want to look into client tokens or even just full user tokens.
Heck, I even found a redgifs pip package, and it has its own API calling functions. Might be better to switch over to that entirely....?
Not sure. Will do some more research.
It also might be worth reaching out to redgifs and finding out what their preferred method of this whole thing is (to try and prevent people from being banned from API access).
It's a tricky situation, because someone using the base bdfr package from pip will get rate limited (or even banned) from the Redgifs API. 429 and 401 errors can be fixed (by using the change in my pull request) but we can't fix 403 errors.
Should we publish our own pip package and readjust the README.MD to point to that....? Since the original repo owner went dark a while back. Though, @Serene-Arc seems to be a maintainer on the pip package, so perhaps we could update the package and prevent more people from being banned....?
My pull request merges fine with the master branch and shouldn't break any of the other functionality. Would prevent future 403 bans, but wouldn't quite help anyone running an outdated package.
Hmm. Not entirely sure on the process though. I've never published/updated a pip package.
Hi all, thanks for the work on this. I'll review the PR for this issue in a couple hours, there are some things that need to be changed before it's merged.
The information on Ali's absence and the trouble it's caused is spread over a couple of issues so I'll recap it here. I am listed as the maintainer here and on pypi but I do not have administrative privileges on either the pypi package or this repository. That is the cause of the problem. With the advent of Reddit rate limiting, our tests have started to fail if done with the default client token and secret, because they are rate-limited. These tokens are repository secrets, which I cannot access or change.
The master branch for this repository is protected, which means that even I cannot merge anything to it with failing tests. I can't change the tokens, which means the tests will always fail. Thus, I can't merge anything to master.
The pypi package is automatically updated through a workflow when we make a new release, which is done from master. I don't have administrative access to that either, so I can't change anything regarding it. Again, that is done through an API secret I don't have access to.
I've reached out to Ali three times since last August when these issues started, most recently January 16, but obviously he hasn't responded to fix the issue. We also briefly discussed transferring the repository to me, since he doesn't actively develop it anymore. I wanted to transfer it to an organisation to stop these exact types of problems in the future, but it never happened.
That's the state of things. Going forward I'm not entirely sure what to do. If this issue is as severe as you say, then the nuclear option is to cheat the systems. I remove all tests from the codebase bar one that trivially succeeds, merge and release that, then reintroduce the tests for the development branch. That's the best option that succeeds with the authority I have.
Trying to migrate to another repo, such as mine, would be a huge discontinuity. I doubt most people would switch.
@Serene-Arc
Thanks for the update and information!
I am listed as the maintainer here and on pypi but I do not have administrative privileges on either the pypi package or this repository.
The pypi package is automatically updated through a workflow...
Well, those two things definitely complicate matters a bit...
...then the nuclear option is to cheat the systems.
Haha. What a roundabout way to push changes. Remove all of the tests except one that would pass indefinitely. It's definitely an option, but I don't really see any other way to do it other than that....
More and more users will probably be coming over here to check why they're getting 401, 403, and 429 errors, so it's probably best to implement these changes sooner than later.
Not to mention that Redgifs might start getting peeved at all of the failing API requests coming their way. It could cause them to lock down their API even further. Or force more users to jump through the hoop of submitting a request to get their IP unbanned.
But this is definitely something that should be pondered a bit more before making a decision though.
Gaining ownership over the repo would be the best case scenario, but it seems like that's not quite possible. Not calling any flak to the repo owner (heck, I get bored with projects too).
So, if I understand you correctly:
- You would remove all of the tests on the master branch
- Push the changes from the development branch
- Then re-implement the tests from the development branch (which would pass future pull requests)...?
It would still limit us to only pushing changes that could be merged automatically though, correct?
But it would at least let us get the Redgifs issue fixed in the meantime....
Also, I can adjust my pull request's changes to work on the development branch if you'd like. I was working off of the master branch (before I knew about all of the ownership issues).
@remghoost It's up to you, there are some things that your PR must change before I would merge it. It's up to you whether I make those changes or make comments for you to change it.
The nuclear option I presented would forbid further changes to master once the tests are reimplemented, but the project development takes place exclusively on the development branch, so nothing would really change. The master branch is exclusively for releases, nothing else.
I submitted a request via this Redgifs page to unblock my IP. I kept things very generic - used my web browser and some github applications to DL redgifs files:
https://github.com/Redgifs/api/wiki/Blocked
Received this response. I am holding off on submitting the form. I wanted to know folks' thoughts:
Thanks for your email. Upon reviewing the notes above, we have confirmed that this is an expected behavior; to continue the access/usage of our content, you may request a formal API access request through this link: https://docs.google.com/forms/d/e/1FAIpQLSf-jnx_BA_y_I3cASpSDrh8dxS18a0r56Bp1T5fTNj7JI4b2g/viewform?hl=en.
You can also find all the necessary information here: https://github.com/Redgifs/api/wiki/API-access
"I've reached out to Ali three times since last August when these issues started, most recently January 16, but obviously he hasn't responded to fix the issue."
You have your answer right there: bdfr needs someone who loves the project and wants to continue it.
He decided that he didn't want to continue the project some time ago, which is why I am the maintainer now. Unfortunately we never transferred full control as he, understandably, still wanted some input and to have the repository under his name. We discussed transferring it to an organisational account but it hasn't happened. I wanted to do that in case of this exact scenario but I can't do it on my own authority.
Yeah, that just happens with life sometimes. I've dropped projects out of the blue too. No harm/foul on their part.
If you can edit the README.MD, it might be worth:
- forking the project
- making a new python library for it on pypi
- and mentioning in the README.MD that the project is sort of stalled because of those reasons.
Link to the fork and the new package for people running into the 429 errors.
Name it bdfr2 or something like that.
Development could still take place here on the development branch, but changes worth pushing could be pulled over to the fork (and updating the new pip package at the same time).
Not the most elegant solution, but it should work. It would allow all of the development to stay here as well, which would be good for longevity.
I'll make new efforts to reach out to @aliparlakci too. My hope is that we can get it transferred to an organisation, lest we lose a ton of users, which we inevitably will if we have to switch that much.
I reinstalled recently, and now for some reason pip install -U git+https://github.com/aliparlakci/bulk-downloader-for-reddit.git@development won't pull the development release anymore.
Git says the repo doesn't exist, but it obviously does.....
remote: Repository not found.
fatal: repository 'https://github.com/aliparlakci/bulk-downloader-for-reddit.git@development/' not found
Never mind, I found the problem: I had an extra space in a bad place.
Now it's giving me
[2024-02-14 12:57:16,597 - bdfr.connector - WARNING] - Using an unauthenticated app like this will result in Reddit limiting queries to 10 requests a minute
when I'm using valid credentials >.<
Out of curiosity, is there an error in my command?
python3.9 -m bdfr download /ZFS-Data/BDFR/SubReddit/**** --subreddit **** -S new --no-dupes --filename-restriction-scheme windows --config /ZFS-Data/BDFR/config1.cfg
Or is this RedGIFS warning me?
No, you need to authenticate with the --authenticate option. Otherwise the tokens won't be loaded and Reddit will rate limit you.
Tokens? I thought you just had to put the info in the config file? Guess I'm behind the times :(
So do I just add --authenticate to the line, or is there more to do?
I tried adding --authenticate and it prompts me to authenticate at a URL.
The URL gives a bad request error: "Invalid redirect_uri parameter".
So I guess I've got no idea WTF is going on here....
Maybe add some provisional instructions on the front page for non-technical folk on how to avoid this 429 error, before Redgifs blocks it altogether after mistaking it for a scraper (e.g. AI companies). I also suggest just starting a fork. Lots of creators don't want to maintain their projects for personal reasons, and other people bring their knowledge and improve them.
I'd be game for a fork with actual instructions on how to do this. It's REAL hard to sign in to Reddit for a token on a system that's CLI only.... And apparently trying from something other than the CLI machine just fails??
Its REAL hard to sign in to Reddit for a token on a system that's CLI only....
It shouldn't be? If there's no token, it should open up a web browser that authenticates you.
It does not open a browser on a system without a GUI......
and when I try it on a system with a GUI, I get this.
EVERY Time >.<
So I am 100% confused on what the ACTUAL problem is.
As far as I can tell, this part is what its complaining about "redirect_uri=http%3A%2F%2Flocalhost%3A7634"
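For reference, the percent-encoded value in that URL decodes to a plain localhost address. A tiny sketch using only the standard library; my assumption (not confirmed anywhere in this thread) is that Reddit rejects the request when this value does not exactly match the redirect URI configured on the Reddit app.

```python
from urllib.parse import unquote

# Value copied from the failing authorization URL quoted above.
encoded = "http%3A%2F%2Flocalhost%3A7634"
decoded = unquote(encoded)
print(decoded)  # http://localhost:7634
```

If that assumption holds, comparing the decoded value character by character with the app's redirect URI setting (scheme, host, port, trailing slash) would be the first thing to check.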
Pretty sure I used https://localhost/ for the redirect URI and it worked fine. I can't check, because for some reason I have changed it to reddit.com now. I haven't had to auth since the change though, so time will tell what this change does.