bulk-downloader-for-reddit
[BUG] Imgur - Response code 429
- [x] I am reporting a bug.
- [x] I am running the latest version of BDfR
- [x] I have read "Opening an issue"
Description
Imgur links are returning response code 429. Note: I'm able to browse Imgur normally in my browser and even access the direct links of the files that return 429 in bdfr. The error persists even after waiting 24 hours; I've never had this issue before.
Command
python bdfr --user reddituser --submitted
Environment
- OS: [MacOS]
- Python version: [3.10.6]
Logs
[2023-05-23 22:53:15,330 - bdfr.connector - DEBUG] - Disabling the following modules:
[2023-05-23 22:53:15,330 - bdfr.connector - Level 9] - Created download filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created time filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created sort filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Create file name formatter
[2023-05-23 22:53:15,331 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Created site authenticator
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved user data
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-05-23 22:53:27,185 - bdfr.downloader - DEBUG] - Attempting to download submission 13nbj2e
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Attempting to download submission 13jz8vx
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Using Imgur with url https://i.imgur.com/XpO4ZNm.gifv
[2023-05-23 22:56:28,293 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 60 seconds: Response code 429
[2023-05-23 22:57:28,485 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 120 seconds: Response code 429
[2023-05-23 22:59:28,586 - bdfr.resource - ERROR] - Max wait time exceeded for resource at url https://i.imgur.com/XpO4ZNm.mp4
[2023-05-23 22:59:28,586 - bdfr.downloader - ERROR] - Failed to download resource https://i.imgur.com/XpO4ZNm.mp4 in submission 13jz8vx with downloader Imgur: Could not download resource: Response code 429
Having the same issue
Imgur is nuking all NSFW content from reddit. Not sure if this can be fixed, but that's most likely the cause. It must be screwing with the API.
@GarethFreeman I know about that, but if that's the cause of the response code 429, then why can we still access any content in the browser without any problem? That makes no sense, unless they somehow detect something peculiar about how bdfr makes its download requests.
HTTP 429 is a rate-limiting error code. It means that Imgur has received too many requests from the browser/application. There's not really any way for us to deal with this or get around it; it just means that you have to go slower or fetch fewer Imgur posts.
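For context, this is roughly what handling a 429 looks like on the client side: back off and retry, honouring the server's `Retry-After` hint when it sends one. A minimal sketch with plain `requests`, not BDfR's actual code; the 60-second starting delay and the cap just mirror the waits visible in the log above.

```python
import time

import requests


def get_with_backoff(url: str, max_wait: int = 300) -> bytes:
    """Fetch a URL, backing off whenever the server answers 429 (rate limited)."""
    delay, waited = 60, 0
    while True:
        response = requests.get(url, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response.content
        # Honour a numeric Retry-After header if the server sends one.
        retry_after = response.headers.get("Retry-After", "")
        pause = int(retry_after) if retry_after.isdigit() else delay
        if waited + pause > max_wait:
            raise RuntimeError(f"Max wait time exceeded for {url}")
        time.sleep(pause)
        waited += pause
        delay *= 2  # 60 s, 120 s, ... as in the log above
```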
I was having the same issue but noticed that `curl` was able to download the same URLs with no problem. Adding `curl`'s default headers to the resource method below fixed the issue for me.

```python
@staticmethod
def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    # headers = download_parameters.get("headers")
    headers = {
        "user-agent": "curl/7.84.0",
        "accept": "*/*"
    }
    ...
```

I expect it's the `accept` more than the `user-agent`, but I haven't tried without both.
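If anyone wants to check which of the two headers actually matters, a throwaway comparison with plain `requests` is enough. A sketch only; the URL is the direct link from the log above, so substitute one that is currently failing for you.

```python
import requests

# Direct link taken from the log above; replace with any URL that currently returns 429.
URL = "https://i.imgur.com/XpO4ZNm.mp4"

# Note: requests already sends "Accept: */*" by default, so if only the cases with
# the curl user-agent succeed, the user-agent string is what Imgur is keying on.
CANDIDATES = {
    "requests defaults": {},
    "curl user-agent only": {"user-agent": "curl/7.84.0"},
    "accept only": {"accept": "*/*"},
    "user-agent and accept": {"user-agent": "curl/7.84.0", "accept": "*/*"},
}

for label, headers in CANDIDATES.items():
    status = requests.get(URL, headers=headers, timeout=30).status_code
    print(f"{label:>22}: HTTP {status}")
```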
Adding the curl headers fixed the issue. Too bad I didn't figure this out sooner. Thanks, @eawooten, for the solution!
Thank you! Adding the curl headers fixes the issue with Imgur, but breaks Redgifs.
@GGaroufalis Right, I hadn't noticed that! A conditional statement will help.
@Gavriik I think this one fixes it
```python
def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    domain = urlparse(url).hostname
    if fnmatch.fnmatch(domain, "*.redgifs.com"):
        headers = download_parameters.get("headers")
    else:
        headers = {
            "user-agent": "curl/8.1.1",
            "accept": "*/*"
        }
```
You need to add `from urllib.parse import urlparse` and `import fnmatch` at the top.
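Putting the pieces together, the top of the patched method in `bdfr/resource.py` would look roughly like this. A sketch only: the class around it and the rest of the method body are assumed to stay exactly as shipped, and the `domain and` guard is an extra precaution for URLs whose hostname parses to `None`.

```python
# Module-level imports at the top of bdfr/resource.py:
import fnmatch
from urllib.parse import urlparse

# ...and inside the existing class, the start of http_download becomes:
@staticmethod
def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    domain = urlparse(url).hostname
    if domain and fnmatch.fnmatch(domain, "*.redgifs.com"):
        # Redgifs breaks with the curl-style headers, so keep the caller-supplied ones.
        headers = download_parameters.get("headers")
    else:
        # Everything else (notably Imgur) gets curl's default headers.
        headers = {
            "user-agent": "curl/8.1.1",
            "accept": "*/*",
        }
    ...  # rest of the original method body unchanged
```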
Not sure why you wouldn't put it here rather than make the download function super janky...
@GGaroufalis Thanks! I can confirm that the conditional statement works as expected, but wouldn't it be better to invert the condition? That way the modified headers are only used for Imgur and not for every other site.

```python
def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    headers = download_parameters.get("headers")
    domain = urlparse(url).hostname
    if fnmatch.fnmatch(domain, "*.imgur.com"):
        headers = {
            "user-agent": "curl/8.1.1",
            "accept": "*/*"
        }
```
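For anyone curious how that pattern behaves, here is a quick check (the first URL is the direct link from the log above; the other two are made-up examples). Note that bare `imgur.com` does not match `*.imgur.com`; the download URLs in the log are all on `i.imgur.com`, so that probably never matters here, but it's worth knowing.

```python
import fnmatch
from urllib.parse import urlparse

urls = [
    "https://i.imgur.com/XpO4ZNm.mp4",           # i.imgur.com    -> matches
    "https://imgur.com/gallery/abc123",          # bare imgur.com -> no match (hypothetical link)
    "https://thumbs44.redgifs.com/Example.mp4",  # redgifs host   -> no match (hypothetical link)
]

for url in urls:
    host = urlparse(url).hostname
    print(f"{host}: {fnmatch.fnmatch(host, '*.imgur.com')}")
```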
@Soulsuck24 That does not work. If I'm not wrong, that header is only used to retrieve the direct links of an Imgur post; it is not used for the actual download.
You're right, I was thinking this one instead, my bad.
It's weird, though, that your connection to the API and through a browser works while the script gets 429s on the direct-link download. The changes here just switch the downloader from the default requests user-agent to the curl one. This would be the first time I've seen them limit on something other than IP, but then it's not solely the user-agent either, since the requests one is used to access the API and that isn't getting 429s. Odd.
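For reference, the "default requests user-agent" being swapped out can be printed directly; the exact version suffix depends on the installed requests release.

```python
import requests

# e.g. "python-requests/2.31.0" -- what every download sends unless overridden.
print(requests.utils.default_user_agent())
```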
How do you implement this? Apologies for not being an experienced coder.
Rename the attached file to "resource.py" and drop it in the bdfr folder: resource.txt
Do you still use the bdfr function or is curl different? Could you provide an example of a reddit user download?
@GarethFreeman same, just make sure the modified `resource.py` is in the right location.
@Gavriik C:\Users\AppData\Local\BDFR\bdfr right? It's still giving me the 429 response code.
@GarethFreeman mine is in C:\Users\Administrator\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr; yours might differ a bit depending on your Python version.
@Gavriik I literally don't have that folder at all. I'm on 3.10, and I just don't understand the problem. There are no other folders where I can place that file.
The following command should give you the correct location:

`python3 -m pip show bdfr`
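If pip's output is confusing, asking the interpreter itself where it imported the package from also works (run it with the same Python you use to launch bdfr):

```python
from pathlib import Path

import bdfr

# The folder that contains the installed package's resource.py,
# e.g. ...\site-packages\bdfr
print(Path(bdfr.__file__).parent)
```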
@Gavriik Finally got it working, thanks for all the help mate.
Just adding the user-agent worked for me (using `wget`).