bulk-downloader-for-reddit [BUG] Imgur - Response code 429

[x] I am reporting a bug.
[x] I am running the latest version of BDfR
[x] I have read the Opening an issue

Description

Imgur links are returning response code 429. Note: I'm able to browse Imgur normally in my browser and even access the direct links of files that return 429 in bdfr. This error continues even after waiting 24 hours. Never had this issue before.

Command

python bdfr --user reddituser --submitted

Environment

OS: [MacOS]
Python version: [3.10.6]

Logs

[2023-05-23 22:53:15,330 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2023-05-23 22:53:15,330 - bdfr.connector - Level 9] - Created download filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created time filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created sort filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Create file name formatter
[2023-05-23 22:53:15,331 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Created site authenticator
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved user data
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-05-23 22:53:27,185 - bdfr.downloader - DEBUG] - Attempting to download submission 13nbj2e
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Attempting to download submission 13jz8vx
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Using Imgur with url https://i.imgur.com/XpO4ZNm.gifv
[2023-05-23 22:56:28,293 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 60 seconds: Response code 429
[2023-05-23 22:57:28,485 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 120 seconds: Response code 429
[2023-05-23 22:59:28,586 - bdfr.resource - ERROR] - Max wait time exceeded for resource at url https://i.imgur.com/XpO4ZNm.mp4
[2023-05-23 22:59:28,586 - bdfr.downloader - ERROR] - Failed to download resource https://i.imgur.com/XpO4ZNm.mp4 in submission 13jz8vx with downloader Imgur: Could not download resource: Response code 429

May 24 '23 03:05 gemini0x2

Having the same issue

May 24 '23 03:05 electricpollution

Imgur is nuking all NSFW content from reddit, not sure if this can be fixed but that's most likely the cause. It must be screwing with the API.

May 24 '23 22:05 GarethFreeman

@GarethFreeman I know about that, but if that's the cause of response code 429, then why can we still access any content in the browser without any problem? That makes no sense, unless they somehow detect something peculiar in how bdfr makes the download requests.

May 24 '23 23:05 gemini0x2

HTTP code 429 is a rate limiting error code. It means that Imgur has received too many requests from the browser/application. There's not really any way for us to deal with this or get around it. It just means that you have to be slower or get fewer Imgur posts.
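
As a rough illustration of what "being slower" means for a client, here is a minimal sketch of a retry loop that waits longer after each 429 and gives up once the wait would exceed a maximum. It is not BDFR's actual code: `fetch_with_backoff`, its parameters, and the injected `fetch` and `sleep` callables are hypothetical names chosen for this example, and the 60 and 120 second defaults only mirror the waits visible in the log above.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, bytes]],
    first_wait: float = 60.0,
    max_wait: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() until it stops returning 429 or the wait budget runs out.

    fetch is a hypothetical callable returning (status_code, body).
    """
    wait = first_wait
    while True:
        status, body = fetch()
        if status != 429:
            # Any non-429 answer ends the loop; only 2xx counts as success.
            return body if 200 <= status < 300 else None
        if wait > max_wait:
            # The next wait would exceed the budget, so give up.
            return None
        sleep(wait)
        wait += first_wait  # 60, 120, 180, ... with the defaults
```

Passing `sleep` as a parameter keeps the function easy to try out without actually waiting. Note that no amount of waiting helps if the server rejects the request for a reason other than request volume.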

May 25 '23 01:05 Serene-Arc

I was having the same issue but noticed that curl was able to download the same URLs with no problem.

Adding curl's default headers to the resource method below fixed the issue for me.

    @staticmethod
    def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
        # headers = download_parameters.get("headers")
        headers = {
            "user-agent": "curl/7.84.0",
            "accept": "*/*"
        }
        ...

I expect it's the accept header more than the user-agent, but I haven't tried either one on its own.
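
One way to check which header matters is to request the same URL with each header on its own. The following is a minimal sketch using only the standard library; `VARIANTS` and `status_for` are hypothetical names for this example, and nothing here is part of BDFR. One caveat: when no user-agent is supplied, `urllib` sends its own default (`Python-urllib/...`), which differs from the default sent by `requests`, so the "accept only" case is an approximation of what BDFR would send.

```python
import urllib.error
import urllib.request
from typing import Dict

# The two headers from the workaround above.
CURL_HEADERS: Dict[str, str] = {"user-agent": "curl/7.84.0", "accept": "*/*"}

# Header sets to compare: both headers, then each one alone.
VARIANTS: Dict[str, Dict[str, str]] = {
    "both": dict(CURL_HEADERS),
    "user-agent only": {"user-agent": CURL_HEADERS["user-agent"]},
    "accept only": {"accept": CURL_HEADERS["accept"]},
}


def status_for(url: str, headers: Dict[str, str], timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for this header set."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx; the status code is on the exception.
        return error.code
```

To run the comparison, loop over `VARIANTS.items()` and print `status_for(url, headers)` for one of the failing direct links; a 429 for one variant and a 200 for another would point at the header that makes the difference.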

May 27 '23 03:05 eawooten

Adding curl to the headers fixed the issue. Too bad I didn't figure this out sooner. Thanks, @eawooten, for the solution!

May 27 '23 18:05 gemini0x2

Thank you! Adding curl fixes the issues with imgur, but breaks redgifs.

May 27 '23 20:05 GGaroufalis

@GGaroufalis Right, I didn't notice that! A conditional statement will help.

May 27 '23 20:05 gemini0x2

@Gavriik I think this one fixes it

def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    domain = urlparse(url).hostname
    # Redgifs keeps the headers passed in download_parameters;
    # every other site gets curl's default headers.
    if fnmatch.fnmatch(domain, "*.redgifs.com"):
        headers = download_parameters.get("headers")
    else:
        headers = {
            "user-agent": "curl/8.1.1",
            "accept": "*/*"
        }

You also need to add these imports at the top of the file:

import fnmatch
from urllib.parse import urlparse

May 28 '23 00:05 GGaroufalis

Not sure why you wouldn't put it here rather than make the download function super janky...

May 28 '23 01:05 Soulsuck24

@GGaroufalis thanks! I can confirm that the conditional statement is working as expected, but wouldn't it be better to switch the condition? That way the modified header is only used for Imgur, and not for all other sites.

def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    headers = download_parameters.get("headers")
    domain = urlparse(url).hostname
    if fnmatch.fnmatch(domain, "*.imgur.com"):
        headers = {
            "user-agent": "curl/8.1.1",
            "accept": "*/*"
        }
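
For anyone who wants to try the header selection on its own, here is a self-contained sketch of the same idea as a helper function. `is_imgur_host` and `select_headers` are hypothetical names, not BDFR functions. The sketch also covers two edge cases the `fnmatch` pattern does not: `"*.imgur.com"` does not match the bare host `imgur.com`, and `urlparse(url).hostname` is `None` for strings without a network location, which `fnmatch.fnmatch` cannot handle.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# curl-style headers taken from the snippet above.
CURL_HEADERS: Dict[str, str] = {"user-agent": "curl/8.1.1", "accept": "*/*"}


def is_imgur_host(url: str) -> bool:
    """True for imgur.com and any of its subdomains (e.g. i.imgur.com)."""
    host = urlparse(url).hostname  # None when the URL has no network location
    if host is None:
        return False
    return host == "imgur.com" or host.endswith(".imgur.com")


def select_headers(url: str, download_parameters: dict) -> Optional[Dict[str, str]]:
    """Use the curl headers for Imgur only; otherwise keep the caller's headers."""
    if is_imgur_host(url):
        return dict(CURL_HEADERS)  # copy, so callers cannot mutate the constant
    return download_parameters.get("headers")
```

With this helper, the body of `http_download` would only need `headers = select_headers(url, download_parameters)` before making the request, and sites such as redgifs keep whatever headers their downloader passed in.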

May 28 '23 02:05 gemini0x2

@Soulsuck24 That does not work. If I'm not wrong, that header is only used to retrieve the direct links of an Imgur post. It is not used for the actual download.

May 28 '23 02:05 gemini0x2

You're right, I was thinking of this one instead, my bad.

Weird, though, that your connection to the API and through a browser works while the script gets 429s on the direct link download. The changes here just switch the downloader from the default requests user-agent to the curl one. This would be the first time I've seen them limiting on something other than IP, but then it's not solely the user-agent, as the requests one is used to access the API and that isn't getting 429s. Odd.

May 28 '23 13:05 Soulsuck24

How do you implement this? Apologies for not being an experienced coder.

May 28 '23 18:05 GarethFreeman

Rename the attached file (resource.txt) to "resource.py" and drop it in the bdfr folder.

May 28 '23 20:05 GGaroufalis

Do you still use the bdfr function or is curl different? Could you provide an example of a reddit user download?

May 28 '23 22:05 GarethFreeman

@GarethFreeman same, just make sure the modified resource.py is in the right location.

May 28 '23 22:05 gemini0x2

@Gavriik C:\Users\AppData\Local\BDFR\bdfr right? It's still giving me the 429 response code.

May 28 '23 22:05 GarethFreeman

@GarethFreeman mine is in C:\Users\Administrator\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr

yours might differ a bit depending on your python version

May 28 '23 22:05 GGaroufalis

@Gavriik I literally don't have that folder at all. I'm on 310, I just don't understand the problem. There are no other folders where I can place that file.

May 28 '23 23:05 GarethFreeman

The following command should give you the correct location: python3 -m pip show bdfr
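
If pip's output is hard to interpret, the folder can also be looked up from Python itself. This is a minimal sketch using the standard library; `package_dir` is a hypothetical helper name, and it reports the location for whichever interpreter runs it, so it should be run with the same Python that runs bdfr.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the folder an installed top-level package is loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        # Not installed for this interpreter, or no single file to point at.
        return None
    # For a regular package, origin is the path of its __init__.py.
    return Path(spec.origin).parent
```

Calling `print(package_dir("bdfr"))` should show the folder that holds resource.py, or `None` if bdfr is not installed for that interpreter.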

May 28 '23 23:05 gemini0x2

@Gavriik Finally got it working, thanks for all the help mate.

May 28 '23 23:05 GarethFreeman

just adding the user-agent worked for me. (using wget)

Mar 09 '24 06:03 queirozfcom
