bulk-downloader-for-reddit

[BUG] Imgur - Response code 429

Open gemini0x2 opened this issue 1 year ago • 23 comments

  • [x] I am reporting a bug.
  • [x] I am running the latest version of BDfR
  • [x] I have read the Opening an issue

Description

Imgur links are returning response code 429. Note: I'm able to browse Imgur normally in my browser and even access the direct links of files that return 429 in bdfr. This error continues even after waiting 24 hours. Never had this issue before.

Command

python bdfr --user reddituser --submitted

Environment

  • OS: [MacOS]
  • Python version: [3.10.6]

Logs

[2023-05-23 22:53:15,330 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2023-05-23 22:53:15,330 - bdfr.connector - Level 9] - Created download filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created time filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created sort filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Create file name formatter
[2023-05-23 22:53:15,331 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Created site authenticator
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved user data
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-05-23 22:53:27,185 - bdfr.downloader - DEBUG] - Attempting to download submission 13nbj2e
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Attempting to download submission 13jz8vx
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Using Imgur with url https://i.imgur.com/XpO4ZNm.gifv
[2023-05-23 22:56:28,293 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 60 seconds: Response code 429
[2023-05-23 22:57:28,485 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 120 seconds: Response code 429
[2023-05-23 22:59:28,586 - bdfr.resource - ERROR] - Max wait time exceeded for resource at url https://i.imgur.com/XpO4ZNm.mp4
[2023-05-23 22:59:28,586 - bdfr.downloader - ERROR] - Failed to download resource https://i.imgur.com/XpO4ZNm.mp4 in submission 13jz8vx with downloader Imgur: Could not download resource: Response code 429

gemini0x2 • May 24 '23 03:05

Having the same issue

electricpollution • May 24 '23 03:05

Imgur is nuking all NSFW content from reddit. Not sure if this can be fixed, but that's most likely the cause. It must be screwing with the API.

GarethFreeman • May 24 '23 22:05

@GarethFreeman I know about that, but if that's the cause of response code 429, then why can we still access any content in the browser without any problem? That makes no sense, unless they somehow detect something peculiar in how bdfr makes the download requests.

gemini0x2 • May 24 '23 23:05

HTTP code 429 is a rate-limiting error code. It means that Imgur has received too many requests from the browser/application. There's not really any way for us to deal with this or get around it. It just means that you have to make requests more slowly or fetch fewer Imgur posts.
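
For illustration, the logs in the report show a 60-second wait, then a 120-second wait, then a give-up. A minimal sketch of that kind of growing wait follows. It is not BDFR's actual code: `download_with_backoff`, the `fetch` callable, and the default values are all assumptions made up for the example.

```python
import time
from typing import Callable, Optional, Tuple


def download_with_backoff(
    fetch: Callable[[], Tuple[int, bytes]],
    first_wait: int = 60,
    max_wait: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Retry a hypothetical fetch() while it answers 429, waiting longer each time."""
    wait = first_wait
    while True:
        status, body = fetch()
        if status == 200:
            return body
        # Give up on any other error, or once the next wait would exceed the cap.
        if status != 429 or wait > max_wait:
            return None
        sleep(wait)
        wait += first_wait  # 60, 120, 180, ...
```

Passing `sleep` as a parameter is only there so the waiting can be replaced when trying the function out; a real caller would leave the default.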

Serene-Arc • May 25 '23 01:05

I was having the same issue but noticed that curl was able to download the same urls with no problem.

Adding curl's default headers to the resource method below fixed the issue for me.

    @staticmethod
    def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
        # headers = download_parameters.get("headers")
        headers = {
            "user-agent": "curl/7.84.0",
            "accept": "*/*"
        }
        ...

I expect it's the accept header more than the user-agent, but I haven't tried either one on its own.

eawooten • May 27 '23 03:05

Adding curl to the headers fixed the issue. Too bad I didn't figure this out sooner. Thanks, @eawooten, for the solution!

gemini0x2 • May 27 '23 18:05

Thank you! Adding curl fixes the issues with imgur, but breaks redgifs.

GGaroufalis • May 27 '23 20:05

@GGaroufalis Right, I didn't notice that! A conditional statement will help.

gemini0x2 • May 27 '23 20:05

@Gavriik I think this one fixes it

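def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    # hostname is None for malformed URLs, so fall back to an empty string
    domain = urlparse(url).hostname or ""
    if fnmatch.fnmatch(domain, "*.redgifs.com"):
        # keep the original headers for redgifs, which the curl headers break
        headers = download_parameters.get("headers")
    else:
        # every other site gets curl's default headers
        headers = {
            "user-agent": "curl/8.1.1",
            "accept": "*/*"
        }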

You need to add

import fnmatch
from urllib.parse import urlparse

at the top.

GGaroufalis • May 28 '23 00:05

Not sure why you wouldn't put it here rather than make the download function super janky...

Soulsuck24 • May 28 '23 01:05

@GGaroufalis thanks! I can confirm that the conditional statement is working as expected, but wouldn't it be better to switch the condition? That way the modified header is only used for Imgur, and not for all other sites.

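def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
    # default to the headers passed in by the caller
    headers = download_parameters.get("headers")
    # hostname is None for malformed URLs, so fall back to an empty string
    domain = urlparse(url).hostname or ""
    # only Imgur hosts such as i.imgur.com get curl's default headers
    if fnmatch.fnmatch(domain, "*.imgur.com"):
        headers = {
            "user-agent": "curl/8.1.1",
            "accept": "*/*"
        }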
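
For anyone who wants to try the header choice on its own before patching resource.py, here is a minimal self-contained sketch of the same idea. `pick_headers` and `CURL_HEADERS` are hypothetical names for this example only, not part of BDFR; the sketch also assumes that the bare `imgur.com` host should be treated like its subdomains, which the `*.imgur.com` pattern alone would not match.

```python
import fnmatch
from typing import Optional
from urllib.parse import urlparse

# curl's default headers, as used in the snippets in this thread
CURL_HEADERS = {"user-agent": "curl/8.1.1", "accept": "*/*"}


def pick_headers(url: str, download_parameters: dict) -> Optional[dict]:
    """Return curl-style headers for Imgur URLs, the caller's headers otherwise."""
    # hostname is None for URLs without a network location
    host = urlparse(url).hostname or ""
    # "*.imgur.com" alone would miss the bare "imgur.com" host
    if host == "imgur.com" or fnmatch.fnmatch(host, "*.imgur.com"):
        return dict(CURL_HEADERS)
    return download_parameters.get("headers")
```

With this, `pick_headers("https://i.imgur.com/XpO4ZNm.mp4", {})` gives the curl headers, while a redgifs URL gets back whatever was under `"headers"` in `download_parameters`.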

gemini0x2 • May 28 '23 02:05

@Soulsuck24 That does not work. If I'm not wrong, that header is only used to retrieve the direct links of an Imgur post. It is not used for the actual download.

gemini0x2 • May 28 '23 02:05

You're right, I was thinking of this one instead, my bad.

Weird though if your connection to the API and through a browser is working but the script is getting 429s on the direct link download. The changes here are just changing the downloader from using the default requests user-agent to the curl one. This would be the first I've seen them limiting on something other than IP, but it's then not solely the user-agent as the requests one is used to access the API and it's not getting 429s. Odd.

Soulsuck24 • May 28 '23 13:05

@gemini0x2 How do I implement this? Apologies for not being an experienced coder.

GarethFreeman • May 28 '23 18:05

Rename the attached file to "resource.py" and drop it in the bdfr folder: resource.txt

GGaroufalis • May 28 '23 20:05

Do you still use the bdfr function or is curl different? Could you provide an example of a reddit user download?

GarethFreeman • May 28 '23 22:05

@GarethFreeman same, just make sure the modified resource.py is in the right location.

gemini0x2 • May 28 '23 22:05

@Gavriik C:\Users\AppData\Local\BDFR\bdfr right? It's still giving me the 429 response code.

GarethFreeman • May 28 '23 22:05

@GarethFreeman mine is in C:\Users\Administrator\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr

yours might differ a bit depending on your python version

GGaroufalis • May 28 '23 22:05

@Gavriik I literally don't have that folder at all. I'm on 310, I just don't understand the problem. There are no other folders where I can place that file.

GarethFreeman • May 28 '23 23:05

The following command should give you the correct location: python3 -m pip show bdfr
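
If pip's output is hard to read, the same folder can also be found from Python itself. This is a small sketch using only the standard library; `package_dir` is a made-up helper name, and it assumes bdfr is installed for the same interpreter that runs the snippet.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the folder an importable top-level package is loaded from, or None."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None when the package is not installed
    if spec is None or spec.origin is None:
        return None
    # origin points at the package's __init__.py, so its parent is the folder
    return Path(spec.origin).parent
```

Running `print(package_dir("bdfr"))` should print the folder that holds resource.py, or `None` if this interpreter cannot find bdfr at all.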

gemini0x2 • May 28 '23 23:05

@Gavriik Finally got it working, thanks for all the help mate.

GarethFreeman • May 28 '23 23:05

Just adding the user-agent worked for me (using wget).

queirozfcom • Mar 09 '24 06:03