bulk-downloader-for-reddit icon indicating copy to clipboard operation
bulk-downloader-for-reddit copied to clipboard

Fix: Ids are longer than expected

Open thomas694 opened this issue 2 years ago • 5 comments

When using a list of (valid) ids in a command like bdfr clone --subreddit ... --include-id-file Z:/ID_list.txt Z:/Reddit some ids fail with praw.exceptions.InvalidURL: Invalid URL: zabcdefg.

Example exception:

[2023-02-17 12:34:56,789 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "Z:\bulk-downloader-for-reddit\bdfr\__main__.py", line 160, in cli_clone
    reddit_scraper = RedditCloner(config, [stream])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\cloner.py", line 19, in __init__
    super(RedditCloner, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\downloader.py", line 41, in __init__
    super(RedditDownloader, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 30, in __init__
    super(Archiver, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 65, in __init__
    self.reddit_lists = self.retrieve_reddit_lists()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 174, in retrieve_reddit_lists
    master_list.extend(self.get_submissions_from_link())
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 65, in get_submissions_from_link
    supplied_submissions.append(self.reddit_instance.submission(url=sub_id))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\reddit.py", line 981, in submission
    return models.Submission(self, id=id, url=url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 586, in __init__
    self.id = self.id_from_url(url)
              ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 458, in id_from_url
    parts = RedditBase._url_parts(url)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\base.py", line 19, in _url_parts
    raise InvalidURL(url)
praw.exceptions.InvalidURL: Invalid URL: zabcdefg

In the meantime ids got longer (6-8 characters) (archiver.py#L60, connector.py#L313).

Fixes the two relevant locations.

thomas694 avatar Feb 19 '23 00:02 thomas694

Hi, thanks for the PR.

Question: if praw crashes, how do you know that the IDs are valid? That would suggest that they are, in fact, not valid because Reddit's own library is rejecting them.

Serene-Arc avatar Feb 19 '23 00:02 Serene-Arc

It crashed saying "invalid url: ...", but I used a list of ids. When I looked at the line given in the error I saw that the length of the id was different than what was checked for. As soon as I changed the line the downloads worked. When the length check fails, BDFR passes the id to praw as an url.

thomas694 avatar Feb 19 '23 00:02 thomas694

Right, seems fine then. If you can write up tests for this bug, I'll review and merge.

Serene-Arc avatar Feb 19 '23 00:02 Serene-Arc

I've to correct myself, the length of IDs is 6 or 7 characters, there are no 8 characters yet. I didn't manually check the few longer ones in the big list which my script outputted, they are indeed invalid. So only one location needs to be adapted.

I cannot run the test suite locally, but I copied a similar looking test and adapted it. The build system will show if it works.

thomas694 avatar Feb 20 '23 23:02 thomas694

There are also 5 character IDs. #620 is related.

geneccx avatar Jul 15 '23 21:07 geneccx