bulk-downloader-for-reddit
Fix: IDs are longer than expected
When using a list of (valid) IDs in a command like
bdfr clone --subreddit ... --include-id-file Z:/ID_list.txt Z:/Reddit
some IDs fail with praw.exceptions.InvalidURL: Invalid URL: zabcdefg.
Example exception:
[2023-02-17 12:34:56,789 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
File "Z:\bulk-downloader-for-reddit\bdfr\__main__.py", line 160, in cli_clone
reddit_scraper = RedditCloner(config, [stream])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\bulk-downloader-for-reddit\bdfr\cloner.py", line 19, in __init__
super(RedditCloner, self).__init__(args, logging_handlers)
File "Z:\bulk-downloader-for-reddit\bdfr\downloader.py", line 41, in __init__
super(RedditDownloader, self).__init__(args, logging_handlers)
File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 30, in __init__
super(Archiver, self).__init__(args, logging_handlers)
File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 65, in __init__
self.reddit_lists = self.retrieve_reddit_lists()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 174, in retrieve_reddit_lists
master_list.extend(self.get_submissions_from_link())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 65, in get_submissions_from_link
supplied_submissions.append(self.reddit_instance.submission(url=sub_id))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
return func(**dict(zip(_old_args, args)), **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\reddit.py", line 981, in submission
return models.Submission(self, id=id, url=url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 586, in __init__
self.id = self.id_from_url(url)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 458, in id_from_url
parts = RedditBase._url_parts(url)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\base.py", line 19, in _url_parts
raise InvalidURL(url)
praw.exceptions.InvalidURL: Invalid URL: zabcdefg
In the meantime, Reddit submission IDs have grown longer (6-8 characters), but BDFR still checks for the old length (archiver.py#L60, connector.py#L313).
Fixes the two relevant locations.
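The dispatch the fix touches can be sketched as follows. The function name, the `isalnum` guard, and the 6-8 bound are illustrative assumptions based on this description, not the actual BDFR code:

```python
def classify_entry(entry: str) -> str:
    """Decide whether an input line is a bare submission ID or a URL.

    Hypothetical sketch: BDFR checks the entry's length and, when the
    check fails, hands the string to praw as a URL. Widening the
    accepted length from exactly 6 to 6-8 characters keeps newer,
    longer IDs on the ID path instead of the URL path.
    """
    if 6 <= len(entry) <= 8 and entry.isalnum():
        return "id"   # would go to reddit.submission(id=entry)
    return "url"      # would go to reddit.submission(url=entry)
```

With the old exact-length check, an 8-character ID such as `zabcdefg` would fall through to the URL branch and praw would raise `InvalidURL`.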
Hi, thanks for the PR.
Question: if praw crashes, how do you know that the IDs are valid? That would suggest that they are, in fact, not valid, since Reddit's own library is rejecting them.
It crashed with "Invalid URL: ...", but I supplied a list of IDs, not URLs. Looking at the line given in the traceback, I saw that the length being checked no longer matches the length of current IDs. As soon as I changed that line, the downloads worked. When the length check fails, BDFR passes the ID to praw as a URL, which praw then rejects.
Right, seems fine then. If you can write up tests for this bug, I'll review and merge.
I have to correct myself: IDs are 6 or 7 characters long; there are no 8-character IDs yet. I hadn't manually checked the few longer entries in the big list my script produced, and those are indeed invalid. So only one location needs to be adapted.
I cannot run the test suite locally, but I copied a similar-looking test and adapted it. The CI build will show whether it works.
There are also 5-character IDs. #620 is related.
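Given that observed ID lengths now span at least 5 to 7 characters, a pattern check may be more robust than a fixed length. This is a hedged sketch, not BDFR's code; the 5-7 bound is an assumption taken from this discussion, and submission IDs are assumed to be lowercase base36:

```python
import re

# Assumed bounds (5-7 characters) based on the IDs discussed in this
# thread; Reddit may mint longer IDs in the future.
ID_PATTERN = re.compile(r"^[a-z0-9]{5,7}$")

def looks_like_submission_id(entry: str) -> bool:
    """True when entry is a short base36 string, i.e. a bare ID.

    Anything containing slashes, dots, or other URL characters fails
    the match and would be treated as a URL instead.
    """
    return bool(ID_PATTERN.match(entry))
```

A check like this fails closed: an entry that is neither a plausible ID nor a URL still reaches praw's URL parser, which reports it as invalid rather than silently skipping it.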