bulk-downloader-for-reddit
Fix: IDs are longer than expected
When using a list of (valid) IDs in a command like
bdfr clone --subreddit ... --include-id-file Z:/ID_list.txt Z:/Reddit
some IDs fail with praw.exceptions.InvalidURL: Invalid URL: zabcdefg.
Example exception:
[2023-02-17 12:34:56,789 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
File "Z:\bulk-downloader-for-reddit\bdfr\__main__.py", line 160, in cli_clone
reddit_scraper = RedditCloner(config, [stream])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\bulk-downloader-for-reddit\bdfr\cloner.py", line 19, in __init__
super(RedditCloner, self).__init__(args, logging_handlers)
File "Z:\bulk-downloader-for-reddit\bdfr\downloader.py", line 41, in __init__
super(RedditDownloader, self).__init__(args, logging_handlers)
File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 30, in __init__
super(Archiver, self).__init__(args, logging_handlers)
File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 65, in __init__
self.reddit_lists = self.retrieve_reddit_lists()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 174, in retrieve_reddit_lists
master_list.extend(self.get_submissions_from_link())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 65, in get_submissions_from_link
supplied_submissions.append(self.reddit_instance.submission(url=sub_id))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
return func(**dict(zip(_old_args, args)), **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\reddit.py", line 981, in submission
return models.Submission(self, id=id, url=url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 586, in __init__
self.id = self.id_from_url(url)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 458, in id_from_url
parts = RedditBase._url_parts(url)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\base.py", line 19, in _url_parts
raise InvalidURL(url)
praw.exceptions.InvalidURL: Invalid URL: zabcdefg
In the meantime, Reddit submission IDs have grown longer (6-8 characters), but BDFR still checks for the old length (archiver.py#L60, connector.py#L313).
Fixes the two relevant locations.
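The dispatch the fix touches can be sketched as follows. The function name, the `isalnum` guard, and the 6-8 bound are illustrative assumptions based on this description, not the actual BDFR code:

```python
def classify_entry(entry: str) -> str:
    """Decide whether an input line is a bare submission ID or a URL.

    Hypothetical sketch: BDFR checks the entry's length and, when the
    check fails, hands the string to praw as a URL. Widening the
    accepted length from exactly 6 to 6-8 characters keeps newer,
    longer IDs on the ID path instead of the URL path.
    """
    if 6 <= len(entry) <= 8 and entry.isalnum():
        return "id"   # would go to reddit.submission(id=entry)
    return "url"      # would go to reddit.submission(url=entry)
```

With the old exact-length check, an 8-character ID such as `zabcdefg` would fall through to the URL branch and praw would raise `InvalidURL`.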
Hi, thanks for the PR.
Question: if praw crashes, how do you know that the IDs are valid? That would suggest that they are, in fact, not valid, since Reddit's own library is rejecting them.
It crashed with "Invalid URL: ...", but I supplied a list of IDs, not URLs. Looking at the line given in the traceback, I saw that the length being checked no longer matches the length of current IDs. As soon as I changed that line, the downloads worked. When the length check fails, BDFR passes the ID to praw as a URL, which praw then rejects.
Right, seems fine then. If you can write up tests for this bug, I'll review and merge.
I have to correct myself: IDs are 6 or 7 characters long; there are no 8-character IDs yet. I hadn't manually checked the few longer entries in the big list my script produced, and those are indeed invalid. So only one location needs to be adapted.
I cannot run the test suite locally, but I copied a similar-looking test and adapted it. The CI build will show whether it works.
There are also 5-character IDs. #620 is related.
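Given that observed ID lengths now span at least 5 to 7 characters, a pattern check may be more robust than a fixed length. This is a hedged sketch, not BDFR's code; the 5-7 bound is an assumption taken from this discussion, and submission IDs are assumed to be lowercase base36:

```python
import re

# Assumed bounds (5-7 characters) based on the IDs discussed in this
# thread; Reddit may mint longer IDs in the future.
ID_PATTERN = re.compile(r"^[a-z0-9]{5,7}$")

def looks_like_submission_id(entry: str) -> bool:
    """True when entry is a short base36 string, i.e. a bare ID.

    Anything containing slashes, dots, or other URL characters fails
    the match and would be treated as a URL instead.
    """
    return bool(ID_PATTERN.match(entry))
```

A check like this fails closed: an entry that is neither a plausible ID nor a URL still reaches praw's URL parser, which reports it as invalid rather than silently skipping it.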