bulk-downloader-for-reddit
[SITE] How to download "posts" which are links to comments?
First of all, I'm not sure how to submit this as a question, because I don't think it's a bug, and "[SITE]" seemed like the best option...?
- [x] I am requesting site support.
- [x] I am running the latest version of BDfR
- [x] I have read the Opening an issue guide
This might be a weird question - but -
I'm trying to download an entire subreddit that consists only of links to comments in other subreddits. I'm hoping to get the single comment each link points to (ideally the entire thread that follows, but beggars can't be choosers).
However, instead of getting a Markdown file with the comment, I'm getting the content of the linked thread's OP. Does that make sense?
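To illustrate what I'm after, here's a rough PRAW sketch of the behaviour I was expecting for a single link post (the URL and credentials below are placeholders of mine, and I'm not claiming this is how bdfr works internally):

```python
# Rough sketch: given a "post" whose URL is actually a permalink to a comment
# in another subreddit, fetch that specific comment instead of the linked
# thread's OP.
import praw

reddit = praw.Reddit(
    client_id="CLIENT_ID",          # placeholder credentials
    client_secret="CLIENT_SECRET",
    user_agent="comment-link-test",
)

# Placeholder permalink of the kind these subreddit posts link to
link = "https://old.reddit.com/r/SomeSub/comments/abc123/some_title/def456/"

comment = reddit.comment(url=link)         # resolve the permalink to the comment itself
print(comment.author, "\n", comment.body)  # the single comment, not the submission's selftext
```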
Alternatively: the subreddit reposts comments by one specific user, so a fallback would be to just download everything that user has ever said. This is sub-optimal for a couple of reasons: first, not every comment is useful or interesting (the subreddit only reposts the good ones), and second, after about 30 posts I get the following error:
praw.exceptions.ClientException: This comment does not appear to be in the comment tree
Here's the command I'm using:
bdfr archive --user PoppinKREAM --all-comments --file-scheme '{REDDITOR}_{SUBREDDIT}_{TITLE}_{POSTID}' ./output
and the full error:
Traceback (most recent call last):
  File "/usr/local/bin/bdfr", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bdfr/__main__.py", line 139, in cli_archive
    reddit_archiver.download()
  File "/usr/local/lib/python3.10/site-packages/bdfr/archiver.py", line 49, in download
    self.write_entry(submission)
  File "/usr/local/lib/python3.10/site-packages/bdfr/archiver.py", line 92, in write_entry
    self._write_entry_json(archive_entry)
  File "/usr/local/lib/python3.10/site-packages/bdfr/archiver.py", line 103, in _write_entry_json
    content = json.dumps(entry.compile())
  File "/usr/local/lib/python3.10/site-packages/bdfr/archive_entry/comment_archive_entry.py", line 18, in compile
    self.source.refresh()
  File "/usr/local/lib/python3.10/site-packages/praw/models/reddit/comment.py", line 309, in refresh
    raise ClientException(self.MISSING_COMMENT_MESSAGE)
praw.exceptions.ClientException: This comment does not appear to be in the comment tree
Finally, I think this is the post it's failing on:
https://old.reddit.com/r/reddevils/comments/146eg1s/brandon_williams_rant_roudup/jnqnprn/
I'm using the latest version via pip, updated last week.
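If it helps, I believe the same failure can be reproduced with PRAW alone, roughly like this (untested sketch; the credentials are placeholders and the comment ID is the one from the URL above):

```python
import praw

reddit = praw.Reddit(
    client_id="CLIENT_ID",          # placeholder credentials
    client_secret="CLIENT_SECRET",
    user_agent="refresh-repro",
)

comment = reddit.comment("jnqnprn")  # comment ID from the permalink above

# bdfr's archiver calls refresh() to pull in the reply tree; this is the call
# that raises ClientException("This comment does not appear to be in the comment tree")
comment.refresh()
```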
To reiterate, I would much prefer a solution to the initial problem, if one exists: how to download posts that are links to comments.
What subreddit were you originally trying to download, and with which command? I'd like to try this myself. If you are talking about r/ShitPoppinKreamSays, it simply fails to download anything, since the links there are np.reddit.com links and bdfr has no proper downloading module for those. You could try scraping the log bdfr generates for these links, collecting them in a file, and downloading that with bdfr archive --include-id-file comments.txt --comment-context. See #835 for some inspiration on how I tried that method; some kind of hacking is probably required. Also note that #851 could cause some issues here. I check GitHub very infrequently, so a reply might take some time, but I hope my tips help somewhat!
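The scraping step could look roughly like this (untested sketch; the log path, the exact link format in the log, and whether --include-id-file prefers full URLs or bare IDs are all assumptions you'd need to verify yourself):

```python
# Rough sketch: pull np.reddit.com comment links out of a bdfr log and
# collect them, one per line, into comments.txt.
import re
from pathlib import Path

log_file = Path("log_output.txt")  # hypothetical path; point this at your bdfr log
pattern = re.compile(r"https?://np\.reddit\.com/r/\S+/comments/\S+")

links = sorted(set(pattern.findall(log_file.read_text())))

with open("comments.txt", "w", encoding="utf-8") as out:
    for link in links:
        # np.reddit.com is just the "no participation" mirror; rewrite it to the
        # normal hostname so the link looks like any other reddit permalink
        out.write(link.replace("np.reddit.com", "www.reddit.com") + "\n")
```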