redditDataExtractor
Possible to archive everything?
I can't find an option to archive absolutely everything. Is it possible?
What do you mean by "absolutely everything"? Literally everything on reddit? Or just every submission on a subreddit?
There are two limits here:
- The reddit API only lets you go back through the most recent 1000 posts; anything older than that can't be downloaded.
- The reddit API limits the number of calls you can make in a given time frame, so you can't possibly archive every post on reddit.
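To illustrate how those two limits interact, here is a minimal sketch of paging through a listing. It is not the extractor's actual code: `iter_listing` and the `fetch_page` callback are hypothetical names, the callback is assumed to return one parsed listing page (a dict with `data.children` and a `data.after` cursor), and the 2-second pause is just a placeholder for whatever the current API rules require.

```python
import time
from typing import Any, Callable, Dict, Iterator, Optional

LISTING_CAP = 1000  # how far back a listing goes, per the first limit above
PAGE_SIZE = 100     # largest page a listing request returns

# Hypothetical callback: (after_cursor, limit) -> one parsed listing page.
FetchPage = Callable[[Optional[str], int], Dict[str, Any]]


def iter_listing(
    fetch_page: FetchPage,
    cap: int = LISTING_CAP,
    delay: float = 2.0,  # placeholder pause between calls, not an official figure
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, Any]]:
    """Yield posts newest-first until the listing ends or the cap is reached."""
    after: Optional[str] = None
    seen = 0
    while seen < cap:
        page = fetch_page(after, min(PAGE_SIZE, cap - seen))
        # Never hand back more than the cap, even if the page is larger.
        children = page["data"]["children"][: cap - seen]
        for child in children:
            yield child["data"]
        seen += len(children)
        after = page["data"]["after"]
        # Stop on an empty page, a missing cursor, or once the cap is hit.
        if not children or after is None or seen >= cap:
            return
        sleep(delay)  # stay under the call-rate limit before the next request
```

With a page size of 100, a full pass over one subreddit is at most ten requests, which is why a single subreddit is feasible and all of reddit is not.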
Every submission (within the past 1000) on a subreddit should be possible. And if you want the comment data, in addition to the submission data and external content, you just need to tick the checkboxes "Download External Content Linked by Submission", "Download External Content Linked in Selftext", "Download External Content Linked in Comments", and "Download JSON-encoded submission content" on the settings page.
There is currently no option for continuous download. You have to kick off the downloading each time. So if you set it off every day, as long as no more than 1000 posts were submitted that day, you'll get everything. I may add the option for continuous downloads at some point if people want it.
There is also the limitation that not all external content can be downloaded. Currently ".gifv" files are not supported (they didn't exist until recently); I will support them eventually. Also, only a few sites (like imgur) are fully supported. On other sites, an image is downloaded only if it is linked directly, since otherwise the program can't tell which image on the page you want.
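To make "directly linked image" concrete, here is a rough sketch of the kind of check involved. It is only an illustration, not the program's real logic: `is_direct_image_link` is a hypothetical helper, and the set of extensions is an assumption (with ".gifv" left out to mirror the current lack of support).

```python
import posixpath
from urllib.parse import urlparse

# Assumed set of extensions for illustration; ".gifv" is deliberately absent.
SUPPORTED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def is_direct_image_link(url: str) -> bool:
    """Return True if the URL path itself ends in a supported image extension."""
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in SUPPORTED_IMAGE_EXTENSIONS
```

A link to a page that merely contains images (a gallery or blog post, say) fails this check, which is the case where the program has no way of knowing which image you meant.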
Can the current program extract the post time of comments? It seems that only the main post's time is extracted, not the times of the follow-up comments.