RedditDownloader

Is this tool capable of downloading every image that was ever posted to a subreddit?

Open · efirdc opened this issue 5 years ago · 3 comments

Hello,

I am trying to build some very large datasets of particular kinds of images. I would like to scrape all of /r/FoodPorn to start; however, this tool seems to stop after roughly 1000-2000 images.

According to PushShift, there are 378k submissions: https://api.pushshift.io/reddit/search/submission/?subreddit=FoodPorn&metadata=true&size=0&after=0
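For reference, the count query above can be rebuilt and read from Python. This is a minimal sketch. It assumes the endpoint still answers the way it did when this issue was filed, and that the total is reported under a `metadata.total_results` key in the JSON response. Both that key name and the helper names `count_url` and `total_from_response` are assumptions for illustration, not part of RMD.

```python
import json
from urllib.parse import urlencode

BASE = "https://api.pushshift.io/reddit/search/submission/"

def count_url(subreddit: str) -> str:
    """Build the count-only query from the link above.

    size=0 asks for no posts in the response, metadata=true asks for totals,
    and after=0 starts the search at the Unix epoch (i.e. all time).
    """
    params = {"subreddit": subreddit, "metadata": "true", "size": 0, "after": 0}
    return BASE + "?" + urlencode(params)

def total_from_response(body: str) -> int:
    """Read the submission total from a raw JSON body (key name is an assumption)."""
    return int(json.loads(body)["metadata"]["total_results"])
```

`count_url("FoodPorn")` reproduces the URL quoted above, which makes it easy to compare against other subreddits.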

Is it possible to get them all?

Thanks.

efirdc, Dec 02 '20 13:12

If you use a PushShift source, it should get them all. If it doesn't, it's a bug I can look into. Also, there are currently large-scale changes underway to RMD in order to greatly improve it, so hopefully this will be resolved soon.

shadowmoose, Dec 02 '20 17:12

I just tried that, and it says it scraped 3000+ this time, but I only see 1000 in the folder. Maybe the older posts are inaccessible?

efirdc, Dec 02 '20 18:12

Hmm, it seems like the PushShift library RMD uses is having compatibility issues with some recent changes to PushShift's API. I'll look into a custom solution and/or a fix for the library.
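One way a custom solution could walk an entire subreddit is to page backward through time. It requests the newest batch, then repeats with `before` set to the oldest `created_utc` seen so far, and stops when a page comes back empty. The sketch below assumes each page is a newest-first list of submission dicts with `id` and `created_utc` fields, and that `before` is an exclusive bound. It takes the HTTP call as a caller-supplied `fetch_page` function so the paging logic can be tested offline. `iter_all_submissions`, `fetch_page`, and the default page size are hypothetical, not RMD or PushShift code.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: (subreddit, before_timestamp_or_None, page_size)
# -> list of submission dicts, newest first, all strictly older than `before`.
# A real one would call the PushShift search endpoint.
FetchPage = Callable[[str, Optional[int], int], list]

def iter_all_submissions(
    fetch_page: FetchPage, subreddit: str, page_size: int = 100
) -> Iterator[dict]:
    """Yield every submission by paging backward on created_utc until a page is empty."""
    before: Optional[int] = None  # None means "start from the newest post"
    seen: set = set()             # ids already yielded, to avoid duplicates
    while True:
        page = fetch_page(subreddit, before, page_size)
        if not page:
            return  # nothing older left
        for post in page:
            if post["id"] not in seen:
                seen.add(post["id"])
                yield post
        oldest = min(post["created_utc"] for post in page)
        if before is not None and oldest >= before:
            return  # fetcher made no progress; stop rather than loop forever
        before = oldest
```

Because `before` is treated as exclusive here, posts that share the exact boundary second with the oldest post of a full page could be skipped. A production version would need to handle that case.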

shadowmoose, Dec 02 '20 20:12