Is this tool capable of downloading every image that was ever posted to a subreddit?
Hello,
I am trying to build some very large datasets of particular images. I would like to scrape all of /r/FoodPorn to start, but this tool seems to stop after roughly 1000-2000 images.
According to PushShift, there are 378k submissions: https://api.pushshift.io/reddit/search/submission/?subreddit=FoodPorn&metadata=true&size=0&after=0
Is it possible to get them all?
Thanks.
If you use a PushShift source, it should get them all. If it doesn't, it's a bug I can look into. Also, there are currently large-scale changes underway to RMD in order to greatly improve it, so hopefully this will be resolved soon.
I just tried that, and it says it scraped 3000+ this time, but I only see 1000 in the folder. Maybe the older posts are inaccessible?
Hmm, it seems like the PushShift library that RMD uses is having compatibility issues with some recent changes to PushShift's API. I'll look into a custom solution and/or a fix for the library.
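For anyone who wants to experiment in the meantime, here is a rough sketch of what a custom solution might look like: it walks the submission search endpoint linked above from oldest to newest, using the timestamp of the last post seen as the next `after` value. This is not RMD's code. The names `fetch_page` and `iter_submissions` are made up for illustration, and the `sort`, `sort_type`, and `size` parameters and the `data` / `created_utc` response fields are assumptions about the PushShift API that may no longer hold after the recent changes.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the URL in the original question.
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"


def fetch_page(subreddit, after, size=100):
    """Fetch one page of submissions created after `after` (epoch seconds).

    Assumption: the API accepts sort/sort_type/size and returns {"data": [...]}.
    """
    query = urllib.parse.urlencode({
        "subreddit": subreddit,
        "after": after,
        "size": size,
        "sort": "asc",
        "sort_type": "created_utc",
    })
    with urllib.request.urlopen(f"{PUSHSHIFT_URL}?{query}", timeout=30) as resp:
        return json.load(resp)["data"]


def iter_submissions(subreddit, fetch=fetch_page, start=0):
    """Yield submissions oldest-first, paging by creation timestamp.

    `fetch` is injectable so the paging logic can be tried without the network.
    Caveat: posts sharing the exact timestamp of a page's last post may be
    skipped, because `after` is treated as strictly greater-than.
    """
    after = start
    while True:
        page = fetch(subreddit, after)
        if not page:
            return  # an empty page means there is nothing newer left
        for post in page:
            yield post
        newest = max(post["created_utc"] for post in page)
        if newest <= after:
            return  # no forward progress; stop instead of looping forever
        after = newest
```

Something like `for post in iter_submissions("FoodPorn"): ...` would then visit every submission the API returns, one page at a time. A real run over 378k submissions would also want a delay between requests and some retry handling, which this sketch leaves out.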