subreddit-comments-dl
Pragmatic way to download all comments from a subreddit after a date?
Hi,
I have decided to use your application to download data for my project: all comments from a single small (80k subscribers) subreddit from the past 5 years.
I found the framework very easy to use, but I couldn't find a reliable way to ensure that all comments are downloaded. I'm currently running a process with a batch size of 1024 and 1000 laps; after 2 days, 27 laps have been processed, but it's impossible to know how many more I need.
Would you be able to advise on this?
Hi @st-vincent1, thanks for sharing your problem, this is for sure the most annoying issue with the program.
Another user had a similar problem a year ago, and basically the problem should lie in these lines that call the official API to download the comment text: https://github.com/pistocop/subreddit-comments-dl/blob/a9f02a0a041be3b1f425b4cce1e57e658a737754/src/subreddit_downloader.py#L171 https://github.com/pistocop/subreddit-comments-dl/blob/a9f02a0a041be3b1f425b4cce1e57e658a737754/src/subreddit_downloader.py#L172
From what I have seen, there seems to be some API restriction on the number of comments a user can fetch.
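For reference, this is roughly the kind of PRAW call those lines boil down to. It is only a sketch: the credentials, user agent, and submission id below are placeholders, not the script's real variables.

```python
import praw

# Placeholders: use your own Reddit app credentials here.
reddit = praw.Reddit(
    client_id="<reddit_client_id>",
    client_secret="<reddit_client_secret>",
    user_agent="subreddit-comments-dl example",
)

submission = reddit.submission(id="abc123")  # hypothetical submission id

# Resolve every "load more comments" stub through the official API;
# on big threads this is the slow, rate-limited part.
submission.comments.replace_more(limit=None)

for comment in submission.comments.list():
    print(comment.id, comment.body[:80])
```

The `replace_more(limit=None)` step is the one that has to walk every "load more comments" stub through the official API, which is where large threads become slow.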
My advice is to follow the steps described in the README under the "The program stuck and don't run" bullet.
Basically, if waiting doesn't work, you can use the `--comments-cap` flag to limit the number of comments downloaded. This solution unfortunately implies that you will not download all the comments from a submission, but the program stops after a predefined number of iterations instead of hanging.
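To illustrate the idea behind the cap (a hypothetical helper, not the actual code in `subreddit_downloader.py`): collection simply stops once the limit is hit, so each submission costs a bounded amount of work.

```python
def fetch_comments_capped(submission, comments_cap):
    """Collect at most `comments_cap` comments from one submission."""
    # limit=0 removes the remaining "load more comments" stubs instead of
    # fetching them, trading completeness for a predictable runtime.
    submission.comments.replace_more(limit=0)
    collected = []
    for comment in submission.comments.list():
        if len(collected) >= comments_cap:
            break  # cap reached: accept an incomplete thread and move on
        collected.append(comment)
    return collected
```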
I know this fix isn't a complete solution but rather a workaround; when I developed this package it was the best I could do. Maybe something better has been published in the last year, but I'm not confident about that, because the official documentation still uses the same functions I'm using here.