Pragmatic way to download all comments from a subreddit after a given date?

Open: st-vincent1 opened this issue 2 years ago • 1 comment

Hi,

I have decided to use your application to download data for my project: all comments from a single small subreddit (80k subscribers) over the past 5 years.

I found the framework very easy to use, but I couldn't find a reliable way to ensure that all comments are downloaded. I'm currently running a process with a batch size of 1024 and 1000 laps. After 2 days, 27 laps have been processed, but it's impossible to know how many more I need.
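For a rough sense of scale, the figures above can be extrapolated linearly. This is only a back-of-the-envelope sketch: it assumes every lap takes about the same time and that all 1000 laps are actually needed, neither of which is known here. The function name `project_remaining_days` is made up for illustration and is not part of the tool.

```python
def project_remaining_days(laps_done: int, days_elapsed: float, laps_total: int) -> float:
    """Linearly project the remaining run time, assuming constant time per lap."""
    if laps_done <= 0:
        raise ValueError("need at least one completed lap to extrapolate")
    days_per_lap = days_elapsed / laps_done
    return (laps_total - laps_done) * days_per_lap


# Figures from this issue: 27 laps finished after 2 days, 1000 laps configured.
remaining = project_remaining_days(laps_done=27, days_elapsed=2, laps_total=1000)
print(f"worst case: about {remaining:.0f} more days")
```

If the run stops finding new comments earlier than lap 1000, the real figure would be lower, so this is an upper bound under the stated assumptions.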

Would you be able to advise on this?

st-vincent1, Jun 08 '22 09:06

Hi @st-vincent1, thanks for sharing your problem. This is certainly the most annoying problem with the program.

Another user had a similar problem a year ago. The cause appears to lie in these lines, which call the official API to download the comment text:
https://github.com/pistocop/subreddit-comments-dl/blob/a9f02a0a041be3b1f425b4cce1e57e658a737754/src/subreddit_downloader.py#L171
https://github.com/pistocop/subreddit-comments-dl/blob/a9f02a0a041be3b1f425b4cce1e57e658a737754/src/subreddit_downloader.py#L172

From what I have seen, there appears to be an API restriction on the number of comments a user can fetch.

My advice is to follow the steps described in the README under the "The program stuck and don't run:" bullet.

Basically, if waiting doesn't work, you can use the --comments-cap flag to limit the number of comments downloaded from a subreddit. Unfortunately, this means you will not download all the comments from a submission, but the program will stop after a predefined number of iterations.
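To illustrate the trade-off, here is a minimal sketch of a capped fetch loop. It is not the code from subreddit_downloader.py: `collect_with_cap` and `make_fake_source` are hypothetical names, and the fake source stands in for the real API calls so the example runs on its own.

```python
from typing import Callable, List, Optional


def collect_with_cap(fetch_batch: Callable[[], List[str]], cap: Optional[int]) -> List[str]:
    """Call fetch_batch until it returns nothing or `cap` calls have been made."""
    collected: List[str] = []
    calls = 0
    while cap is None or calls < cap:
        batch = fetch_batch()
        calls += 1
        if not batch:  # source exhausted: everything was downloaded
            break
        collected.extend(batch)
    return collected


def make_fake_source(n_batches: int, batch_size: int) -> Callable[[], List[str]]:
    """Hypothetical stand-in for the API: yields n_batches batches, then empty lists."""
    pending = [[f"c{b}_{i}" for i in range(batch_size)] for b in range(n_batches)]

    def fetch() -> List[str]:
        return pending.pop(0) if pending else []

    return fetch


capped = collect_with_cap(make_fake_source(10, 3), cap=4)    # stops after 4 calls
full = collect_with_cap(make_fake_source(10, 3), cap=None)   # runs until exhausted
print(len(capped), len(full))
```

With a cap the loop is guaranteed to end after a fixed number of iterations, at the cost of dropping whatever the source would have returned afterwards. Without one it only ends when the source runs dry, which is the situation that can appear stuck.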

I know this is a workaround rather than a complete solution, but it was the best I could do when I developed this package. Something better may have been published in the last year, but I doubt it, because the official documentation still uses the same functions I'm using here.

pistocop, Jun 10 '22 08:06