reddit-html-archiver

Could this eventually support private subreddits & praw / authentication?

Open github-userx opened this issue 5 years ago • 11 comments

github-userx avatar May 30 '19 16:05 github-userx

Yeah, PRAW support could be added fairly easily. I think the Reddit API caps each listing at 1000 posts per sub, though: https://www.reddit.com/r/redditdev/comments/8zhcmr/how_to_crawl_more_than_1000_posts_through_reddit/

Pushshift may have data for subs that went private but weren't always private.

libertysoft3 avatar May 31 '19 03:05 libertysoft3

Huh? Since when does pushshift archive private subreddits? Now I am so confused. Pushshift doesn’t have any access to private subreddits AFAIK!

  1. May 2019, 05:03 from [email protected]:

I'm pretty sure that private subs are supported now, the data comes from PushShift.

Yeah, PRAW could be added somewhat easily, depending on what you're trying to do with it. It would be cool to have PRAW fetch updated scores.

github-userx avatar May 31 '19 07:05 github-userx
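For anyone curious what "have PRAW fetch updated scores" could look like, here is a rough sketch, not anything from the archiver itself. It assumes an already authenticated `praw.Reddit` instance is passed in, and the helper name `refresh_scores` is made up for illustration. The only PRAW call it relies on is `Reddit.info(fullnames=...)`, which looks things up by fullname (`t3_` is the prefix for submissions).

```python
from typing import Dict, Iterable


def refresh_scores(reddit, post_ids: Iterable[str]) -> Dict[str, int]:
    """Return {submission id: current score} for already archived posts.

    `reddit` is assumed to be an authenticated praw.Reddit instance
    (or anything exposing a compatible info(fullnames=...) method).
    """
    # Submissions are addressed by fullname: the "t3_" prefix plus the base-36 id.
    fullnames = [f"t3_{post_id}" for post_id in post_ids]
    if not fullnames:
        return {}
    # Reddit.info yields one object per fullname that still resolves.
    return {thing.id: thing.score for thing in reddit.info(fullnames=fullnames)}
```

Posts that were deleted or that the account can't see simply won't appear in the result, so the archived score would have to be kept for those.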

Yeah I was full of it and tried to fix it with an edit. They might have some private sub data in one scenario only. Anyway what about the 1000 post limit with the reddit API?

libertysoft3 avatar May 31 '19 07:05 libertysoft3

I would already be happy to get 1000 posts from a private subreddit

github-userx avatar May 31 '19 08:05 github-userx

I'm into it, pull requests accepted. Everyone that wants this please give this issue a thumbs up.

libertysoft3 avatar Jun 13 '19 06:06 libertysoft3

Any update?

github-userx avatar Oct 25 '19 12:10 github-userx

I'm pretty busy with SaidIt lately.. I don't have this scheduled to be done currently.

libertysoft3 avatar Oct 26 '19 02:10 libertysoft3

Ok, thanks! No worries! ;) 

github-userx avatar Oct 26 '19 07:10 github-userx

Hey, I tried looking into adding my own implementation, but it looks like this isn't possible anymore because of Reddit's deprecation of cloudsearch. I can't search posts by a specific date under the newer search system, and looking at r/changelog there haven't been any updates to that system since then. So unless the private sub was public at some point and got picked up by Pushshift, I don't think there's anything I can do. Let me know if I missed anything or I'm totally wrong.

https://www.reddit.com/r/changelog/comments/7tus5f/update_to_search_api/

YoungerDryas89 avatar Nov 04 '19 00:11 YoungerDryas89

Thanks a lot! Really sucks how they changed the API. ;(

github-userx avatar Nov 04 '19 08:11 github-userx

Yep, PRAW talks about it in their changelog for version 6.0.0:

Removed: Subreddit.submissions as the API endpoint backing the method is no more. See https://www.reddit.com/r/changelog/comments/7tus5f/update_to_search_api/.

https://praw.readthedocs.io/en/latest/package_info/change_log.html

So I guess the best that can be done for private subs is constantly consuming new posts, and getting 1000 posts out of each different API sort that's offered.

libertysoft3 avatar Nov 05 '19 10:11 libertysoft3
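A minimal sketch of the "1000 posts out of each sort" idea from the last comment, in case it helps whoever picks this up. It is an assumption-heavy illustration, not the archiver's code: the helper names `collect_submissions` and `open_subreddit` are hypothetical, and it assumes a script-type Reddit app whose account is an approved member of the private sub. The listing methods used (`new`, `hot`, `rising`, `top`, `controversial`, with `limit` and `time_filter`) are the ones PRAW exposes on a subreddit.

```python
from typing import Dict


def collect_submissions(subreddit, per_listing_limit: int = 1000) -> Dict[str, object]:
    """Walk every listing sort of one subreddit and merge results by submission id."""
    listings = [
        subreddit.new(limit=per_listing_limit),
        subreddit.hot(limit=per_listing_limit),
        subreddit.rising(limit=per_listing_limit),
    ]
    # top and controversial accept a time window, so each window is its own listing.
    for time_filter in ("all", "year", "month", "week", "day"):
        listings.append(subreddit.top(time_filter=time_filter, limit=per_listing_limit))
        listings.append(
            subreddit.controversial(time_filter=time_filter, limit=per_listing_limit)
        )

    seen: Dict[str, object] = {}
    for listing in listings:
        for submission in listing:
            # The same post shows up under several sorts; keep the first copy only.
            seen.setdefault(submission.id, submission)
    return seen


def open_subreddit(name: str, **credentials):
    """Hypothetical helper: authenticate and return a PRAW subreddit object.

    `credentials` is assumed to hold client_id, client_secret, username,
    password and user_agent for a script-type app.
    """
    import praw  # imported lazily so the merge logic can be used without PRAW installed

    return praw.Reddit(**credentials).subreddit(name)
```

Running `collect_submissions(open_subreddit("some_private_sub", ...))` on a schedule and merging each run into the existing archive would cover the "constantly consuming new posts" part. How many distinct posts the extra sorts actually add will depend on the sub.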