SnapshillBot

Refactoring to prepare for gevent

Open hidde-jan opened this issue 7 years ago • 5 comments

This is a WIP branch where I'm refactoring a bit. We currently perform all HTTP requests sequentially, which takes a lot of time per submission and limits our ability to add more subreddits to monitor. By using gevent (or some other solution) we can perform the requests (mostly) in parallel, or at least make them non-blocking. This will hopefully speed up the bot a great deal.
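To make the idea concrete, here is a minimal sketch of running blocking requests concurrently. It uses the standard library's `concurrent.futures` as a stand-in for gevent (the "some other solution" case), and `archive_link` is a hypothetical placeholder that only simulates latency; it is not the bot's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def archive_link(url):
    """Hypothetical stand-in for one blocking HTTP request to an archive service."""
    time.sleep(0.1)  # simulate network latency instead of doing real I/O
    return "archived:" + url


def archive_all(urls, max_workers=8):
    """Run archive_link for every URL concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(archive_link, urls))


if __name__ == "__main__":
    links = ["https://example.com/%d" % i for i in range(8)]
    start = time.monotonic()
    results = archive_all(links)
    print(len(results), "links in", round(time.monotonic() - start, 2), "s")
```

With one worker per link, the wall-clock time is roughly that of a single request rather than the sum of all of them.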

hidde-jan avatar Jun 09 '17 06:06 hidde-jan

@justcool393 ok, so I think I got everything worked out. I'm performing all archiving actions in parallel (it's wicked fast 🐎💨), but I apply rate limiting when creating archives of links that point to reddit (I know this sounds weird).
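Roughly, the combination of "everything in parallel, but reddit-bound archives rate limited" could look like the sketch below. This is an illustration under assumptions, not the code in this branch: `MinIntervalLimiter`, `points_to_reddit`, `archive_link`, and the 0.2 second interval are all hypothetical, and threads stand in for whatever concurrency mechanism is actually used.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


class MinIntervalLimiter:
    """Allow at most one call per `interval` seconds across all threads."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside
        # it so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def points_to_reddit(url):
    """True if the URL's host is reddit.com or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "reddit.com" or host.endswith(".reddit.com")


reddit_limiter = MinIntervalLimiter(interval=0.2)  # assumed value, for illustration


def archive_link(url, log):
    """Hypothetical archive call; only reddit links go through the limiter."""
    if points_to_reddit(url):
        reddit_limiter.wait()
    log.append((url, time.monotonic()))  # record when the "request" was made
    return "archived:" + url


def archive_all(urls):
    """Archive all URLs concurrently and return (results, request log)."""
    log = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda u: archive_link(u, log), urls))
    return results, log
```

Links to other hosts never touch the limiter, so they still go out immediately while the reddit ones are spaced apart.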

I also noticed we got banned from 4 subs, some of which are quite active, so I decided to automatically unsubscribe from subs we have been banned from.

If you're ok with this, I'm going to test this new setup on the server.

hidde-jan avatar Jun 18 '17 16:06 hidde-jan

@hidde-jan I don't know how I missed this message. This is excellent.

Small thing: I believe Archive.is automagically rate limits its own requests to reddit (ceddit is based off some quirk in how reddit works, so those requests come from the users who are browsing the archive), so it may only be necessary for sites that do not rate limit themselves. I do think this is a great idea. The reason I had it at five initially was because reddit was weird about requests when they came from archive.org.

Further, I sent a message to the admins about maybe getting one of the services (that isn't currently in place) un-spamfiltered for more redundancy. Also, I'll ask again about archive.org when I get a response.

justcool393 avatar Jun 20 '17 05:06 justcool393

I'm still not completely satisfied. It currently only handles one submission at a time. I want to set up some queue-based system that can handle multiple submissions at once. The xkcd transcriber bot has something like this. I've already started looking at it.
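As a rough sketch of what a queue-based setup could look like (this is not how the xkcd transcriber bot does it, just a generic worker-pool pattern with the standard library), several workers pull submissions from a shared queue. `process_submission` and `run_pipeline` are hypothetical names.

```python
import queue
import threading


def process_submission(submission_id):
    """Hypothetical stand-in for archiving every link in one submission."""
    return "done:" + submission_id


def worker(tasks, results):
    """Pull submission ids off the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        try:
            if item is None:  # sentinel: no more work for this worker
                return
            results.put(process_submission(item))
        finally:
            tasks.task_done()


def run_pipeline(submission_ids, num_workers=4):
    """Process submissions with a pool of workers; returns sorted results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for sid in submission_ids:
        tasks.put(sid)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    # All workers have exited, so draining the queue here is race-free.
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)
```

In a long-running bot the producer would keep feeding new submissions instead of sending sentinels right away, but the worker loop would stay the same.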

hidde-jan avatar Jun 20 '17 07:06 hidde-jan

I'm gonna test this in our production setting this week :)

hidde-jan avatar Mar 19 '18 19:03 hidde-jan

Ok, I think I know why I abandoned this last year. PRAW is not thread safe. Its rate limit function depends on time.sleep, and even with that patched out there is no actual way to get this working. I'm still pretty happy with the refactoring in this PR, so I'm going to port those changes to the master branch and close this PR.

hidde-jan avatar Mar 19 '18 20:03 hidde-jan