redis-py icon indicating copy to clipboard operation
redis-py copied to clipboard

pubsub.run_in_thread : 'sleep_time' and documentation

Open tw-bert opened this issue 8 years ago • 12 comments

The pubsub.run_in_thread implementation works great for us, very nice work indeed.

We do however have a question about the parameter name 'sleep_time' plus its default value, and about the documentation.

Last week, we had a production issue, which was entirely our fault - not tested thoroughly enough. An example from the documentation had been taken too literally, and our CPU cost estimate was wrong.

Note: where I say 'documentation', it may fall under your control or maybe not. We also collected info from StackOverflow and such, we are not entirely sure where the examples came from ;)

We already solved all issues, but recommend the following:

Change sleep_time to socket_timeout. The name sleep_time was very confusing to us, since it's no sleep at all. It's used as a TCP timeout value in the select() statement, where control is returned to the OS.

It's initial value is 0. This is very CPU unfriendly. The documentation says 'be nice to the CPU' by setting it to 0.001. This is a tiny bit more nice than 0, but turned out to be not nice enough for our scenario. On a live server, the run_in_thread was running in hundreds of processes on one server, each causing 0.5-1.0% CPU, clogging everything. Revert and reboot needed (and alas, an hour of production downtime due to dependencies).

The initial value for sleep_time (socket_timeout) should be None in our opinion. Since this week, we run the pubsub as a daemon thread with sleep_time=None, and after initialization, this takes 0.0% CPU. It hands over control to the OS. Our hundreds of processes on that live server run fine now.

Caveat: You should not try to use this thread for pubsub connections to more than 1 redis instance. It's better to fire up an extra thread to accomplish that. Also see the python docs on select: https://docs.python.org/2/howto/sockets.html . Excerpt: give it a nice long timeout (say a minute) unless you have good reason to do otherwise

Kind regards, TW

tw-bert avatar Jan 25 '17 13:01 tw-bert

Changing the sleep_time to a value greater than 0 seems like a good idea. If I recall correctly, if we set sleep_time=None, we'd never get notified if/when the socket dies. I'll need to test this scenario.

andymccurdy avatar Jan 25 '17 18:01 andymccurdy

Good point, that might well be the case. I'll test it as well. Note: a larger value like 60.0 worked fairly good in our scenario as well. 1.0 was already too short, the os didn't park processes as idle anymore.

tw-bert avatar Jan 25 '17 20:01 tw-bert

I agree, the name sleep_time for that parameter is very confusing.

AlexEshoo avatar Sep 04 '18 13:09 AlexEshoo

this is now broken because you are unable to stop the thread

graingert avatar Jul 19 '19 14:07 graingert

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot] avatar Jul 19 '20 00:07 github-actions[bot]

any updates?

tw-bert avatar Jul 19 '20 08:07 tw-bert

Changing the name to socket_timeout makes sense. We'll do that in 4.0 as it's a backwards incompatible change. I'm also fine with increasing the default value, but I'm not quite sure what the optimal value would be. I believe that defaulting to None seems like a bad idea. If I remember correctly, some types of socket disconnections won't be raised when no timeout is specified and the thread will hang forever. Perhaps 5 or 10 seconds is a reasonable compromise?

andymccurdy avatar Jul 20 '20 18:07 andymccurdy

Sounds good, thx.

I think 2 seconds would be a proper initial value. With a docstring that refers to the official recommendation:

See: https://docs.python.org/2/howto/sockets.html . give it a nice long timeout (say a minute) unless you have good reason to do otherwise

2 seconds is slightly too CPU intensive for our special requirements, but we'll set it to None anyway.

Sidenote: There are always alternatives. An improved implementation could still have a socket_timeout of None, but with an event loop and a non-blocking socket and possibly a signal mask. Programs that have noticable CPU usage while idle should be prevented if at all possible. But I know this adds a lot of logic/maintenance to your module, and from experience I know it's very hard to maintain py2 compat, going that route. So changing the default socket_timeout to 2 seconds sounds reasonable to me.

tw-bert avatar Jul 23 '20 06:07 tw-bert

Thanks for the feedback. I'll add this to the redis-py 4.0 TODO list. FYI, 4.0 will be Python 3.5+ only.

andymccurdy avatar Jul 23 '20 19:07 andymccurdy

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot] avatar Jul 24 '21 00:07 github-actions[bot]

Bump

graingert avatar Jul 24 '21 00:07 graingert

@sav-norem. Wanna?

chayim avatar Sep 01 '22 12:09 chayim

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot] avatar Sep 02 '23 00:09 github-actions[bot]