kodiak icon indicating copy to clipboard operation
kodiak copied to clipboard

ref(bot): robust redis connection pool

Open sbdchd opened this issue 1 year ago • 3 comments

Sometimes the connection to the redis host can be severed for a little bit resulting in a ConnectionLostError error.

Ideally the redis client would handle these more gracefully, but as a work around we can send a "canary" request to redis to see if the connection is dead and have it create a new one if that's the case.

This will increase our RPS to redis, but they're only ping requests so it should be pretty fast.

There are some other redis connections that aren't wrapped, but when those connections fail the entire task restarts, so it isn't a big deal. This RobustPool is intended for the nitty gritty APIs that mutate state and can't safely be retried -- maybe they can but more involved.

sbdchd avatar Jul 19 '22 02:07 sbdchd

The test is hanging, something to do with redis:

+ ./.venv/bin/pytest kodiak/test_queue.py
=================================== test session starts ===================================
platform darwin -- Python 3.7.13, pytest-6.0.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/steve/projects/kodiak/bot, configfile: tox.ini
plugins: cov-2.12.1, anyio-3.6.1, asyncio-0.12.0, kodiak-0.1.0, mock-3.3.1
collected 6 items                                                                         

kodiak/test_queue.py .....

happens even if I remove the mock

sbdchd avatar Jul 19 '22 02:07 sbdchd

We could also switch to redis-py which has proper health checks built into the client

sbdchd avatar Jul 23 '22 17:07 sbdchd

ConnectionResetError -> https://docs.python.org/3/library/errno.html#errno.ECONNRESET

And ECONNRESET is caused by the server, redis in this case.

So we should investigate on the redis host:

  • ulimit for file descriptors ulimit -n
  • total connected connections to the server -- info clients redis command
  • somaxconns kernel related thing for tcp -- https://github.com/redis/redis/issues/6123#issuecomment-512808174

While probably not related, we might as well fix the warning about huge pages:

https://github.com/redis/redis/issues/3176

sbdchd avatar Jul 23 '22 17:07 sbdchd