musicbrainz-docker

Redis connection errors

Open · nikosmichas opened this issue 3 years ago • 10 comments

Hello! I have increased the number of workers for the webserver and I am running the services through docker-compose while sending a lot of API requests. After some time, I start getting errors like these in the server logs:

musicbrainz_1  | 2021-11-16T13:52:22.714329479Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."
musicbrainz_1  | 2021-11-16T13:52:26.429439298Z [error] Caught exception in engine "Could not connect to Redis server at redis:6379: Cannot assign requested address at lib/MusicBrainz/Redis.pm line 24.
musicbrainz_1  | 2021-11-16T13:52:26.429469986Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."
musicbrainz_1  | 2021-11-16T13:52:36.652664437Z [error] Caught exception in MusicBrainz::Server::Controller::WS::2::Work->load "Could not connect to Redis server at redis:6379: Cannot assign requested address at /root/perl5/lib/perl5/Redis.pm line 275.


musicbrainz_1  | 2021-11-16T13:47:21.205178216Z [error] Caught exception in MusicBrainz::Server::Controller::WS::2::Recording->load "Could not connect to Redis server at redis:6379: Cannot assign requested address at /root/perl5/lib/perl5/Redis.pm line 275.
musicbrainz_1  | 2021-11-16T13:47:21.205218285Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."
musicbrainz_1  | 2021-11-16T13:47:22.142083445Z [error] Caught exception in MusicBrainz::Server::Controller::WS::2::Recording->load "Could not connect to Redis server at redis:6379: Cannot assign requested address at /root/perl5/lib/perl5/Redis.pm line 275.
musicbrainz_1  | 2021-11-16T13:47:22.142122078Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."

Any idea/suggestion on how to handle this?

nikosmichas avatar Nov 16 '21 13:11 nikosmichas

Hi!

It might be that the Redis instance is overloaded. You may need to get your hands dirty tuning the configuration of your Redis instance, which can probably be done by passing options on the command line through a local Docker Compose override file.

Here are the options we pass to our Redis instance for cache at musicbrainz.org:

--maxmemory 1GB --maxmemory-policy allkeys-lru --save ""

To pass these options on the command line, please read the quick how-to I wrote about Docker Compose overrides and adapt "Modify memory settings" to your specific needs; the relevant path looks like services > redis > command > redis-server --maxmemory….
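For example, a local override based on the memory settings example could look like this (just a sketch with the options above, to adapt to your setup):

version: '3.1'
# Description: Tune the Redis cache instance
services:
   redis:
      command: redis-server --maxmemory 1GB --maxmemory-policy allkeys-lru --save ""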

@mwiencek: Since you are more knowledgeable than me about Redis use in MusicBrainz, can you please double-check both the reported issue (for a potential bug to be fixed in musicbrainz-server) and my answer (for potential misconceptions)?

yvanzo avatar Nov 18 '21 16:11 yvanzo

Thanks @yvanzo for your reply. I will try this

nikosmichas avatar Nov 19 '21 08:11 nikosmichas

It seems that the issue was not specific to Redis. Even after disabling Redis entirely, I kept getting a similar error for Postgres. The problem is that, because of the large number of requests I was sending, the OS in the MusicBrainz container could not create new sockets between itself and the other services.

I noticed with netstat that a huge number of connections stuck in the TIME_WAIT state was preventing new connections from being created. I resolved this by changing tcp_max_tw_buckets in the MusicBrainz Docker image, and now the services are able to run with approximately 100 web workers in parallel without "Could not connect" errors.

Ideally, this could be resolved at the application level, by reusing the connections it creates (e.g. by using connection pooling for Postgres).

More information about the issue: https://www.percona.com/blog/2014/12/08/what-happens-when-your-application-cannot-open-yet-another-connection-to-mysql/

I could open a Pull Request with the change in the docker-compose.yml if you think that this may be useful in other cases.
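To give an idea, the change could be an override (or an addition to docker-compose.yml) along these lines; the value below is only illustrative, not necessarily the one I would propose, and it assumes the kernel and Compose file format in use allow setting this sysctl per container:

version: '3.1'
# Sketch: limit TIME_WAIT sockets inside the musicbrainz container
# (the value below is illustrative only)
services:
   musicbrainz:
      sysctls:
         net.ipv4.tcp_max_tw_buckets: 65536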

nikosmichas avatar Nov 30 '21 10:11 nikosmichas

> Ideally, this could be resolved at the application level, by reusing the connections it creates (e.g. by using connection pooling for Postgres).

In production, we use pgbouncer. Would it be worth including it in musicbrainz-docker too?
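Very roughly, an optional service could be sketched like this (the image, credentials, and service names below are placeholders to adapt, not our production setup):

version: '3.1'
# Sketch: optional pgbouncer between the web containers and Postgres
# (image and settings are placeholders, check the image's documentation)
services:
   pgbouncer:
      image: edoburu/pgbouncer
      environment:
         DB_HOST: db
         DB_USER: musicbrainz
         DB_PASSWORD: musicbrainz
         POOL_MODE: transaction
         MAX_CLIENT_CONN: 500
      depends_on:
         - db

The web containers would then need to be configured to use pgbouncer as their database host instead of connecting to Postgres directly.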

yvanzo avatar Dec 01 '21 10:12 yvanzo

> I resolved this by changing tcp_max_tw_buckets in the MusicBrainz Docker image, and now the services are able to run with approximately 100 web workers in parallel without "Could not connect" errors. I could open a Pull Request with the change in the docker-compose.yml if you think that this may be useful in other cases.

Thanks! If it is complementary to Postgres connection pooling, then yes.

yvanzo avatar Dec 01 '21 10:12 yvanzo

> Ideally, this could be resolved at the application level, by reusing the connections it creates (e.g. by using connection pooling for Postgres).

> In production, we use pgbouncer. Would it be worth including it in musicbrainz-docker too?

Maybe it would, even as an optional part.

> I resolved this by changing tcp_max_tw_buckets in the MusicBrainz Docker image, and now the services are able to run with approximately 100 web workers in parallel without "Could not connect" errors. I could open a Pull Request with the change in the docker-compose.yml if you think that this may be useful in other cases.

> Thanks! If it is complementary to Postgres connection pooling, then yes.

Yes, it can be complementary to the pooling and it will also help avoid issues with Redis. I will open a PR shortly.

nikosmichas avatar Dec 01 '21 11:12 nikosmichas

By the way @yvanzo, do you know if you use non-default values for --max-keepalive-reqs and --keepalive-timeout in Starlet?

Setting them also helped me reduce the number of open sockets a bit.
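For reference, these are options that plackup passes through to Starlet; the combination below is only an illustration of the kind of values I mean, not a recommendation:

# illustrative values only, to be adjusted to the workload
--max-workers 100 --max-keepalive-reqs 100 --keepalive-timeout 2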

nikosmichas avatar Dec 01 '21 12:12 nikosmichas

@nikosmichas have you figured out how to modify the local/compose/memory-settings.yml file to accomplish this? I believe this is working for me:

version: '3.1'
# Description: Customize memory settings
services:
   redis:
      command: redis-server --maxmemory 1GB --maxmemory-policy allkeys-lru --save ""

JoshDi avatar Aug 09 '22 13:08 JoshDi

Using the values above caused my slave server to sometimes time out on the MQ queue or when checking the index count against the DB. I have removed these redis-server modifications and I have not had an issue since.

JoshDi avatar Aug 17 '22 18:08 JoshDi

Correction: the error I am getting is below, and it started to occur after upgrading to Ubuntu 22.04.1 LTS (x64).

OCI runtime exec failed: exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

From searching on the web, it looks to be an SELinux issue.

JoshDi avatar Aug 18 '22 13:08 JoshDi