
[BUG] 100k+ open pipe handles causes KeyDB to stop accepting new connections

Open patrickhampson opened this issue 2 years ago • 6 comments

Describe the bug

KeyDB opens a significant number of pipes on the system (Ubuntu 20.04, kernel 5.4.0-113) and becomes unable to handle any new connections. Existing connections remain stable. Running keydb-cli hangs, and keydb-diagnostic-tool reports too many open files.

Open handles:

root@redis1:~# lsof | grep keydb | wc -l
116736
root@redis1:~# keydb-diagnostic-tool
Could not connect to Redis at 127.0.0.1:6379: Can't create socket: Too many open files
Segmentation fault (core dumped)

lsof output snippet:

keydb-ser  16232                             keydb   13r     FIFO               0,13      0t0     266389 pipe
keydb-ser  16232                             keydb   14w     FIFO               0,13      0t0     266389 pipe
keydb-ser  16232                             keydb   15r     FIFO               0,13      0t0     722606 pipe
keydb-ser  16232                             keydb   16r     FIFO               0,13      0t0     266488 pipe
keydb-ser  16232                             keydb   17w     FIFO               0,13      0t0     266488 pipe
keydb-ser  16232                             keydb   18r     FIFO               0,13      0t0     266606 pipe
keydb-ser  16232                             keydb   19w     FIFO               0,13      0t0     266606 pipe
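For reference, the pipe handles can also be counted directly from /proc (16232 is the server PID from the snippet above), without grepping the full lsof output:

root@redis1:~# ls -l /proc/16232/fd | grep -c pipe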

To reproduce

Unknown; KeyDB ran stably for ~1 month before hitting this bug.

Expected behavior

KeyDB should maintain a reasonable number of pipes.

Additional information

Running KeyDB 6.3.1 standalone on Ubuntu 20.04 from the keydb.dev repo.

Configuration is mostly default; this installation uses active-active replication between 2 members. Authentication is enabled and there are about 15 TCP connections to the DB. Both replicas exhibit the same issue, with 100k+ open pipe handles.

The DB is still in the above state, so I'm happy to run any diagnostic commands necessary.

patrickhampson avatar Jun 24 '22 20:06 patrickhampson

Can confirm this for 6.3.1; not reproducible with 6.2.2.

mono2k avatar Jun 26 '22 14:06 mono2k

@ben HiPri


JohnSully avatar Jun 26 '22 16:06 JohnSully

@benschermel

wolfet avatar Jul 01 '22 05:07 wolfet

I figured out that every persistence save (from the configured SAVE) opens 25 new files (for me) and never closes them, until the OS limit on open files is reached.

EDIT: I tested 6.3.1 and 6.2.1 (both using only unix sockets):

- 6.2.1 - save never increases open files (in the long term)
- 6.2.1 - on a new connection, 4 new files are opened; on close, 4 files are closed
- 6.3.1 - save increases open files by 25 and leaves them open
- 6.3.1 - on a new connection, 59 new files are opened; on close, 59 files are closed
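A minimal way to watch this from the shell (a sketch; it assumes a single keydb-server process and that keydb-cli can reach it over a unix socket, whose path here is illustrative):

#!/usr/bin/env bash
# Count the server's open FDs before and after each SAVE.
pid=$(pidof keydb-server)
for i in $(seq 1 10); do
    before=$(ls "/proc/$pid/fd" | wc -l)
    keydb-cli -s /run/keydb/keydb.sock save > /dev/null
    after=$(ls "/proc/$pid/fd" | wc -l)
    echo "save $i: $before -> $after open fds"
done

On 6.3.1 the count should climb with every iteration if the leak is present; on 6.2.1 it should stay flat.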

padinko avatar Jul 12 '22 07:07 padinko

We faced exactly the same issue on prod today with keydb-server 6:6.3.1-1~focal1. We tried increasing the number of open files using prlimit --nofile=350000 --pid=13471 and saving the data, but it didn't help. Downgraded to 6.2.2.
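For anyone trying the same workaround, you can verify the new limit actually applied to the running process (13471 is our PID; substitute your own):

prlimit --nofile --pid 13471
grep 'open files' /proc/13471/limits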

perl-coder avatar Jul 19 '22 14:07 perl-coder

Same here with TCP sockets on v6.3.1. It seems to reproduce when the server is under a high workload and dumping the RDB at the same time.

Even when the server is working normally, lsof | grep keydb | wc -l often gives a number over 100k, while on v6.2.2 it never exceeds 10k.
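For ongoing monitoring, counting the FDs straight from /proc is cheaper than grepping the full lsof output (illustrative; assumes a single keydb-server process):

watch -n 60 'ls "/proc/$(pidof keydb-server)/fd" | wc -l'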

moycat avatar Aug 18 '22 09:08 moycat

Nice! @msotheeswaran do you have a timeline for the next 6.3.x release? I have the same issue and just need to decide between downgrading and waiting to upgrade.

cykirsch avatar Mar 03 '23 15:03 cykirsch

FWIW, we did the downgrade and it's working well; they noted on #474 that the next 6.3.x release should come in the next month.

cykirsch avatar Mar 13 '23 19:03 cykirsch