
RTMP server gets stuck if there are ever more than 1000 simultaneous connections

darintay opened this issue 8 months ago

Describe the bug
If there are ever 1000+ simultaneous RTMP connections, no future connections are ever accepted (until restart).

Version
Tested with Docker releases 5.0-r2, 5.0-r3, and 6.0-a2.

To Reproduce

  • Run srs with rtmp server.
  • Run: for x in {0..1000}; do timeout 5s nc localhost 1935 & done; wait;
  • Observe in the logs 1000 lines of ...RTMP client ip=127.0.0.1:39708, fd=420 followed by 1000 lines of client disconnect peer. ret=1008
  • Try repeating the command above, or opening any RTMP connection; note that no more RTMP client log lines appear at all, and streams cannot be sent

Also note: if you use 999 instead of 1000, it appears to work fine; you can run the loop as many times as you want and the server does not break.
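For convenience, the flood plus a follow-up check can be wrapped in one script. This is a sketch, not part of the original report: it assumes ffmpeg with libx264 is available, and it probes with a real RTMP publish rather than a bare TCP connect, because the kernel can still complete the TCP handshake into the listen backlog even when SRS has stopped accepting.

#!/usr/bin/env bash
# Flood the RTMP port with 1001 short-lived idle TCP connections (same as above).
for x in {0..1000}; do timeout 5s nc localhost 1935 & done; wait

# Give SRS a moment to dispose of the closed connections.
sleep 10

# Probe with a real RTMP publish; a healthy server logs a new
# "RTMP client ..." line, a stuck server logs nothing.
ffmpeg -re -f lavfi -i testsrc=size=320x240:rate=25 -t 5 \
  -c:v libx264 -f flv rtmp://localhost:1935/live/livestream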

Expected behavior
Even if the maximum number of connections is exceeded, I expect that once the old connections are disposed, those slots become usable again and future connections work.

darintay, May 04 '25 20:05

Please revise the following configuration:

max_connections     1000;
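For context, max_connections is a top-level directive in srs.conf. A minimal sketch (values are illustrative only, not taken from the report):

# Minimal srs.conf sketch.
listen              1935;
# Raise this if you expect more than 1000 simultaneous clients.
max_connections     1000;
daemon              off;
srs_log_tank        console;
vhost __defaultVhost__ {
}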

duiniuluantanqin, May 09 '25 02:05

The issue is not that there is a limit.

The bug is that if you ever hit the limit, the server breaks such that even if old connections finish, no new connections are ever accepted.

darintay, May 09 '25 02:05

Oh, if what you say is true, then it's indeed a bug; we need to confirm it.

duiniuluantanqin, May 09 '25 08:05

Cannot reproduce in SRS 7.0

I tested with SRS 7.0 using the same reproduction steps:

for x in {0..1000}; do timeout 5s nc localhost 1935 & done; wait;

Result: All 1001 connections were accepted and properly cleaned up. After the test completed, new RTMP connections work normally.

[2025-10-25 20:31:42.955][INFO][3825][mn05078o] RTMP client transport=plaintext, ip=127.0.0.1:53815, fd=1010
[2025-10-25 20:31:43.824][INFO][3825][4m3bvz12] RTC: before dispose resource(RtmpConn)(0x6120000af3c0), conns=997, zombies=0, ign=0, inz=0, ind=0
[2025-10-25 20:31:43.824][WARN][3825][4m3bvz12][54] client disconnect peer. ret=1008



[2025-10-25 20:31:47.941][WARN][3825][mn05078o][54] client disconnect peer. ret=1008
[2025-10-25 20:31:47.941][INFO][3825][kktaho5c] RTC: clear zombies=1 resources, conns=1, removing=0, unsubs=0
[2025-10-25 20:31:47.941][INFO][3825][mn05078o] RTC: disposing #0 resource(RtmpConn)(0x612000014d40), conns=1, disposing=1, zombies=0
[2025-10-25 20:31:49.885][INFO][3825][e7o40414] SRS: cpu=0.00%,0MB, cid=906,441, timer=61,0,0, clock=0,38,7,0,0,0,1,1,0, free=66
[2025-10-25 20:31:54.886][INFO][3825][e7o40414] SRS: cpu=0.00%,0MB, cid=906,441, timer=61,0,0, clock=0,38,7,0,0,0,1,1,0, free=66
[2025-10-25 20:31:59.887][INFO][3825][e7o40414] SRS: cpu=0.00%,0MB, cid=1,0, timer=62,0,0, clock=0,37,10,0,0,0,0,0,0



[2025-10-25 20:32:02.973][INFO][3825][51rf9597] RTMP client transport=plaintext, ip=127.0.0.1:53817, fd=14
[2025-10-25 20:32:02.977][INFO][3825][51rf9597] complex handshake success
[2025-10-25 20:32:02.982][INFO][3825][51rf9597] connect app, tcUrl=rtmp://localhost:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=localhost, port=1935, app=live, args=null
[2025-10-25 20:32:02.983][INFO][3825][51rf9597] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2025-10-25 20:32:02.984][INFO][3825][51rf9597] client identified, type=fmle-publish, vhost=localhost, app=live, stream=livestream, param=, duration=0ms

Log evidence:

  • Connections accepted: RTMP client transport=plaintext, ip=127.0.0.1:53815, fd=1010
  • Connections cleaned up: client disconnect peer. ret=1008
  • Resource manager working: RTC: clear zombies=1 resources, conns=1
  • New connections work: Successfully accepted new RTMP publish connection after the test
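Besides reading the logs, connection counts can be cross-checked from outside the process. A sketch, assuming srs.conf enables the HTTP API on its default port 1985 via the standard http_api section:

# Requires http_api { enabled on; listen 1985; } in srs.conf.
curl -s http://localhost:1985/api/v1/summaries   # overall stats, including connection counts
curl -s http://localhost:1985/api/v1/clients     # currently connected clients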

Possible reasons for the original issue:

  1. Version-specific bug: This may have been fixed between 5.0/6.0-a2 and 7.0
  2. Configuration difference: The max_connections setting or system ulimit may have been different (see the quick checks after this list)
  3. Timing/load issue: The issue might only occur under specific timing conditions or higher load
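Quick checks for reason 2, as a sketch: it assumes a single running process named srs and that conf/srs.conf is the config actually passed with -c.

ulimit -n                                    # shell's per-process file descriptor limit
grep -n max_connections conf/srs.conf        # configured SRS connection cap
grep "open files" /proc/$(pidof srs)/limits  # effective limit of the running srs process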

winlinvip, Oct 26 '25 00:10

The issue hasn't been reproducible for a long time.

winlinvip, Oct 31 '25 22:10