srs icon indicating copy to clipboard operation
srs copied to clipboard

RTC2RTMP: SRS崩溃问题,core文件和日志附在了下面

Open shawnlihst opened this issue 3 years ago • 2 comments

Note: Please read FAQ before file an issue, see 2716

Note: 提问前,请先看FAQ, 即 2716

Description(描述)

Please description your issue here(描述你遇到了什么问题)

srsCrash崩溃了,时间在2022年5月13日 11点13分

  1. SRS Version(版本): SRS/4.0.245

  2. SRS Log(日志):

日志文件和core文件过大,附在下载链接里,crash发生在2022年5月13日11:13
  • core-log.tar.gz下载链接:http://121.40.139.118/download/core-log.tar.gz
  • SRS二进制v4.0-b9srs.zip
  1. SRS Config(配置):
listen              29711;
max_connections     1000;
# For docker, please use docker logs to manage the logs of SRS.
# See https://docs.docker.com/config/containers/logging/
daemon              off;
srs_log_tank        file;
srs_log_file        ./objs/srsmeeting.log;
pid                 objs/meeting.pid;

http_api {
    enabled         on;
    listen          8029;

}
http_server {
    enabled         on;
    listen          29713;
    dir             ./objs/nginx/html;
}
rtc_server {
    enabled on;
    listen 29715; # UDP port
    # @see https://github.com/ossrs/srs/wiki/v4_CN_WebRTC#config-candidate
    candidate 服务器公网ip,不方便展示;
}
stats {
    network         0;
    disk            sda sdb xvda xvdb;
}

vhost __defaultVhost__ {
     enabled         on;
     min_latency     on;
     tcp_nodelay     on;
rtc {
     enabled     on;
     rtmp_to_rtc on;
     rtc_to_rtmp on;
     # The PLI interval in seconds, for RTC to RTMP.
     # Note the available range is [0.5, 30]
     # Default: 6.0
     # pli_for_rtmp 6.0;
    }
play {
        # set the MW(merged-write) latency in ms.
        #         # SRS always set mw on, so we just set the latency value.
        # the latency of stream >= mw_latency + mr_latency
        # the value recomment is [300, 1800]
        queue_length    10;
        gop_cache       off;
        mw_latency      100;
      }
}

Replay(重现)

Please describe how to replay the bug? (重现Bug的步骤)

暂无,如果重现,会补充进来

Expect(期望行为)

Please describe your expectation(描述你期望发生的事情)

shawnlihst avatar May 13 '22 04:05 shawnlihst

Download the coredump and SRS binary, then run gdb:

docker run --rm -it -v `pwd`:/srs -w /srs registry.cn-hangzhou.aliyuncs.com/ossrs/srs:dev gdb srs -c core*

The stack of coredump:

[root@afa08885f337 srs]# gdb srs -c core.21038

Core was generated by `./objs/srs -c conf/ucpmeeting.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000618d54 in SrsRtpPacket::get_avsync_time (this=0x0) at src/kernel/srs_kernel_rtc_rtp.hpp:335

warning: Source file is more recent than executable.
335	    int64_t get_avsync_time() const { return avsync_time_; }
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
(gdb) bt
#0  0x0000000000618d54 in SrsRtpPacket::get_avsync_time (this=0x0) at src/kernel/srs_kernel_rtc_rtp.hpp:335
#1  0x000000000061361f in SrsRtmpFromRtcBridger::packet_video_rtmp (this=0x10ddc420, start=0, end=22704)
    at src/app/srs_app_rtc_source.cpp:1614
#2  0x000000000061277f in SrsRtmpFromRtcBridger::packet_video (this=0x10ddc420, src=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:1451
#3  0x0000000000611f34 in SrsRtmpFromRtcBridger::on_rtp (this=0x10ddc420, pkt=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:1347
#4  0x000000000060f48e in SrsRtcSource::on_rtp (this=0x61cf3c0, pkt=0x103c28e0) at src/app/srs_app_rtc_source.cpp:632
#5  0x0000000000617c03 in SrsRtcVideoRecvTrack::on_rtp (this=0xe2db520, source=0x61cf3c0, pkt=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:2520
#6  0x00000000005d22cb in SrsRtcPublishStream::do_on_rtp_plaintext (this=0xf11f660, pkt=@0x7fc966db7758: 0x103c28e0, 
    buf=0x7fc966db7740) at src/app/srs_app_rtc_conn.cpp:1451
#7  0x00000000005d2086 in SrsRtcPublishStream::on_rtp_plaintext (this=0xf11f660, 
    plaintext=0x2572200 "\220\375X\261T\331\341x\336\201\267x\276\336", nb_plaintext=31) at src/app/srs_app_rtc_conn.cpp:1419
#8  0x00000000005d1db8 in SrsRtcPublishStream::on_rtp (this=0xf11f660, 
    data=0x2572200 "\220\375X\261T\331\341x\336\201\267x\276\336", nb_data=41) at src/app/srs_app_rtc_conn.cpp:1386
#9  0x00000000005d6316 in SrsRtcConnection::on_rtp (this=0x3fcc690, 
    data=0x2572200 "\220\375X\261T\331\341x\336\201\267x\276\336", nb_data=41) at src/app/srs_app_rtc_conn.cpp:2272
#10 0x000000000060914a in SrsRtcServer::on_udp_packet (this=0x2438ab0, skt=0x7fc966db7b20)
    at src/app/srs_app_rtc_server.cpp:462
#11 0x00000000005a8e25 in SrsUdpMuxListener::cycle (this=0x254ef80) at src/app/srs_app_listener.cpp:620
#12 0x0000000000525a5d in SrsFastCoroutine::cycle (this=0x255f0c0) at src/app/srs_app_st.cpp:272
#13 0x0000000000525ae0 in SrsFastCoroutine::pfn (arg=0x255f0c0) at src/app/srs_app_st.cpp:287
#14 0x0000000000652d3b in _st_thread_main () at sched.c:363
#15 0x00000000006535ae in st_thread_create (start=0x7fc965cd0760 <main_arena>, arg=0x38, joinable=32713, stk_size=1704519420)
    at sched.c:694

This is the packet retrieved from the cache when converting from RTC to RTMP, and it is NULL.

    SrsCommonMessage rtmp;
    SrsRtpPacket* pkt = cache_video_pkts_[cache_index(start)].pkt;
    rtmp.header.initialize_video(nb_payload, pkt->get_avsync_time(), 1);

This leads to a null pointer reference.

Note: If you are not using RTC to RTMP, you can disable this feature rtc_to_rtmp to bypass it.

The information of the cached packets is 'cache_video_pkts_'.

(gdb) p cache_video_pkts_
$1 = {{in_use = false, sn = 0, ts = 0, rtp_ts = 0, pkt = 0x0} <repeats 166 times>, {in_use = true, sn = 22694, 
    ts = 224828526, rtp_ts = 1423527736, pkt = 0x48205b0}, {in_use = true, sn = 22695, ts = 224828558, rtp_ts = 1423530616, 
    pkt = 0x75a3110}, {in_use = true, sn = 22696, ts = 224828590, rtp_ts = 1423533496, pkt = 0x58f4930}, {in_use = true, 
    sn = 22697, ts = 224828638, rtp_ts = 1423537816, pkt = 0x855c6d0}, {in_use = true, sn = 22698, ts = 224828670, 
    rtp_ts = 1423540696, pkt = 0xc528c10}, {in_use = true, sn = 22699, ts = 224828718, rtp_ts = 1423545016, pkt = 0x3eac500}, 
  {in_use = true, sn = 22700, ts = 224828750, rtp_ts = 1423547896, pkt = 0x2b841b0}, {in_use = true, sn = 22701, 
    ts = 224828782, rtp_ts = 1423550776, pkt = 0x4850700}, {in_use = true, sn = 22702, ts = 224828830, rtp_ts = 1423555096, 
    pkt = 0x6756ed0}, {in_use = true, sn = 22703, ts = 224828862, rtp_ts = 1423557976, pkt = 0xbfe3d90}, {in_use = true, 
    sn = 22704, ts = 224828894, rtp_ts = 1423560856, pkt = 0x30e0a20}, {in_use = true, sn = 22705, ts = 224828942, 
    rtp_ts = 1423565176, pkt = 0x6a31100}, {in_use = false, sn = 0, ts = 0, rtp_ts = 0, pkt = 0x0} <repeats 334 times>}

The information of 'bridger' is:

(gdb) p *this
$3 = {<ISrsRtcSourceBridger> = {_vptr.ISrsRtcSourceBridger = 0x9e5070 <vtable for SrsRtmpFromRtcBridger+16>}, source_ = 
    0xb4072e0, codec_ = 0xb0776e0, is_first_audio = true, is_first_video = true, format = 0x9898070, 
  static s_cache_size = 512, cache_video_pkts_ = {{in_use = false, sn = 0, ts = 0, rtp_ts = 0, 
      pkt = 0x0} <repeats 166 times>, {in_use = true, sn = 22694, ts = 224828526, rtp_ts = 1423527736, pkt = 0x48205b0}, {
      in_use = true, sn = 22695, ts = 224828558, rtp_ts = 1423530616, pkt = 0x75a3110}, {in_use = true, sn = 22696, 
      ts = 224828590, rtp_ts = 1423533496, pkt = 0x58f4930}, {in_use = true, sn = 22697, ts = 224828638, rtp_ts = 1423537816, 
      pkt = 0x855c6d0}, {in_use = true, sn = 22698, ts = 224828670, rtp_ts = 1423540696, pkt = 0xc528c10}, {in_use = true, 
      sn = 22699, ts = 224828718, rtp_ts = 1423545016, pkt = 0x3eac500}, {in_use = true, sn = 22700, ts = 224828750, 
      rtp_ts = 1423547896, pkt = 0x2b841b0}, {in_use = true, sn = 22701, ts = 224828782, rtp_ts = 1423550776, 
      pkt = 0x4850700}, {in_use = true, sn = 22702, ts = 224828830, rtp_ts = 1423555096, pkt = 0x6756ed0}, {in_use = true, 
      sn = 22703, ts = 224828862, rtp_ts = 1423557976, pkt = 0xbfe3d90}, {in_use = true, sn = 22704, ts = 224828894, 
      rtp_ts = 1423560856, pkt = 0x30e0a20}, {in_use = true, sn = 22705, ts = 224828942, rtp_ts = 1423565176, 
      pkt = 0x6a31100}, {in_use = false, sn = 0, ts = 0, rtp_ts = 0, pkt = 0x0} <repeats 334 times>}, header_sn_ = 0, 
  lost_sn_ = 22705, rtp_key_frame_ts_ = -1}

The information of the caller's SN is:

(gdb) f 2
#2  0x000000000061277f in SrsRtmpFromRtcBridger::packet_video (this=0x10ddc420, src=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:1451
1451	                if ((err = packet_video_rtmp(header_sn_, tail_sn)) != srs_success) {
(gdb) p header_sn_
$4 = 0
(gdb) p tail_sn
$5 = 22704

The log at the time of the crash:

[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] RTC: DTLS handshake done.
[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] RTC: session pub=1, sub=0, to=30000ms connection established
[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] RTC: Publisher url=/meeting/100000000036783-436778429 established
[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] ignore disabled exec for vhost=__defaultVhost__
[2022-05-13 11:13:47.563][Trace][21038][8i0q72f4] NACK: update seq=22652, nack range [22651, 22652]
[2022-05-13 11:13:47.621][Trace][21038][4mt73s5j] time diff to large=1099, next out=227400, new pkt=228499, set to new pkt
[2022-05-13 11:13:48.081][Trace][21038][3q11ude1] time diff to large=1200, next out=474480, new pkt=475680, set to new pkt
[2022-05-13 11:13:48.122][Trace][21038][941q820f] RTC: Need PLI ssrc=717477422, play=[941q820f], publish=[8i0q72f4], count=43/43
[2022-05-13 11:13:48.122][Trace][21038][8i0q72f4] RTC: Request PLI ssrc=717477422, play=[941q820f], count=43/43, bytes=12B
[2022-05-13 11:13:48.195][Trace][21038][919d8dz3] 38B video sh,  codec(7, profile=Baseline, level=3.1, 640x360, 0kbps, 0.0fps, 0.0s)
[2022-05-13 11:13:48.195][Trace][21038][919d8dz3] set ts=3591688447, header=25326, lost=25327
[2022-05-13 11:13:48.962][Trace][21038][4mt73s5j] time diff to large=1001, next out=228839, new pkt=229840, set to new pkt
[2022-05-13 11:13:49.378][Trace][21038][941q820f] Hybrid cpu=7.99%,336MB, cid=10,34, timer=61,24,396, clock=0,42,5,0,0,0,0,1,0, free=1, objs=(pkt:814,raw:394,fua:418,msg:1178,oth:2,buf:255)
[2022-05-13 11:13:49.378][Trace][21038][941q820f] RTC: Server conns=9, rpkts=(245,rtp:213,stun:3,rtcp:28), spkts=(431,rtp:396,stun:3,rtcp:52), rtcp=(pli:1,twcc:21,rr:2), snk=(214,a:107,v:107,h:0), fid=(id:1,fid:245,ffid:0,addr:3,faddr:245)

TRANS_BY_GPT3

winlinvip avatar May 21 '22 11:05 winlinvip

  1. It can be seen from SrsRtmpFromRtcBridger::packet_video_rtmp (this=0x10ddc420, start=0, end=22704) that there is a fault, under normal circumstances it is not possible to have so many packet losses (distance(0, 22704));
  2. Continuing to look at header_sn_=0, rtp_key_frame_ts_=-1, it is speculated that no key_frame has been received, which can be proven from the logs, filtering condition ContextId=q38p5iuu);
  3. It can be inferred that lost_sn_ has not been initialized (random value 22705), coincidentally pkt->header.get_sequence is also 22705, and the program can reach line 1451 in srs_app_rtc_source.cpp. In the end, it crashes. Conclusion: The tragedy caused by lost_sn_ not being initialized. The starting sequence of webrtc is a random value, as long as the first frame is not a key_frame, the sequence within a GOP can increase to lost_sn_ and cause a crash.

TRANS_BY_GPT3

anjisuan783 avatar Jun 23 '22 01:06 anjisuan783