Ant-Media-Server icon indicating copy to clipboard operation
Ant-Media-Server copied to clipboard

Add the ability to detect a "stuck" rtsp stream and then restart as needed.

Open alfred-stokespace opened this issue 1 year ago • 11 comments

Is your feature request related to a problem? Please describe. We have a camera of "Axis" brand which occasionally exabits some strange behavior in AMS. From the HLS perspective, the access logs show the pattern of: m3u8 requests (of the camera in question) with 304 status code with an occasional 200 and zero ".ts" file requests were seen. New browsers pointed at this stream show nothing, existing browser sessions show repeating video (ie. last couple seconds before failure are shown over and over again)

At the IP packet capture level, the camera in question is sending RTSP packets of sizes (769, 385, 384) followed by ack packets from AMS to the camera (of zero length). Followed, occasionally by AMS issuing the RTSP GET_PARAMETER followed by the RTSP/1.0 200 OK from the camera. This pattern of RTSP packets of sizes (769, 385, 384) keeps going and going....

In summary, the tcp connection between AMS and Axis camera is active. The camera is sending data and is responding to AMS. Despite this, the stream is "stuck" at the HLS level, and probably at the rtsp level as well.

I issued a Stop Broadcasting from AMS console. The capture shows rtsp: TEARDOWN on that streams camera port. I follow up with "Start Broadcasting", and then the expected rtsp: OPTIONS call from AMS on a new port to the same camera. The camera responds and we get the RTSP startup/negotiation sequence followed by the camera sending packets to AMS lengths like (2676, 1338, 4014). At this point the preview in AMS is working and browsers are now showing the stream as well. Now the access log show 200s for m3u8 and progressing ".ts" files like you expected.

Describe the solution you'd like Either detect when you are issueing 304s over and over with no ".ts" file progression and restart the stream, or detect the problem at the rtsp level and in either case restart the stream automatically.

Describe alternatives you've considered I've considered writing a simple tool that runs on the server w/AMS that would loop through all streams that are currently broadcasting, perform m3u8 request against the stream, check for presence of 304, continue requesting until a 200 is returned, do this for two cycles so that you can compare the m3u8 contents and if the ts file's described within; match (which they should not!) then issue a local "stop broadcasting" followed by a "Start Broadcasting"

Additional context none

alfred-stokespace avatar Aug 07 '24 14:08 alfred-stokespace

Hi @alfred-stokespace, Thanks for reporting the issue. Could you please check the /usr/local/antmedia/webapps/{APP_NAME}/streams folder when you encounter the issue? I am wondering if new ts files are created or not. Also logs which belongs to that time can help us.

burak-58 avatar Aug 11 '24 16:08 burak-58

Hi @alfred-stokespace ,

Just let me add something. We've encountered a similar problem and my colleague(@lastpeony) is working on them. Here is the issue https://github.com/ant-media/Ant-Media-Server/issues/6486

@burak-58 thought that it may be the same problem.(This is my guess)

Let us know if you think it looks like the same problem as this issue https://github.com/ant-media/Ant-Media-Server/issues/6486 @alfred-stokespace

Cheers

mekya avatar Aug 12 '24 08:08 mekya

@burak-58 great suggestion about checking that folder. I will definitely check.

Providing logs is tricky for my org. We operate in a high security posture, which doesn't exclude providing logs, but logs have to be heavily scrubbed to prevent showing to much about our internals.

What would be helpful for me is to know which specific log file and partial line matches you think would be helpful for diagnosis of this specific issue, that way I don't have to spend my time scrubbing a bunch of content that turns out not to be relevant.

Hi @alfred-stokespace, Thanks for reporting the issue. Could you please check the /usr/local/antmedia/webapps/{APP_NAME}/streams folder when you encounter the issue? I am wondering if new ts files are created or not. Also logs which belongs to that time can help us.

alfred-stokespace avatar Aug 12 '24 13:08 alfred-stokespace

Hi @alfred-stokespace, Thank you for your response.

Take a look at the logs(/usr/loca/antmedia/log/ant-media-server.log). It should have lots of

cannot write frame ...

Or log/ant_media-error.log should have lots of the following logs

Application provided invalid, non monotonically increasing dts to muxer in...

Let us know if you see the logs above

Cheers

mekya avatar Aug 12 '24 14:08 mekya

@mekya I looked at your issue and checked my error log to see if I could see those non monotonically increasing messages, yes! they are there, but, they don't coincide in log files that would match up with the event that I opened this ticket for...

also, if you look below you'll notice the message is slightly different. non monotonically in my v2.9.0 logs versus non-monotonically in your version. Perhaps not relevant?

For Ant Media Support, it sure would be good to add some context to the error log so we know what stream and what time errors happen.

/var/log/antmedia# zgrep "non monotonically increasing" antmedia-error.log*
antmedia-error.log:[hls @ 0x74125890ca80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 7275643201 >= -9223372022303489409
antmedia-error.log:[hls @ 0x74125890ca80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 7275661200 >= -9223372022303453410
antmedia-error.log:[hls @ 0x74125890ca80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 7275686401 >= -9223372022303403009
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= -9223372035593544164
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630560116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630563117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630566118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630569119
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630572117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630575118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630578118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630581116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630584117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630587118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630590116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630593117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630596118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630599118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630602117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630605117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630608118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630611116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630614117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630617118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630620116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630623117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630626117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630629118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630632116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630635117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630638118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630641116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630644117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630647117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630650116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630653116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630656117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630659118
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630662116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630665117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630668117
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630671116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 630674526 >= 630674116
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 2875996515 >= -9223372031102905665
antmedia-error.log.5.gz:[hls @ 0x741258e4bfc0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 2875996515 >= 2875876626

alfred-stokespace avatar Aug 12 '24 14:08 alfred-stokespace

@mekya Looks like AMS deletes logs for ant-media-server.* after just a couple days, so my event is gone. I do see some instances below (stream-id redacted), and it was just those three occurrences.

zgrep "cannot write" ant-media-server.*

ant-media-server.1.log.gz:2024-08-12 04:03:26,780 [Thread-30578] WARN  io.antmedia.muxer.HLSMuxer - cannot write video frame to muxer(REDACTED.m3u8) av_bsf_receive_packet. Erroris Invalid argument
ant-media-server.1.log.gz:2024-08-12 04:03:26,790 [Thread-30578] WARN  io.antmedia.muxer.HLSMuxer - cannot write video frame to muxer(REDACTED.m3u8) av_bsf_receive_packet. Erroris Invalid argument
ant-media-server.1.log.gz:2024-08-12 04:03:26,881 [Thread-30578] WARN  io.antmedia.muxer.HLSMuxer - cannot write video frame to muxer(REDACTED.m3u8) av_bsf_receive_packet. Erroris Invalid argument

alfred-stokespace avatar Aug 12 '24 14:08 alfred-stokespace

Hi @alfred-stokespace, Thank you for the message.

Looks like AMS deletes logs for ant-media-server.* after just a couple days, so my event is gone.

It can be configured through logback.xml at /usr/local/antmedia/conf/logback.xml. You can increase the maxIndex and/or you can increase the maxFileSize

    <appender class="ch.qos.logback.core.rolling.RollingFileAppender" name="FILE">
        <File>/var/log/antmedia/ant-media-server.log</File>
        <Append>false</Append>

        <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
          <fileNamePattern>/var/log/antmedia/ant-media-server.%i.log.gz</fileNamePattern>
          <minIndex>1</minIndex>
          <maxIndex>3</maxIndex>
        </rollingPolicy>
        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
          <maxFileSize>50MB</maxFileSize> 
        </triggeringPolicy>
        <encoder>
            <pattern>%d{ISO8601} [%thread] %-5level %logger{35} - %msg%n</pattern>
        </encoder>
    </appender>

also, if you look below you'll notice the message is slightly different. non monotonically in my v2.9.0 logs versus non-monotonically in your version. Perhaps not relevant?

No worries about this slight difference. I think confirming these logs are happening when there is a problem .

After getting these information. ,t looks to me that the problem is between IP Camera and AMS(because RTSP packets of sizes (769, 385, 384)) I think you're right about detecting a "stuck" stream.

To make the implementation, we need to confirm the case. Do you help us in deploying some snapshot that includes extra logs to let us understand if the stream is stuck?

If yes, let me know. I can check when we can put this to our pipeline.

Cheers Oguz

mekya avatar Aug 12 '24 15:08 mekya

Good to know @mekya on the logs.

I'm open to trying a snapshot version, however, this is a production system here...

I guess what I could do is setup a totally new server, use an additional license (I think I do have one that I'm not using atm) and set this camera alone on it, that would help isolate logs. I still would need this camera for production however, that just means it's getting RTSP pulled from two AMS instance, shouldn't mater to much.

At the moment my priority is on another project, but I might be able to interleve this effort on the side. Sure send me info on a snapshot download. I can also setup a special full packet capture on it as well and just give it a huge disk to record every packet of RTSP.

alfred-stokespace avatar Aug 12 '24 15:08 alfred-stokespace

@mekya I have a server in place and I'm ready to test. Currently I'm just running v2.9.0 like before. So far the same Axis camera hasn't failed.

alfred-stokespace avatar Aug 13 '24 16:08 alfred-stokespace

Agent Mohit Dubey linked Freshdesk ticket 131847 for this issue.

muratugureminoglu avatar Aug 14 '24 09:08 muratugureminoglu

@mekya I have a server in place and I'm ready to test. Currently I'm just running v2.9.0 like before. So far the same Axis camera hasn't failed.

Thank you for the update @alfred-stokespace. I can check if there is any change between 2.9.0 and 2.10 about IP camera streaming. Anyway, please keep us updated. We're happy to find the root cause. I think we can send a quick snapshot soon, including some logs that will help with debuging the problem.

Cheers

mekya avatar Aug 16 '24 13:08 mekya