GB28181: When the camera restarts, it cannot connect to SRS.

Open daveyang05 opened this issue 1 year ago • 32 comments

Integrating Hikvision cameras using the GB28181 protocol, after the camera restarts, it takes more than two hours for the video stream to recover. Before recovery, the session status in SRS remains in the 'established' state. Approximately two hours later, the camera sends a remote 'reset' command, after which SRS disconnects the media stream, and then normal operation resumes.

daveyang05 avatar Feb 02 '24 01:02 daveyang05

By "'reset' command", do you mean a TCP reset packet on the SIP connection?

yushimeng avatar Feb 02 '24 08:02 yushimeng

Could you please clarify if GB28181 is capable of actively detecting when a media stream is disconnected and subsequently transitioning the state of the camera session to an initial state, among other potential state changes?

TRANS_BY_GPT4

daveyang05 avatar Feb 02 '24 08:02 daveyang05

[2024-01-30 10:23:44.903][ERROR][1][47l067d9][104] SIP: Receive err code=1007(SocketRead)(Socket read data failed) : parse message : parse message : grow buffer : read bytes : read
thread [1][47l067d9]: do_cycle() [./src/app/srs_app_gb28181.cpp:1077][errno=104]
thread [1][47l067d9]: parse_message() [./src/protocol/srs_protocol_http_conn.cpp:103][errno=104]
thread [1][47l067d9]: parse_message_imp() [./src/protocol/srs_protocol_http_conn.cpp:153][errno=104]
thread [1][47l067d9]: grow() [./src/protocol/srs_protocol_stream.cpp:162][errno=104]
thread [1][47l067d9]: read() [./src/protocol/srs_protocol_st.cpp:566][errno=104](Connection reset by peer)

Hello! The text above is a log printout from SRS. According to the log, SRS disconnects the media stream connection after the read fails with the "Connection reset by peer" error. The session status then transitions from established to init, at which point SRS can accept registration messages from the camera again. This is only a preliminary analysis.

Our initial suspicion is that the GB28181 support in SRS did not detect the media stream fault.

daveyang05 avatar Feb 02 '24 08:02 daveyang05

It looks like the SIP receive thread exited without notifying the SIP connection thread.

yushimeng avatar Feb 02 '24 09:02 yushimeng

@yushimeng This seems quite reasonable and commendable. If a thread exits abnormally, there might indeed be an issue with how this logic is handled.

winlinvip avatar Feb 02 '24 09:02 winlinvip

@daveyang05 Please try this pull request: https://github.com/ossrs/srs/pull/3947. Let me know if it doesn't work.

yushimeng avatar Feb 02 '24 11:02 yushimeng

@yushimeng Nice work!

winlinvip avatar Feb 02 '24 12:02 winlinvip

Hello! The developer responsible for our GB28181 integration is on leave for personal reasons. They will start verifying this as soon as they return after the New Year, and we will report the results to you promptly. Also, could you please confirm that #3947 is the branch to test?

daveyang05 avatar Feb 04 '24 06:02 daveyang05

When the media connection is disconnected, the session is destroyed directly, but when the SIP connection is disconnected, the session is not destroyed immediately. With the current approach, when a new SIP connection arrives, SIP/session status recovery and authentication need special handling.

My idea is to also destroy the session directly when the SIP connection is disconnected. Although my current submission handles session recovery when SIP reconnects immediately, I may remove this recovery logic in the future.

Additionally, I have added SrsResourceManager::erase to avoid binding a session before the old session resource is destroyed. I am not sure whether this disrupts the original design of the lazy sweep and the resource manager.

yushimeng avatar Feb 05 '24 03:02 yushimeng

Hello! Thank you for the very fast code change. Does it introduce any anomaly in SRS resource management, and are there relevant test cases for the original design? Running those test cases as a regression would show the impact on the original system design.

daveyang05 avatar Feb 05 '24 08:02 daveyang05

@yushimeng, have your code modifications not yet been officially committed to the SRS development branch?

daveyang05 avatar Feb 18 '24 03:02 daveyang05

The previous submission was based on my incorrect understanding of the code. Could you provide the logs and configuration so that I can further pinpoint the issue with greater accuracy?

yushimeng avatar Feb 18 '24 03:02 yushimeng

Okay, attached is the log information from that time: (Note: The file upload for "Camera Restart Recovery Time Exceeding 2 Hours Log.zip" appears to be incomplete or pending.)

daveyang05 avatar Feb 18 '24 03:02 daveyang05

Okay, the attachment contains the log print information from that time.

daveyang05 avatar Feb 18 '24 03:02 daveyang05

Configuration of the camera's IP settings, GB28181 integration, TCP protocol. (Attachment: Configuration information for GB28181 camera is being uploaded...)

daveyang05 avatar Feb 18 '24 04:02 daveyang05

@yushimeng, may I inquire about the progress of the issue resolution?

daveyang05 avatar Feb 21 '24 04:02 daveyang05

For SIP terminal registration messages, the CSeq field can be used to determine whether the message is an initial registration or a subsequent periodic registration. This field increments with each report from the terminal. For initial registration messages, the previous session data should be initialized and the process should start anew. If it is a periodic registration message, the current session information should be retained. To differentiate between initial and periodic registration messages, the SRS should keep track of the last reported SIP message's CSeq value, which normally increases continuously. If a decrease is observed (and the last message did not reach or approach 0xffffffff), it can be inferred that the message is an initial registration.
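
The CSeq rule described above can be sketched roughly as follows. This is only an illustration, not SRS code: the function name `is_initial_register`, the constant names, and the margin of 1024 used to decide that the last value was "approaching 0xffffffff" are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SRS): guess whether a REGISTER is an
// initial registration, using only the CSeq numbers.
constexpr uint32_t kCSeqMax = 0xffffffffu;
// Assumed threshold for "the last CSeq was close to the maximum".
constexpr uint32_t kWrapMargin = 1024;

bool is_initial_register(uint32_t last_cseq, uint32_t new_cseq) {
    // CSeq did not decrease: treat it as a periodic registration.
    if (new_cseq >= last_cseq) return false;
    // CSeq decreased, but the last value was near the maximum:
    // treat it as a counter wraparound, not a restart.
    if (last_cseq >= kCSeqMax - kWrapMargin) return false;
    // CSeq decreased without wraparound: likely an initial registration.
    return true;
}
```

With these assumptions, `is_initial_register(500, 1)` reports an initial registration, while `is_initial_register(10, 11)` and `is_initial_register(0xffffffff, 0)` do not.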

daveyang05 avatar Mar 05 '24 11:03 daveyang05

@daveyang05 Nice work, welcome to file a patch to fix this issue. :)

winlinvip avatar Mar 05 '24 23:03 winlinvip

RFC 3261, 10.2 Constructing the REGISTER Request

  Call-ID: All registrations from a UAC SHOULD use the same Call-ID
       header field value for registrations sent to a particular
       registrar.

       If the same client were to use different Call-ID values, a
       registrar could not detect whether a delayed REGISTER request
       might have arrived out of order.

  CSeq: The CSeq value guarantees proper ordering of REGISTER
       requests.  A UA MUST increment the CSeq value by one for each
       REGISTER request with the same Call-ID.
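
To compare CSeq values as the quoted text requires, the header value first has to be split into its sequence number and method, for example "2 REGISTER". A minimal sketch, assuming a hypothetical `CSeq` struct and `parse_cseq` function that are not part of SRS:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical type (not from SRS): the two parts of a CSeq header value.
struct CSeq {
    uint32_t number = 0;
    std::string method;
};

// Parse the value part of a CSeq header, e.g. "2 REGISTER".
// Returns false if the number or the method is missing, or if the
// number does not fit in 32 bits.
bool parse_cseq(const std::string& value, CSeq& out) {
    std::istringstream in(value);
    uint64_t number = 0;
    std::string method;
    if (!(in >> number >> method)) return false;
    if (number > 0xffffffffull) return false;
    out.number = static_cast<uint32_t>(number);
    out.method = method;
    return true;
}
```

A registrar could then keep the last parsed `number` per Call-ID and compare it with the next REGISTER.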

yushimeng avatar Mar 06 '24 10:03 yushimeng

SIP registration messages are typically sent at minute-level intervals. Observations from Hikvision cameras indicate that they send registration messages at least every 10 minutes, so the likelihood of messages arriving out of order within a few minutes is quite low. To determine whether a message from a camera is the initial registration or a subsequent one, check whether the CSeq number has decreased and whether the Call-ID is the same as in the previous registration message. Within the same session, the initial registration message and subsequent messages should have the same Call-ID, as confirmed by the SIP protocol requirements and by packet captures from Hikvision cameras.

daveyang05 avatar Mar 07 '24 02:03 daveyang05

@Yu Gong, you can enhance your original modifications by adding appropriate logic to compare incoming SIP registration messages with previously stored session registration information. If there is a change in the Call-ID or if the CSeq number is lower than before, then clear the existing session and initiate a new SIP session. Otherwise, maintain the existing session.
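
The comparison suggested above could look roughly like this. It is a standalone sketch rather than the actual SRS change: `RegisterInfo`, `SessionAction`, and `on_register` are hypothetical names, and a null `stored` pointer is assumed to mean that no registration has been seen for the device yet.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record of the last REGISTER stored for a session.
struct RegisterInfo {
    std::string call_id;
    uint32_t cseq;
};

enum class SessionAction { Create, Keep, Reset };

// Decide what to do with the session when a REGISTER arrives.
// `stored` is nullptr when no previous registration exists (assumption).
SessionAction on_register(const RegisterInfo* stored, const RegisterInfo& incoming) {
    // No previous registration: create a new session.
    if (stored == nullptr) return SessionAction::Create;
    // Call-ID changed: clear the existing session and start a new one.
    if (incoming.call_id != stored->call_id) return SessionAction::Reset;
    // CSeq lower than before: clear the existing session and start a new one.
    if (incoming.cseq < stored->cseq) return SessionAction::Reset;
    // Same Call-ID and CSeq not lower: keep the existing session.
    return SessionAction::Keep;
}
```

This sketch does not handle CSeq wraparound; that would need an extra check on the stored value.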

daveyang05 avatar Mar 07 '24 02:03 daveyang05

General Yang and Engineer Yu, after modifying the GB28181 code, we have tested and verified that Hikvision cameras quickly recover the video stream after a restart. The code changes were made on the version 5.0 branch. Attachment: srs_app_gb28181.zip

daveyang05 avatar Mar 20 '24 04:03 daveyang05

@daveyang05 Can you publish a Docker image with the patch? I can't find a release package that includes the fix.

codeex avatar Mar 24 '24 14:03 codeex

@Yu Gong, we are currently engaged in development and validation for version 5.0. The attached Docker container has been modified and released based on that version branch.

daveyang05 avatar Mar 25 '24 02:03 daveyang05

As mentioned above.

daveyang05 avatar Mar 25 '24 02:03 daveyang05

@daveyang05, I can't find a v5.0 branch to compile. Where can I get the Docker image or the source code?

codeex avatar Mar 25 '24 10:03 codeex

Attachment: "srs_app_gb28181 camera restart 2 hours recovery code modification.zip" (the modified GB28181 code for the issue where recovery after a camera restart takes 2 hours).

daveyang05 avatar Mar 26 '24 11:03 daveyang05

Code

daveyang05 avatar Mar 26 '24 11:03 daveyang05

I downloaded the version 5.0 release branch, replaced the GB28181 file with the modified one, and recompiled to build the image. After redeployment, the changes do not seem to take effect. Restarting the Hikvision camera did not resolve the issue: it still fails to reconnect the video stream, although the camera status shows it as online. The cause of the problem is unclear. Attachment: srs5-disconnect.log

codeex avatar Mar 27 '24 08:03 codeex

srs_error_t SrsLazyGbSipTcpConn::bind_session(SrsSipMessage* msg, SrsLazyObjectWrapper<SrsLazyGbSession>** psession)
{
srs_error_t err = srs_success;

string device = msg->device_id();
if (device.empty()) return err;

// Only create session for REGISTER request.
if (msg->type_ != HTTP_REQUEST || msg->method_ != HTTP_REGISTER) return err;

// The lazy-sweep wrapper for this resource.
SrsLazyObjectWrapper<SrsLazyGbSipTcpConn>* wrapper = wrapper_root_;
srs_assert(wrapper); // It MUST never be NULL, because this method is in the cycle of coroutine of receiver.

// Find exists session for register, might be created by another object and still alive.
SrsLazyObjectWrapper<SrsLazyGbSession>* session = dynamic_cast<SrsLazyObjectWrapper<SrsLazyGbSession>*>(_srs_gb_manager->find_by_id(device));

// If a session is found by device ID and the current message is a registration message
if (session && msg->is_register()) {
    // If the cseq number decreased or the call id changed
    if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
        // Remove resource from GB manager
        _srs_gb_manager->remove(session);

        // Set session to NULL
        session = NULL;
    }
}

if (!session) {
    // Create new GB session.
    session = new SrsLazyObjectWrapper<SrsLazyGbSession>();

    if ((err = session->resource()->initialize(conf_)) != srs_success) {
        srs_freep(session);
        return srs_error_wrap(err, "initialize");
    }

Please verify whether the bind_session function in the downloaded code contains the following code:

if (session && msg->is_register()) {
    // If the cseq number decreased or the call id changed
    if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
        // Remove resource from GB manager
        _srs_gb_manager->remove(session);

        // Set session to NULL
        session = NULL;
    }
}

daveyang05 avatar Mar 27 '24 09:03 daveyang05