jicofo Re-invite failed when Jicofo select another JVB after one crash

Hello,

I have a setup using multiple JVB, when a JVB crash/disconnect, Jicofo will try to select another JVB for those ongoing conference and send a new offer to the peer. But each the client decline this one: xmpp.AbstractOperationSetJingle.replaceTransport().273 RE-INVITE PEER ... jitsiMeetConferenceImpl.log() Participant has rejected our transport offer

In the client side (vanilla jitsi-meet app), i get this error:

Error renegotiating after setting new remote offer: setRemoteDescription failed: OperationError: Failed to set remote offer sdp: The order of m-lines in subsequent offer doesn't match order from previous offer/answer

Any idea how to fix this issue?

Thanks a lot

Apr 30 '18 01:04 Ben-cos

I'm planning on taking a look at this soon (after some current tasks), but if you have a chance to repro you might be able to speed things along. If you have time, mind repro'ing the issue and capturing the event history for the client who sees the error from webrtc-internals? I think a webrtc dump will capture all this, but the main part is to get all the remote offers so I can see the differences between the previous one and the one that causes the error you pasted.

May 21 '18 20:05 bbaldino

I suspect that this line is the problem: https://github.com/jitsi/lib-jitsi-meet/blob/master/modules/xmpp/JingleSessionPC.js#L968 It was a workaround to re-establish the data channels, but probably something has changed in Chrome and now it doesn't allow such thing. It could also be that the data channels will work without this hack.

May 22 '18 13:05 paweldomas

We applied the above #755 temporary fix to our lib-jitsi-meet build (stable/jitsi-meet_2988) and the call conference worked fine with two videobridges during the failover scenario (one Jicofo, one Jibri, two JVB's.) However, when Jibri (dated 2018-05-09) was doing the recording, we shutdown the active JVB, the 2nd JVB seemed resuming the call conference, but the web clients got a pop-up saying "connectivity problem and reconnecting..." and the screen was frozen. There were no errors in the JS console. However, there were some errors in Jibri log and browser log without detail information:

2018-08-02 18:56:43.727 SEVERE: [37] org.jitsi.jibri.capture.ffmpeg.executor.impl.AbstractFfmpegExecutor.isHealthy() Ffmpeg is running but doesn't appear to be encoding: 2018-08-02 18:56:44.728 SEVERE: [37] org.jitsi.jibri.capture.ffmpeg.executor.impl.AbstractFfmpegExecutor.isHealthy() Ffmpeg is running but doesn't appear to be encoding: Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1280x720, q=-1--1, 30 fps, 15360 tbn, 30 tbc

2018-08-02 19:02:43.774 INFO: [135] browser.leaveCallAndQuitBrowser() [2018-08-02T19:02:28+0000] [SEVERE] https://jitsi2.magnet.com/libs/lib-jitsi-meet.min.js?v=2740 1:300388 "[JitsiMeetJS.js]" 2018-08-02 19:02:43.775 INFO: [135] browser.leaveCallAndQuitBrowser() [2018-08-02T19:02:28+0000] [SEVERE] https://jitsi2.magnet.com/libs/app.bundle.min.js?v=2740 1:312125 Uncaught DOMException: play() failed because the user didn't interact with the document first. https://goo.gl/xX8pDD 2018-08-02 19:02:43.775 INFO: [135] browser.leaveCallAndQuitBrowser() [2018-08-02T19:02:28+0000] [SEVERE] https://jitsi2.magnet.com/libs/lib-jitsi-meet.min.js?v=2740 1:300388 "[modules/connectivity/ParticipantConnectionStatus.js]"

Has anyone encountered a similar problem when Jibri is involved in a conference?

Aug 02 '18 19:08 vincentwlau

Those Jibri logs look harmless, though I'd be able to tell more with the full log.

2018-08-02 18:56:43.727 SEVERE: [37] org.jitsi.jibri.capture.ffmpeg.executor.impl.AbstractFfmpegExecutor.isHealthy() Ffmpeg is running but doesn't appear to be encoding: 2018-08-02 18:56:44.728 SEVERE: [37] org.jitsi.jibri.capture.ffmpeg.executor.impl.AbstractFfmpegExecutor.isHealthy() Ffmpeg is running but doesn't appear to be encoding: Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1280x720, q=-1--1, 30 fps, 15360 tbn, 30 tbc

Sometimes Ffmpeg spits out diagnostic lines in between the encoding lines in the log. If the health check reads the log when one of them is the last line it will print this, but it should recover once it sees an encoding line again.

2018-08-02 19:02:43.774 INFO: [135] browser.leaveCallAndQuitBrowser() [2018-08-02T19:02:28+0000] [SEVERE] https://jitsi2.magnet.com/libs/lib-jitsi-meet.min.js?v=2740 1:300388 "[JitsiMeetJS.js]" 2018-08-02 19:02:43.775 INFO: [135] browser.leaveCallAndQuitBrowser() [2018-08-02T19:02:28+0000] [SEVERE] https://jitsi2.magnet.com/libs/app.bundle.min.js?v=2740 1:312125 Uncaught DOMException: play() failed because the user didn't interact with the document first. https://goo.gl/xX8pDD 2018-08-02 19:02:43.775 INFO: [135] browser.leaveCallAndQuitBrowser() [2018-08-02T19:02:28+0000] [SEVERE] https://jitsi2.magnet.com/libs/lib-jitsi-meet.min.js?v=2740 1:300388 "[modules/connectivity/ParticipantConnectionStatus.js]"

This isn't Jibri specific but related to a new chrome feature. We haven't seen any problems arise from this, but one possibility is that the recording/live stream wouldn't have any audio, was that the case?

Aug 02 '18 20:08 bbaldino

@bbaldino The recording should have audio and video streams (only the speakers were turned off.) I reran our test scenario (after the clients were hung, the clients manually left the conference) and generated a set of full logs: browser.0.txt, ffmpeg.0.txt, log.0.txt. They can be accessed at this server. Let me know if there are other logs that you would like to see. Thanks for your help.

Aug 03 '18 00:08 vincentwlau

Nothing jumps out from the Jibri logs, it shut down fine due to being in an empty call (it didn't detect any other participants in the call so it leaves cleanly). Whatever the issue is with the web clients not reconnecting, it seems odd Jibri would cause it. Maybe something on Jicofo chokes if there's a Jibri involved when doing failover?

Aug 03 '18 01:08 bbaldino

@bbaldino Thank you for the hint. I checked the Jicofo log and found a suspicious error that Jicofo could not invite the "recorder":

Jicofo 2018-08-03 06:49:05.420 FINE: [84] org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.logRequest().548 Expire peer channels
<iq to='jitsi-videobridge.jitsi2.magnet.com' id='11xKG-578' type='set'>
<conference xmlns='http://jitsi.org/protocol/colibri' id='3491900ccd5cae7b' gid='ff2dac' name='nbc'>
<content name='audio'>
<channel expire='0' id='891af8d6103810b9'/>
</content>
<content name='video'>
<channel expire='0' id='b4b5bc087c376b0'/>
</content>
<content name='data'>
<sctpconnection endpoint='recorder-33af4d' expire='0' id='f3f077c053354da1' port='5000'/>
</content>
</conference>
</iq>
Jicofo 2018-08-03 06:49:05.421 INFO: [84] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.sendStanza().758 @@@[SEND]: <iq to='jitsi-videobridge.jitsi2.magnet.com' id='11xKG-578' type='set'><conference xmlns='h
ttp://jitsi.org/protocol/colibri' id='3491900ccd5cae7b' gid='ff2dac' name='nbc'><content name='audio'><channel expire='0' id='891af8d6103810b9'/></content><content name='video'><channel expire='0' id='b4b5bc087c
376b0'/></content><content name='data'><sctpconnection endpoint='recorder-33af4d' expire='0' id='f3f077c053354da1' port='5000'/></content></conference></iq>
Jicofo 2018-08-03 06:49:05.438 SEVERE: [84] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.

However, the Discovered items did show that there was a second videobridge. I will compare the log when there is no Jibri involved.

Aug 03 '18 07:08 vincentwlau

@bbaldino After more testings, I narrowed down further. I had two web clients and two JVB's, but I disabled P2P in config.js. When the two web clients were in a normal call conference with audio/video (no recording involved), then I shutdown the active JVB. From the 2nd JVB log, a lot of logs were spitting out (but no errors):

JVB 2018-08-04 00:03:50.682 FINE: [112] org.jitsi.videobridge.xmpp.ComponentImpl.processIQRequest() (serving component 'JitsiVideobridge') Processing IQ request (packetId rUvt7-282).
JVB 2018-08-04 00:03:50.682 FINE: [112] org.jitsi.videobridge.xmpp.ComponentImpl.processIQ() (serving component 'JitsiVideobridge') Responding to IQ (packetId rUvt7-282) with: <iq type="result" id="rUvt7-282" from="jitsi-videobridge.jitsi2.magnet.com" to="[email protected]/focus-10.138.0.2"><query xmlns="http://jabber.org/protocol/disco#info"><identity category="component" type="conference" name="JitsiVideobridge"/><feature var="http://jabber.org/protocol/disco#info"/><feature var="urn:xmpp:ping"/><feature var="jabber:iq:last"/><feature var="urn:xmpp:time"/><feature var="http://jitsi.org/protocol/colibri"/><feature var="http://jitsi.org/protocol/healthcheck"/><feature var="urn:xmpp:jingle:apps:dtls:0"/><feature var="urn:xmpp:jingle:transports:ice-udp:1"/><feature var="urn:xmpp:jingle:transports:raw-udp:1"/><feature var="jabber:iq:version"/></query></iq>
JVB 2018-08-04 00:03:50.705 INFO: [444] org.jitsi.impl.neomedia.rtp.FrameDesc.log() keyframe,stream=1134283093 ssrc=3898281710,idx=0,ts=1611541306,indepe
ndent=true,min_seen=16404,max_seen=16404,start=16404,end=-1
JVB 2018-08-04 00:03:50.705 INFO: [444] org.jitsi.impl.neomedia.rtp.FrameDesc.log() keyframe,stream=1134283093 ssrc=2748453982,idx=3,ts=3675696119,indepe
ndent=true,min_seen=22511,max_seen=22511,start=22511,end=-1
JVB 2018-08-04 00:03:50.711 INFO: [444] org.jitsi.impl.neomedia.rtp.FrameDesc.log() keyframe,stream=1134283093 ssrc=4061529578,idx=6,ts=3345558456,indepe
ndent=true,min_seen=26979,max_seen=26979,start=26979,end=-1
JVB 2018-08-04 00:03:50.787 INFO: [441] org.jitsi.impl.neomedia.rtp.translator.RTCPFeedbackMessageSender.log() Sending a FIR to ssrc=1294880968 remaining Retries=9

but the two web clients were frozen with "connectivity problem, reconnecting..." pop-up. There were no errors in the web console. However, in some rare cases, one of the web clients was able to send its video stream to the other client, but never both clients were able to recover. Once the web clients left the conference and joined again manually, the new session started fine. Any ideas?

Aug 04 '18 02:08 vincentwlau

So is failover without Jibri working consistently? Or no?

Aug 04 '18 16:08 bbaldino

From what I observed, the failover is not working consistently even without Jibri. The problem does not seem to be in Jicofo, Jibri or JVB, but in the web client codes. I disabled P2P in config.js just to force the conference to use JVB. In one of my tests, I had P2P enabled and I had Jibri involved so the streams had to go through JVB. During the failover, Jibri recorded the "connectivity problem, reconnecting..." pop-up and one web client's streams.

BTW, I am using the stable jitsi build 2988 with the above patch in the web client.

Aug 04 '18 18:08 vincentwlau

Just another data point. The JVB failover consistently does not work with 3 participants in a call conference (i.e. p2p enabled in config.js, without recording)

Aug 05 '18 01:08 vincentwlau

More information on my test. Disabled P2P and had two web clients in a call conference and ran "ifstat -t" to monitor the network I/O on the web clients and the JVB servers. After shutting down the active JVB, I noticed that the web clients sent the "transport-accept" with something like:

m=audio... c=IN IP4 192.168.1.15 ...
m=video... c=IN IP4 0.0.0.0 ... 
m=application... c=IN IP4 0.0.0.0 ...

However, the "ifstat -t" on the web clients showed that they were sending out packets, but not many packets received. And the second JVB started receiving packets but not sending many packets. Does the IP 0.0.0.0 mean that video and data are not accepted?

Aug 07 '18 00:08 vincentwlau

@bgrozev any ideas there for the bridge failover?

Aug 07 '18 02:08 bbaldino

@bgrozev Can the effect of bridge failover actually come from the "frozen video" in the web clients? From what I observed, the web clients both had the frozen video, but the audio channel was working.

Aug 08 '18 00:08 vincentwlau

I just ran a quick test. The conference migrates to a new bridge and audio continues to work, but the video is not shown and the remote statistics are not updated. My guess is that we are not properly re-initializing something in the client.

Aug 08 '18 19:08 bgrozev

@bgrozev Sometimes the video would be alive again suddenly after some random interval. Not sure if this information may help.

Update: Do you have an estimation when it will be fixed? If you can point out the areas where the re-initialization might not be done properly, I can look into it as well.

Aug 09 '18 08:08 vincentwlau

@bgrozev sounds like the data channel is not working (if no remote stats). Frozen video could be because no selected/pinned endpoint is signalled.

Aug 22 '18 19:08 paweldomas

Still seeing this issue with jitsi/lib-jitsi-meet@2080e81. We are on 1.0.3548-1. Do you know if any change after that may fix this?

The test we did:

call with c1 <-> jvb1, c2 <-> jvb2
stop jvb1
saw RE-INVITE and transport-accept in jicofo log
c2 -> c1 recovered, i.e. c1 can see and hear c2 again
however on c2, c1's video is frozen. but c2 can hear from c1.
after a while (a little bit more than 1 minute), c2 got notification that c1 has left and c2's video on c1 side is frozen.

The log from jvb2 around the time of step 6 (attached below) shows something related to VideobridgeExpireThread. My wild guess is that c1 used to be an octo endpoint in jvb2. Later when jvb1 was down, c1 was migrated to jvb2 with the same endpoint id but this time as an endpoint directly connected with jvb2. So later when jvb2 realized jvb1 is down via VideobridgeExpireThread, jvb2 tried to clean up anything related to jvb1 and accidentally expired c1 end point, which is previously associated with jvb1. Channel.DEFAULT_EXPIRE is 60s which matches what I saw in step 6).

However I don't know why c2 can't see c1 in step 5).

Any thoughts?

jvb2 log: JVB 2019-12-12 10:38:24.853 INFO: [18] org.jitsi.videobridge.VideobridgeExpireThread.log() Running expire() JVB 2019-12-12 10:38:24.856 INFO: [4204] net.sf.fmj.media.Log.info() Stopping RTPSourceStream. JVB 2019-12-12 10:38:24.859 INFO: [4203] org.jitsi.videobridge.Channel.log() CAT=stat expire_ch,conf_id=1acdc228983c9d86,content=audio,ch_id=615b78ea2b9c5002,endp_id=null,stream=139246510 JVB 2019-12-12 10:38:24.860 INFO: [4203] org.jitsi.videobridge.RtpChannel.log() CAT=stat expire_ch_stats,conf_id=1acdc228983c9d86,content=audio,ch_id=615b78ea2b9c5002,endp_id=null,stream=139246510 bRecv=216,bSent=12499,pRecv=4,pSent=152,bRetr=0,bNotRetr=0,pRetr=0,pNotRetr=0,pMiss=0 JVB 2019-12-12 10:38:24.861 SEVERE: [4203] org.jitsi.videobridge.octo.OctoEndpoints.log() Replacing an existing audio channel JVB 2019-12-12 10:38:24.861 INFO: [4204] org.jitsi.impl.neomedia.rtp.RawPacketCache.log() CAT=stat closed,stream=957255761 max_size_bytes=168383,max_size_packets=136,total_hits=0,total_misses=0,total_packets=5773,oldest_hit_ms=0 JVB 2019-12-12 10:38:24.861 INFO: [4204] org.jitsi.impl.neomedia.rtp.RawPacketCache.log() CAT=stat closed,stream=-1 max_size_bytes=147070,max_size_packets=118,total_hits=0,total_misses=0,total_packets=858,oldest_hit_ms=0 JVB 2019-12-12 10:38:24.862 INFO: [4204] org.jitsi.videobridge.Channel.log() CAT=stat expire_ch,conf_id=1acdc228983c9d86,content=video,ch_id=142c805264610661,endp_id=null,stream=957255761 JVB 2019-12-12 10:38:24.862 INFO: [4204] org.jitsi.videobridge.RtpChannel.log() CAT=stat expire_ch_stats,conf_id=1acdc228983c9d86,content=video,ch_id=142c805264610661,endp_id=null,stream=957255761 bRecv=712153,bSent=5885799,pRecv=2006,pSent=6267,bRetr=0,bNotRetr=0,pRetr=0,pNotRetr=0,pMiss=0 JVB 2019-12-12 10:38:24.862 SEVERE: [4204] org.jitsi.videobridge.octo.OctoEndpoints.log() Replacing an existing video channel

Dec 12 '19 19:12 rhwu

Looks like this commit may have fixed the issue: https://github.com/jitsi/jitsi-videobridge/commit/0f7616ed401d17753044c5070a2a60c4786dcec2

    final AbstractEndpoint existingEndpoint = getEndpoint(id);
    if (existingEndpoint instanceof OctoEndpoint)
    {
        // It is possible that an Endpoint was migrated from another bridge
        // in the conference to this one, and the sources lists (which
        // implicitly signal the Octo endpoints in the conference) haven't
        // been updated yet. We'll force the Octo endpoint to expire and
        // we'll continue with the creation of a new local Endpoint for the
        // participant.
        existingEndpoint.expire();
    }

Dec 12 '19 20:12 rhwu

This is probably fixed now, reopen if necessary

Jan 04 '23 20:01 bgrozev

jicofo jicofo copied to clipboard

Re-invite failed when Jicofo select another JVB after one crash

jicofo
jicofo copied to clipboard