jitsi-videobridge
JVB crashes: "Sctp send error: : Resource temporarily unavailable"
When using the DataChannel under high load (about 10 messages per second on each data channel), jitsi-videobridge sometimes crashes, apparently inside the native SCTP library.
2016-09-19 09:31:45.152 Sctp send error: : Resource temporarily unavailable
2016-09-19 09:31:45.152 JVB 2016-09-19 00:31:45.152 SEVERE: [3156] org.jitsi.videobridge.Conference.error() Failed to send message on data channel.
2016-09-19 09:31:45.152 java.io.IOException: Failed to send the data
2016-09-19 09:31:45.152 at org.jitsi.videobridge.WebRtcDataStream.sendString(WebRtcDataStream.java:146)
2016-09-19 09:31:45.153 at org.jitsi.videobridge.Endpoint.sendMessageOnDataChannel(Endpoint.java:951)
2016-09-19 09:31:45.153 at org.jitsi.videobridge.Conference.sendMessageOnDataChannels(Conference.java:237)
2016-09-19 09:31:45.153 at org.jitsi.videobridge.Endpoint.onStringData(Endpoint.java:824)
2016-09-19 09:31:45.153 at org.jitsi.videobridge.WebRtcDataStream.onStringMsg(WebRtcDataStream.java:122)
2016-09-19 09:31:45.153 at org.jitsi.videobridge.SctpConnection.onSctpPacket(SctpConnection.java:825)
2016-09-19 09:31:45.153 at org.jitsi.sctp4j.SctpSocket.onSctpIn(SctpSocket.java:517)
2016-09-19 09:31:45.153 at org.jitsi.sctp4j.SctpSocket.onSctpInboundPacket(SctpSocket.java:542)
2016-09-19 09:31:45.154 at org.jitsi.sctp4j.Sctp.onSctpInboundPacket(Sctp.java:230)
2016-09-19 09:31:45.154 at org.jitsi.sctp4j.Sctp.on_network_in(Native Method)
2016-09-19 09:31:45.154 at org.jitsi.sctp4j.Sctp.onConnIn(Sctp.java:203)
2016-09-19 09:31:45.154 at org.jitsi.sctp4j.SctpSocket.onConnIn(SctpSocket.java:479)
2016-09-19 09:31:45.154 at org.jitsi.videobridge.SctpConnection.runOnDtlsTransport(SctpConnection.java:1150)
2016-09-19 09:31:45.154 at org.jitsi.videobridge.SctpConnection.access$100(SctpConnection.java:55)
2016-09-19 09:31:45.155 at org.jitsi.videobridge.SctpConnection$1.run(SctpConnection.java:475)
2016-09-19 09:31:45.155 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2016-09-19 09:31:45.155 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-09-19 09:31:45.155 at java.lang.Thread.run(Thread.java:745)
Do you have any scenario to reproduce?
I'm facing the same issue as well:
SEVERE: Failed to send message on data channel.
java.io.IOException: Failed to send the data
at org.jitsi.videobridge.WebRtcDataStream.sendString(WebRtcDataStream.java:146)
at org.jitsi.videobridge.Endpoint.sendMessageOnDataChannel(Endpoint.java:857)
at org.jitsi.videobridge.Conference.sendMessageOnDataChannels(Conference.java:237)
at org.jitsi.videobridge.Endpoint.onClientEndpointMessage(Endpoint.java:495)
at org.jitsi.videobridge.Endpoint.onJSONData(Endpoint.java:444)
at org.jitsi.videobridge.Endpoint.onStringData(Endpoint.java:744)
at org.jitsi.videobridge.WebRtcDataStream.onStringMsg(WebRtcDataStream.java:122)
at org.jitsi.videobridge.SctpConnection.onSctpPacket(SctpConnection.java:775)
at org.jitsi.sctp4j.SctpSocket.onSctpIn(SctpSocket.java:517)
at org.jitsi.sctp4j.SctpSocket.onSctpInboundPacket(SctpSocket.java:542)
at org.jitsi.sctp4j.Sctp.onSctpInboundPacket(Sctp.java:230)
at org.jitsi.sctp4j.Sctp.on_network_in(Native Method)
at org.jitsi.sctp4j.Sctp.onConnIn(Sctp.java:203)
at org.jitsi.sctp4j.SctpSocket.onConnIn(SctpSocket.java:479)
at org.jitsi.videobridge.SctpConnection.runOnDtlsTransport(SctpConnection.java:1092)
at org.jitsi.videobridge.SctpConnection.access$000(SctpConnection.java:53)
at org.jitsi.videobridge.SctpConnection$1.run(SctpConnection.java:442)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The reason for this error is that SCTP's send buffer is overfilled (data is queued faster than it can be sent). I would recommend lowering the transfer rate until we come up with a solution.
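One way to lower the transfer rate on the sending side is a token bucket that caps how many messages per second are handed to the data channel. The following is only a minimal sketch: the class `TokenBucket` and its parameters are hypothetical and are not part of jitsi-videobridge or sctp4j, and the caller is assumed to pass in a monotonic clock value such as `System.nanoTime()`.

```java
/**
 * Hypothetical sender-side throttle (not part of jitsi-videobridge).
 * Allows at most `burst` messages at once and refills at `ratePerSec`.
 */
final class TokenBucket {
    private final double ratePerSec; // tokens added per second
    private final double burst;      // maximum tokens that can accumulate
    private double tokens;           // tokens currently available
    private long lastNanos;          // time of the last refill

    TokenBucket(double ratePerSec, double burst, long nowNanos) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.tokens = burst;         // start with a full bucket
        this.lastNanos = nowNanos;
    }

    /** Returns true if one message may be sent now, false if it should wait or be dropped. */
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSec = (nowNanos - lastNanos) / 1e9;
        lastNanos = nowNanos;
        // Refill according to elapsed time, never exceeding the burst size.
        tokens = Math.min(burst, tokens + elapsedSec * ratePerSec);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A caller would check `bucket.tryAcquire(System.nanoTime())` before each send and queue or drop the message when it returns false. What rate is actually safe depends on the deployment and is not established in this thread.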
@Stefan1oo by "crash" do you mean the failure to send that you quoted, or an actual crash of the JVM? We can reproduce the former under high load if we introduce packet loss. We've seen the latter and are looking for a solution, but we don't have a way to reproduce it. Do you?
We are also able to reproduce the send error. I believe it's because jitsi is unable to clear out the SCTP buffer quickly enough and gets backlogged. However, you need to be sending data channel packets at a high rate to see it.
We also see a hard crash of the VM, which we have confirmed is in the usrsctp library. We don't know if this is related to the send error, but the send error always accompanies crashes. Unfortunately I haven't been able to reproduce the crash; it only happens in our production environment.
We've scaled back our usage of the data channel because of the crash, and that has indeed resulted in fewer crashes. I speculate that if a lot of SCTP connections pile up in the "unready" state, it may provoke a crash. Unfortunately I've not been able to test that, so it's just a guess at this point.
I corresponded with a maintainer of usrsctp about it, but he did not have any specific advice other than consider upgrading usrsctp. (https://github.com/sctplab/usrsctp/issues/105)
That's pretty much where we're at, too. Although recently we've been seeing crashes much more frequently (we suspect the trigger is a change in jitsi-videobridge between 840 and 852). We hope that pulling in this PR might fix it, but it would be nice to have a way to confirm: https://github.com/sctplab/usrsctp/pull/93
Also interesting to note: not all crashes we see are in sctp_add_to_readq().
I've also seen crashes that just display "memmove" as the crash point. Before we reduced our data channel usage, we could go for days without a crash, then later we would get a series of them, 1-3 per day. I spent a lot of time trying to reproduce it, but ultimately it was de-prioritized after we reduced the usage. Even now, with much lower data channel use, we still see a crash every so often: we had 1 in the last 15 days. We see the "resource temporarily unavailable" message about 25K times per day during weekdays. Most of those are probably duplicates for a smaller set of users, though.
Similar thing here:
java.lang.ClassCastException: java.util.Collections$SingletonSet cannot be cast to java.util.List
at org.jitsi.videobridge.VideoChannel.propertyChange(VideoChannel.java:593)
at org.jitsi.util.event.PropertyChangeNotifier.firePropertyChange(PropertyChangeNotifier.java:126)
at org.jitsi.videobridge.Endpoint.pinnedEndpointsChanged(Endpoint.java:627)
at org.jitsi.videobridge.Endpoint.onPinnedEndpointChangedEvent(Endpoint.java:574)
at org.jitsi.videobridge.Endpoint.onJSONData(Endpoint.java:464)
at org.jitsi.videobridge.Endpoint.onStringData(Endpoint.java:735)
at org.jitsi.videobridge.WebRtcDataStream.onStringMsg(WebRtcDataStream.java:122)
at org.jitsi.videobridge.SctpConnection.onSctpPacket(SctpConnection.java:835)
at org.jitsi.sctp4j.SctpSocket.onSctpIn(SctpSocket.java:517)
at org.jitsi.sctp4j.SctpSocket.onSctpInboundPacket(SctpSocket.java:546)
at org.jitsi.sctp4j.Sctp.onSctpInboundPacket(Sctp.java:231)
at org.jitsi.sctp4j.Sctp.on_network_in(Native Method)
at org.jitsi.sctp4j.Sctp.onConnIn(Sctp.java:204)
at org.jitsi.sctp4j.SctpSocket.onConnIn(SctpSocket.java:479)
at org.jitsi.videobridge.SctpConnection.runOnDtlsTransport(SctpConnection.java:1167)
at org.jitsi.videobridge.SctpConnection.access$100(SctpConnection.java:56)
at org.jitsi.videobridge.SctpConnection$1.run(SctpConnection.java:466)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Running **jitsi-videobridge - 879-1**
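The trace above shows a `java.util.Collections$SingletonSet` being cast to `java.util.List`. The actual code at `VideoChannel.java:593` is not shown in this thread, so the following is only a sketch of the general pattern under the assumption that the property value is some `Collection` of endpoint IDs: copy it into a list rather than casting. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; not the actual jitsi-videobridge code. */
final class PinnedEndpoints {
    /**
     * Returns the value as a List without assuming its concrete type.
     * A direct (List<String>) cast fails with ClassCastException when
     * the value is a Set such as Collections.singleton(...).
     */
    static List<String> asList(Collection<String> value) {
        if (value == null) {
            return Collections.emptyList();
        }
        if (value instanceof List) {
            return (List<String>) value; // already a List, no copy needed
        }
        return new ArrayList<>(value);   // Set or other Collection: copy
    }
}
```

With this pattern, a value created by `Collections.singleton("endpointA")` is converted to a one-element list instead of throwing.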
Any workaround for this? Getting it on an up to date deployment of Jitsi under a heavy load.
Experiencing similar issue too...
A workaround is to use websockets for this channel instead of SCTP (we've been running this way for a while now).
@bbaldino How to reconfigure it?
https://github.com/jitsi/jitsi-videobridge/blob/master/doc/web-sockets.md
Confirmed, those messages have disappeared.
Hm, seems to help only temporarily. It's back now.
If you've got websockets set up correctly, it should be impossible to see that message, as the bridge won't even be using SCTP. If you attach some bridge logs we might get an indication of whether or not something is up with the websocket config. cc @bgrozev
Might be clients with cached old config.js, perhaps?
> Might be clients with cached old config.js, perhaps?
That could be, I think mobile clients cache the config for a bit.
@bbaldino I followed everything as described in the link here: https://github.com/jitsi/jitsi-videobridge/blob/master/doc/web-sockets.md however it's still not using websockets. I looked in the source at https://meet.jit.si/ and noticed this:
websocket: 'wss://meet.jit.si/xmpp-websocket', // FIXME: use xep-0156 for that
Do I have to explicitly define the websocket URL in config.js? It's not mentioned anywhere in the readme and doesn't appear to be included on my instance.
> I looked in the source at https://meet.jit.si/ and noticed this:
> websocket: 'wss://meet.jit.si/xmpp-websocket', // FIXME: use xep-0156 for that
> Do I have to explicitly define the websocket URL in the config.js ? Because that's not present anywhere in the readme and doesn't appear to be included on my instance.
No, this is an unrelated feature (web sockets for the XMPP connection) that doesn't concern the bridge.
The only thing you need in config.js is openBridgeChannel: 'websocket'
I did that and I can see it in the config.js being served, but the bridge is still not advertising websockets.
In the console I can see this though:
In the source of the page being served I see this:
But I can't see any websockets being open in the network tab. Anything else which could cause websockets to not be advertised?
> Anything else which could cause websockets to not be advertised?
Yes, the bridge config. Can you post that?
You mean sip-communicator.properties? If yes - here it is:
#org.jitsi.videobridge.AUTHORIZED_SOURCE_REGEXP=focus@auth.<myfqdn>/.*
org.ice4j.ice.harvest.DISABLE_AWS_HARVESTER=true
org.ice4j.ice.harvest.STUN_MAPPING_HARVESTER_ADDRESSES=meet-jit-si-turnrelay.jitsi.net:443
org.jitsi.videobridge.ENABLE_STATISTICS=true
org.jitsi.videobridge.STATISTICS_TRANSPORT=muc
org.jitsi.videobridge.xmpp.user.shard.HOSTNAME=localhost
org.jitsi.videobridge.xmpp.user.shard.DOMAIN=auth.<myfqdn>
org.jitsi.videobridge.xmpp.user.shard.USERNAME=jvb
org.jitsi.videobridge.xmpp.user.shard.PASSWORD=<scrubbed>
org.jitsi.videobridge.xmpp.user.shard.MUC_JIDS=JvbBrewery@internal.auth.<myfqdn>
org.jitsi.videobridge.xmpp.user.shard.MUC_NICKNAME=1a9ea34e-6ab7-4e6f-85a7-33bec8a63552
org.jitsi.videobridge.rest.jetty.port=9090
org.jitsi.videobridge.rest.COLIBRI_WS_TLS=true
org.jitsi.videobridge.rest.COLIBRI_WS_DOMAIN=<myfqdn>:443
org.jitsi.videobridge.rest.COLIBRI_WS_SERVER_ID=jvb1
I have an nginx in the front with the default recommended conf, just appended the stuff in the doc link above.
And I confirm that it's listening:
tcp6 0 0 :::9090 :::* LISTEN 1578/java
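A quick sanity check for a config like the one above is to load it with `java.util.Properties` and confirm that the `COLIBRI_WS_*` keys are present and not commented out. This is only a sketch: the class `ColibriWsConfigCheck` is hypothetical, and treating all three keys from the posted config as required is an assumption, not something the bridge documentation is quoted as saying here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical checker for the websocket keys seen in the posted config. */
final class ColibriWsConfigCheck {
    private static final String PREFIX = "org.jitsi.videobridge.rest.";
    // Keys taken from the config posted in this thread (assumed required).
    private static final String[] KEYS = {
        "COLIBRI_WS_TLS", "COLIBRI_WS_DOMAIN", "COLIBRI_WS_SERVER_ID"
    };

    /** Returns the full names of keys that are missing or blank. */
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            // Lines starting with '#' are treated as comments and skipped.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : KEYS) {
            String value = props.getProperty(PREFIX + key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(PREFIX + key);
            }
        }
        return missing;
    }
}
```

Passing the contents of sip-communicator.properties to `missingKeys` returns an empty list when all three keys are set. It does not verify that the values are correct or that the proxy in front of the bridge forwards the websocket requests.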
The config looks correct. Can you see whether the web socket is advertised in jingle? The easiest way is to open the JavaScript console, search for `session-initiate` (if there's more than one, look for the one from `focus`), and expand the XML to see `jingle>content>transport`. Like this:
[Screenshot 2020-03-31 at 12 12 25]
Yep, saw it and confirmed it's working now. Not sure what was wrong earlier, but I wasn't seeing the websocket reference. Thanks a lot for the help, @bgrozev.