janus-cloud icon indicating copy to clipboard operation
janus-cloud copied to clipboard

broken pipe issue

Open agorkiy opened this issue 2 years ago • 6 comments

Hi I have setup with 1 yanus proxy and 2 janus servers behind it. It works good for all demos and test calls with 5-10 participants. Now I'm testing it, trying to get stable 3 rooms with 20 members in each - but this doesn't work for now. I see such logs in janys-proxy log file:

BrokenPipeError: [Errno 32] Broken pipe
2022-04-21 12:47:50,795 [1690018] [januscloud.proxy.core.frontend_session] [ERROR] - Asynchronous event ({'janus': 'event', 'session_id': 7324896184483668, 'plugindata': {'plugin': 'janus.plugin.videoroom
', 'data': {'videoroom': 'event', 'room': 1743988738, 'leaving': 8968357616118914}}, 'sender': 5720134667431144}) transport failed on session (id:7324896184483668)
Traceback (most recent call last):
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/frontend_session.py", line 36, in notify_event
    self.ts.send_message(event)
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/transport/ws.py", line 132, in send_message
    self.send(self._msg_encoder.encode(message), binary=False)
  File "/opt/janus-cloud/lib/python3.8/site-packages/ws4py/websocket.py", line 303, in send
    self._write(m)
  File "/opt/janus-cloud/lib/python3.8/site-packages/ws4py/websocket.py", line 285, in _write
    self.sock.sendall(b)
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/transport/ws.py", line 67, in sendall
    _sendall(data)
  File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socket3.py", line 534, in sendall
    return _socketcommon._sendall(self, data_memory, flags)
  File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socketcommon.py", line 392, in _sendall
    timeleft = __send_chunk(socket, chunk, flags, timeleft, end)
  File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socketcommon.py", line 321, in __send_chunk
    data_sent += socket.send(chunk, flags)
  File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socket3.py", line 515, in send
    return self._sock.send(data, flags)
BrokenPipeError: [Errno 32] Broken pipe

Any idea why this happen and how to fix?

agorkiy avatar Apr 21 '22 11:04 agorkiy

from your log, the reason is the websocket pipe between janus proxy and your client is broken which results into exception when we send mesage to it. janus-proxy has catch this exception and ignore it.

jamken avatar Apr 25 '22 16:04 jamken

For some high work load case, it's possible that the low level socket may be already broken but the high level business is not yet notified about that and still send some data to this socket. So, you may get the above log. I would lower the log level about that in the recent version

jamken avatar Apr 26 '22 02:04 jamken

I think this log is not the real reason make you failed, please check the cpu workload of janus-proxy

jamken avatar Apr 26 '22 02:04 jamken

@jamken Yes, I was observing cpu load as well, but didn't saw significant load. Also, about errors, I had such one too:

2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection wi
th ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection wi
th ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection wi
th ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection with ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection with ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection with ('127.0.0.1', 46960)
2022-04-20 18:10:28,441 [3858813] [januscloud.proxy.core.frontend_session] [DEBUG] - an asynchronous event messge ({'janus': 'hangup', 'session_id': 7522993418610696, 'reason': 'DTLS timeout', 'sender': 661122662160242}) is sent back asynchronous for session "7522993418610696" 
2022-04-20 18:10:28,442 [3858813] [januscloud.proxy.core.frontend_session] [DEBUG] - an asynchronous event messge ({'janus': 'hangup', 'session_id': 1882511552519992, 'reason': 'DTLS timeout', 'sender': 8338611274264920}) is sent back asynchronous for session "1882511552519992" 
2022-04-20 18:10:28,443 [3858813] [januscloud.proxy.core.frontend_session] [DEBUG] - an asynchronous event messge ({'janus': 'hangup', 'session_id': 1256170478876418, 'reason': 'DTLS timeout', 'sender': 4842149081266732}) is sent back asynchronous for session "1256170478876418" 
2022-04-20 18:10:28,444 [3858813] [januscloud.proxy.core.request] [DEBUG] - Request ({'janus': 'trickle', 'candidate': {'candidate': 'candidate:3534132064 1 udp 2113937151 16bcd6df-5dac-4eea-89e5-17af6f3032d8.local 54000 typ host generation 0 ufrag dHsY network-cost 999', 'sdpMid': '0', 'sdpMLineIndex': 0}, 'transaction': 'c8XRo1zYK557', 'session_id': 1200244565700485, 'handle_id': 6523215966388871}) is incoming to handle
2022-04-20 18:10:28,445 [3858813] [januscloud.proxy.core.frontend_session] [ERROR] - Couldn't find any session 1200244565700485
2022-04-20 18:10:28,445 [3858813] [januscloud.proxy.core.request] [WARNING] - Request ({'janus': 'trickle', 'candidate': {'candidate': 'candidate:3534132064 1 udp 2113937151 16bcd6df-5dac-4eea-89e5-17af6f3032d8.local 54000 typ host generation 0 ufrag dHsY network-cost 999', 'sdpMid': '0', 'sdpMLineIndex': 0}, 'transaction': 'c8XRo1zYK557', 'session_id': 1200244565700485, 'handle_id': 6523215966388871}) processing failed
Traceback (most recent call last):
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 295, in incoming_request
    response = handler(request)
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 267, in _handle_trickle
    handle = self._get_plugin_handle(request)
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 127, in _get_plugin_handle
    session = self._get_session(request)
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 120, in _get_session
    session = self._frontend_session_mgr.find_session(request.session_id)
  File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/frontend_session.py", line 124, in find_session
    raise JanusCloudError('No such session {}'.format(session_id), JANUS_ERROR_SESSION_NOT_FOUND)

This happens even with 2 rooms with 20 person in each - at some point log fills with Already closed: websocket server connection with ('127.0.0.1', 46960) and then participants in room start to disconnect. And then I assume happens next one, JANUS_ERROR_SESSION_NOT_FOUND, when they try to log back to the room but can't find it for some reason.

agorkiy avatar Apr 29 '22 13:04 agorkiy

I think this problem may be in your front http(websocket) proxy . the ws connections from the http proxy (like 127.0.0.1:46960) to the janus-proxy(maybe 127.0.0.1:8288) broken for some reason. Please try to bypass the http proxy and test again

jamken avatar May 02 '22 02:05 jamken

maybe the file fd number limitation? please check ulimite -a also

jamken avatar May 02 '22 02:05 jamken