
Timeout to make rpc to conference-7de8cd4d0d3bae78fa03@ipAddress_0.join

Open mangalpandey opened this issue 3 years ago • 8 comments

I am using OWT Server V5.0. MongoDB, RabbitMQ, and OWT are all on the same machine.

server connection failed: Error: Timeout to make rpc to conference-7de8cd4d0d3bae78fa03@ipAddress_0.join.

This error message appears every time, and login is unsuccessful. The portal sends the join message but never receives a reply, which is why the timeout occurs.

I need help. How can I resolve this issue?

mangalpandey avatar May 07 '21 08:05 mangalpandey

Where did you get a compiled version of OWT Server V5.0? webrtc.intel.com only offers the Release-v4.3.1 version. When I try to build V5.0, I get blocked at this stage:

`running 'download_from_google_storage --no_resume --platform=linux* --no_auth --bucket chromium-clang-format -s src/buildtools/linux64/clang-format.sha1' in '/root/owt-server-4.3/owt-server/third_party/webrtc-m79' 0> Downloading src/buildtools/linux64/clang-format@942fc8b1789144b8071d3fc03ff0fcbe1cf81ac8... Downloading 1 files took 8.500975 second(s)

Running hooks: 90% (20/22) msan_chained_origins `

sunsocool avatar May 09 '21 12:05 sunsocool

I built it from https://github.com/open-webrtc-toolkit/owt-server/archive/refs/tags/v5.0.zip and the build succeeded without errors. The "Running hooks: 90%" step takes a long time to finish. Please follow https://github.com/open-webrtc-toolkit/owt-server#instructions

mangalpandey avatar May 10 '21 06:05 mangalpandey

Thanks. The procedure just takes too much time and uses too much memory. When I tested the Release-v4.3.1 version I found a lot of problems, and it was hard to make it work. I am reading the whole source code to figure out how the pipeline works and why.

By the way, almost all cloud server providers offer only Intel CPUs without Intel GPUs, which makes it a problem to use Intel Media SDK for hardware acceleration.

sunsocool avatar May 10 '21 11:05 sunsocool

I got the same issue as you did and don't know what to do about it. Every time I try to use the demo on port 3004, I get the error "server connection failed: Error: Timeout to make rpc to [email protected]". Could you please tell me how you solved it?

Uthergogogo avatar Nov 08 '21 08:11 Uthergogogo

I also have a similar issue. If anyone knows how to solve it, I am interested.

sebsken avatar Aug 01 '22 16:08 sebsken

Please check the file logs/conference-7de8cd4d0d3bae78fa03@ipAddress_0.log to see if there were any errors.

starwarfan avatar Aug 02 '22 01:08 starwarfan

Hello @starwarfan, thanks for the advice. I did look into the logs but could not make much of them. I have attached the logs as well as my toml configuration: toml.zip logs.zip

There are a few timeouts in there, the root cause seems to be a node lost. I also looked in the logs from the cluster manager but could not find any indication of a failure there. Here is the first failure excerpt from the conference logs.

`2022-08-01 15:19:52.846 - DEBUG: AmqpClient - remoteCall, corrID: 6 to: [email protected]_0 method: enableVAD
...
2022-08-01 15:19:54.501 - DEBUG: AmqpClient - remoteCall, corrID: 10 to: [email protected]_0 method: onTransportSignaling
2022-08-01 15:19:54.846 - DEBUG: AmqpClient - remoteCall timeout, corrID: 6
2022-08-01 15:19:55.681 - DEBUG: AmqpClient - received monitoring message: { reason: 'abnormal', message: { purpose: 'webrtc', id: '[email protected]_0', type: 'node' } }
2022-08-01 15:19:55.681 - DEBUG: RtcController - terminateByLocality node [email protected]_0
2022-08-01 15:19:55.681 - DEBUG: RtcController - terminate, sessionId: bb557ef7fc014deabccd551b13682ac2 direction: out, Node lost
2022-08-01 15:19:55.681 - DEBUG: AmqpClient - remoteCall, corrID: 11 to: [email protected]_0 method: unsubscribe
2022-08-01 15:19:56.502 - DEBUG: AmqpClient - remoteCall timeout, corrID: 10
2022-08-01 15:19:56.502 - WARN: RtcController - Trnasport signaling RPC failed Timeout to make rpc to [email protected]_0.onTransportSignaling
2022-08-01 15:19:57.682 - DEBUG: AmqpClient - remoteCall timeout, corrID: 11
2022-08-01 15:19:57.682 - DEBUG: Conference - onSessionAborted, participantId: JttcPkM6qW_n3-WMAAAA sessionId: bb557ef7fc014deabccd551b13682ac2 direction: out reason: Node lost`

After that, it will try to clean things up and more timeouts follow.
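For anyone digging through a long conference log, the excerpt above shows that each `remoteCall` entry and its later `remoteCall timeout` entry share a `corrID`. Below is a minimal Python sketch that pairs them up to list which target and method timed out. The regular expressions are based only on the lines quoted here, so other log formats are not covered, and the function name and the node name in the sample are made up for illustration.

```python
import re

# Patterns modeled on the log lines quoted in this thread (assumption:
# the rest of the log uses the same wording).
CALL_RE = re.compile(r"remoteCall, corrID: (\d+) to: (\S+) method: (\S+)")
TIMEOUT_RE = re.compile(r"remoteCall timeout, corrID: (\d+)")


def find_timed_out_calls(lines):
    """Return (corr_id, target, method) for each remoteCall that later timed out."""
    pending = {}    # corrID -> (target, method) for calls seen so far
    timed_out = []
    for line in lines:
        call = CALL_RE.search(line)
        if call:
            pending[call.group(1)] = (call.group(2), call.group(3))
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout and timeout.group(1) in pending:
            target, method = pending.pop(timeout.group(1))
            timed_out.append((timeout.group(1), target, method))
    return timed_out


# Hypothetical sample; "webrtc-node_0" is a placeholder, not a real node id.
sample = [
    "2022-08-01 15:19:52.846 - DEBUG: AmqpClient - remoteCall, corrID: 6 to: webrtc-node_0 method: enableVAD",
    "2022-08-01 15:19:54.846 - DEBUG: AmqpClient - remoteCall timeout, corrID: 6",
]
for corr_id, target, method in find_timed_out_calls(sample):
    print(f"corrID {corr_id}: {method} on {target} timed out")
```

To run it on a real file, pass an open file object instead of `sample`, for example `find_timed_out_calls(open(path))` with `path` pointing at the conference log.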

sebsken avatar Aug 03 '22 15:08 sebsken

I had the same issue with v4.3.1. My solution was to change the config:

# webrtc_agent/agent.toml
[webrtc]
network_interfaces = []  # before it was [{name = "eth0"}]

# portal/portal.toml
[portal]
ip_address = ""  # before it was my server ip address

Then rerun the init and start scripts and it should work. My guess is that setting these options makes the components use the public IP for RabbitMQ messaging, which causes the problem, so I am also afraid that this problem will come back when deploying in a multi-node environment.
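One rough way to probe that guess is to test whether the RabbitMQ port is reachable through each address of the machine. The sketch below uses only Python's standard `socket` module. Port 5672 is RabbitMQ's default AMQP port, while the helper name and the host values are placeholders to replace with your own. A successful connection only shows TCP reachability, so it does not by itself confirm or rule out the cause.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder addresses: add the machine's public IP to compare results.
    for host in ["127.0.0.1"]:
        status = "reachable" if can_connect(host, 5672) else "NOT reachable"
        print(f"{host}:5672 is {status}")
```

If the loopback address connects but the public IP does not, a firewall or a RabbitMQ listener bound to a single interface would be worth checking.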

GStarP avatar Nov 03 '22 09:11 GStarP