KMS Client DEADLOCK - Node JS ends up blocking when using Kurento libs alongside HTTP requests (tried Axios / node-fetch)
KMS Version (running in Docker with --net=host):
- Module: 'core' version '6.6.3'
- Module: 'elements' version '6.6.3'
- Module: 'filters' version '6.7.0~1.g0314843'
Other libraries versions (output of dpkg -l | egrep -i "kurento|gst.*1.5|nice" on the machine where KMS is running):
ii gstreamer1.5-libav:amd64 1.8.2.1.xenial~20170725171352.96.493eee4 amd64 libav plugin for GStreamer
ii gstreamer1.5-nice:amd64 0.1.13.1.xenial~20170725160546.81.eebfdab amd64 ICE library (GStreamer plugin)
ii gstreamer1.5-plugins-bad:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer plugins from the "bad" set
ii gstreamer1.5-plugins-base:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plugins from the "base" set
ii gstreamer1.5-plugins-good:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plugins from the "good" set
ii gstreamer1.5-plugins-ugly:amd64 1.8.1.1.xenial~20170725170621.89.2685b0f amd64 GStreamer plugins from the "ugly" set
ii gstreamer1.5-pulseaudio:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plugin for PulseAudio
ii gstreamer1.5-x:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plugins for X11 and Pango
ii kms-core-6.0 6.6.3.xenial.20170811182512.2704d13 amd64 Kurento core module
ii kms-elements-6.0 6.6.3.xenial.20170812234356.9e3fff4 amd64 Kurento elements module
ii kms-filters-6.0 6.7.0.xenial~20170813001324.1.0314843 amd64 Kurento filters module
ii kms-jsonrpc-1.0 1.2.0.xenial~20170811181902.1.8cee9d8 amd64 Kurento jsonrpc library
ii kmsjsoncpp 1.6.3.xenial.20170725151047.d78deb7 amd64 Kurento jsoncpp library
ii kurento-media-server-6.0 6.7.0.xenial~20170813003033.1.6ebaa27 amd64 Kurento Media Server
ii libgstreamer-plugins-bad1.5-0:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer development files for libraries from the "bad" set
ii libgstreamer-plugins-base1.5-0:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer libraries from the "base" set
ii libgstreamer1.5-0:amd64 1.8.1.1.xenial~20170725152356.170.0d6031b amd64 Core GStreamer libraries and elements
ii libnice10:amd64 0.1.13.1.xenial~20170725160546.81.eebfdab amd64 ICE library (shared library)
Client libraries
- Language: Node.js
- Version: Node v8.4.0, KMS client libs 6.6.0
Browsers tested (OK or FAIL, along with the version):
- Chrome: FAIL
- Firefox: FAIL (This is a server side issue)
System description: Local dev setup, using a public STUN server.
What steps will reproduce the problem?
- Tear out some hair debugging what you assume is a bug in your own code ;-)
- Set up the one2many project (currently in an Alpine container on Node 8.4, KMS in the latest Docker container with --net=host, and coturn also with --net=host) using docker-compose, but I have the same issue when running one2many directly on the host
- Wrap the
startPresenter(sessionId, ws, message.sdpOffer, function(error, sdpAnswer) { ... }
call with an HTTP request. I've used axios.get('https://google.com', function (response) { ... }),
but this also fails with similar results using request-promise / node-fetch. A sketch of roughly what I mean is below.
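Roughly how I wrapped it (a minimal sketch; the presenterResponse handling is lifted from the tutorial's server.js, and my real project's error handling differs a little):

```javascript
// In the 'presenter' branch of the tutorial's WebSocket message handler:
// defer startPresenter until an unrelated outbound HTTP request completes.
var axios = require('axios');

axios.get('https://google.com')
    .then(function (response) {
        // HTTP request came back fine; now start the Kurento negotiation.
        startPresenter(sessionId, ws, message.sdpOffer, function (error, sdpAnswer) {
            if (error) {
                return ws.send(JSON.stringify({
                    id: 'presenterResponse',
                    response: 'rejected',
                    message: error
                }));
            }
            ws.send(JSON.stringify({
                id: 'presenterResponse',
                response: 'accepted',
                sdpAnswer: sdpAnswer
            }));
        });
    })
    .catch(function (err) {
        console.error('HTTP request failed:', err);
    });
```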
What is the expected result? The remote HTTP request is made, then startPresenter sets up the WebRTC pipelines / trickle ICE and publishing happens.
What happens instead? Node goes into a deadlock after creating the WebRTC pipeline and fails to execute processOffer. I believe the axios request is doing something internally such that the JSON-RPC calls for ICE and processOffer get blocked; however, this is an internal Node deadlock. I'm not a Node programmer by trade, so beyond the standard advice ("make sure you do everything in a non-blocking async manner, use Promises properly with .catch, etc.") my knowledge of the internals of Node / V8 / libuv scheduling is limited.
It looks like the axios request is made successfully and its callback fires, as does the first create-pipeline call, but everything after that fails due to the deadlock, so things lock up after a short period of time. It seems deterministic in that it always stops at the same point (even the ICE negotiation / queuing gets blocked despite its async nature).
Does it happen with one of the tutorials? Yes. I've used the one2many tutorial as my test case :-)
Please provide any additional information below. When the tutorials are run normally this doesn't happen. If I make a separate project and do HTTP requests like this, I have no issue, so there's no obvious reason why it should happen here.
Running strace seems to confirm some sort of futex deadlock.
I get 100% usage on one of my cores, and the following output repeats:
mmap(0x19c7e9d1000, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x19c7e9d1000
munmap(0x19c7e9d1000, 192512) = 0
munmap(0x19c7ea80000, 331776) = 0
mmap(0x19c7ea00000, 524288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x19c7ea00000
mmap(0x27061d8af000, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x27061d8af000
munmap(0x27061d8af000, 331776) = 0
munmap(0x27061d980000, 192512) = 0
mmap(0x27061d900000, 524288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x27061d900000
write(8, "\1\0\0\0\0\0\0\0", 8) = 8
getpid() = 1
mmap(0xd650398e000, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xd650398e000
munmap(0xd650398e000, 466944) = 0
munmap(0xd6503a80000, 57344) = 0
mmap(0xd6503a00000, 524288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xd6503a00000
futex(0x7f0bcded79e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdeee9e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdec09e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdea99e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
mmap(0x150e4e8c000, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x150e4e8c000
munmap(0x150e4e8c000, 475136) = 0
munmap(0x150e4f80000, 49152) = 0
mmap(0x150e4f00000, 524288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x150e4f00000
futex(0x7f0bcded79e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdeee9e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdec09e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdea99e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcded79e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdeee9e4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f0bcdef3514, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x555b208faa64, FUTEX_WAKE_PRIVATE, 1) = 1
File descriptor 8 is, I assume, something to do with the libuv event loop, as ls -la /proc/1/fd yields:
lr-x------ 1 root root 64 Sep 19 16:13 0 -> pipe:[2473993]
l-wx------ 1 root root 64 Sep 19 16:13 1 -> pipe:[2473994]
lrwx------ 1 root root 64 Sep 19 16:34 10 -> socket:[2471502]
lrwx------ 1 root root 64 Sep 19 16:34 11 -> socket:[2472511]
lrwx------ 1 root root 64 Sep 19 16:34 14 -> socket:[2470840]
lrwx------ 1 root root 64 Sep 19 16:34 15 -> socket:[2472528]
lrwx------ 1 root root 64 Sep 19 16:34 16 -> socket:[2472530]
lrwx------ 1 root root 64 Sep 19 16:34 17 -> socket:[2472548]
l-wx------ 1 root root 64 Sep 19 16:13 2 -> pipe:[2473995]
lr-x------ 1 root root 64 Sep 19 16:13 3 -> pipe:[2471499]
l-wx------ 1 root root 64 Sep 19 16:13 4 -> pipe:[2471499]
lrwx------ 1 root root 64 Sep 19 16:13 5 -> anon_inode:[eventpoll]
lr-x------ 1 root root 64 Sep 19 16:13 6 -> pipe:[2471500]
l-wx------ 1 root root 64 Sep 19 16:13 7 -> pipe:[2471500]
lrwx------ 1 root root 64 Sep 19 16:34 8 -> anon_inode:[eventfd]
lr-x------ 1 root root 64 Sep 19 16:34 9 -> /dev/null
Interestingly, Express still seems to respond to basic HTTP requests, though my Kurento request is stuck. Running the Node dev tools shows that MediaObject.then is the point of the deadlock.
I get the same issue on Node 6.x
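For context on where that .then comes from: the kurento-client objects can be driven either with callbacks (as the tutorial does) or as promises, and it's the promise path that goes through MediaObject.then. A minimal sketch of the two styles, assuming the dual callback/promise API described in the client's docs and the default KMS WebSocket URI:

```javascript
var kurento = require('kurento-client');
var wsUri = 'ws://localhost:8888/kurento'; // default KMS WebSocket endpoint

// Callback style (what the one2many tutorial uses):
kurento(wsUri, function (error, client) {
    if (error) return console.error(error);
    client.create('MediaPipeline', function (error, pipeline) {
        if (error) return console.error(error);
        console.log('pipeline created (callback style)');
    });
});

// Promise style - the path that exercises MediaObject.then, which the
// dev tools show as the point of the hang:
kurento(wsUri)
    .then(function (client) {
        return client.create('MediaPipeline');
    })
    .then(function (pipeline) {
        console.log('pipeline created (promise style)');
    })
    .catch(function (err) {
        console.error(err);
    });
```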
Hi virtualfunction,
The new Kurento team is now very small (after Twilio's acquisition of most of the developers) and we are very busy with other issues. Moreover, we don't have any experienced Node developer on our team. For those reasons, we will not be able to solve your issue in the near future.
If you can solve the issue yourself, we will be glad to accept your Pull Request.
I will have a go at trying to find the cause, but my knowledge of Node is limited, as is my time.
If there is anyone who's able to fix this, I'm willing to negotiate a bounty price on this matter.
Oh just realised I'm logged in under another account rather than my virtualfunction account.
Hi @virtualfunction, I'm the original Kurento Node.js APIs designer and developer and I'm available for hire. Could you give me more info about your issue here or by email, so we can start working on an estimation?
@piranna - Hi, thanks for getting back to me; I'll PM / get in contact early next week so we can take things forward.
@piranna - Just pinged you an email - I've got a private repo (it's private for now just because I've used my own TLS certs) which you should have access to, with docker-compose wired up to replicate this.
@piranna - I've sent you a couple of emails (24/10/17 15:35 Spanish time was the last one)... guessing they've been diverted or lost. Let me know if you have them. If you're a bit busy (or on holiday!) that's fine; just let me know so I can at least echo that back.
@virtualfunction -- First, may I get your confirmation that "kurento-tutorial-node/kurento-one2many-call" gets blocked when it runs into "webRtcEndpoint.processOffer(sdpOffer, function(error, sdpAnswer)" in server.js?
On my side, I have just added 4 console.log(...) calls into the function startPresenter(...) as below.
function startPresenter(sessionId, ws, sdpOffer, callback) {
clearCandidatesQueue(sessionId);
if (presenter !== null) {
stop(sessionId);
return callback("Another user is currently acting as presenter. Try again later ...");
}
console.log(" "); console.log("==== Has entered startPresenter: " + sessionId + " ===="); console.log(" ")
presenter = {
id : sessionId,
pipeline : null,
webRtcEndpoint : null
}
getKurentoClient(function(error, kurentoClient) {
if (error) {
stop(sessionId);
return callback(error);
}
if (presenter === null) {
stop(sessionId);
return callback(noPresenterMessage);
}
kurentoClient.create('MediaPipeline', function(error, pipeline) {
if (error) {
stop(sessionId);
return callback(error);
}
if (presenter === null) {
stop(sessionId);
return callback(noPresenterMessage);
}
console.log(" ");
console.log("==== MediaPipeline is created successully: " + pipeline + " ====");
console.log(" ")
presenter.pipeline = pipeline;
pipeline.create('WebRtcEndpoint', function(error, webRtcEndpoint) {
if (error) {
stop(sessionId);
return callback(error);
}
if (presenter === null) {
stop(sessionId);
return callback(noPresenterMessage);
}
console.log(" ");
console.log("==== webRtcEndpoint is created successully: " + webRtcEndpoint + " ====");
console.log(" ")
presenter.webRtcEndpoint = webRtcEndpoint;
if (candidatesQueue[sessionId]) {
while(candidatesQueue[sessionId].length) {
var candidate = candidatesQueue[sessionId].shift();
webRtcEndpoint.addIceCandidate(candidate);
}
}
webRtcEndpoint.on('OnIceCandidate', function(event) {
var candidate = kurento.getComplexType('IceCandidate')(event.candidate);
ws.send(JSON.stringify({
id : 'iceCandidate',
candidate : candidate
}));
});
webRtcEndpoint.processOffer(sdpOffer, function(error, sdpAnswer) {
if (error) {
stop(sessionId);
return callback(error);
}
if (presenter === null) {
stop(sessionId);
return callback(noPresenterMessage);
}
callback(null, sdpAnswer);
});
console.log(" ");
console.log("==== processOffer is passed through successully. ====");
console.log(" ")
webRtcEndpoint.gatherCandidates(function(error) {
if (error) {
stop(sessionId);
return callback(error);
}
});
});
});
});
}
It would be helpful if you could add similar log statements on your side and then show us your log.
Thanks,
Jim
I'll need to check again, but I recall that webRtcEndpoint.processOffer failed to run. The server logs showed ICE negotiation was happening, but I'm not sure how far it got in the process.
I recall, a long time ago, using node-fetch / axios and then trying to use Node's own HTTP API, and this gave slightly random results (though I think this was back in the Node 4.x days).
Yeah, please send us the debug log of the app server that "kurento-tutorial-node/kurento-one2many-call" produces. It sounds like you also ran a debug build of KMS and found erroneous behaviour on the Kurento media server side regarding this issue, so could you also send us the related debug log that the media server produces?
Again, given your description ("Fails to execute processOffer"), let's look into the function "kms_sdp_session_process_offer" at kms-omni-build/kms-core/src/gst-plugins/commons/kmssdpsession.c.
If you build/run KMS following the steps at https://github.com/Kurento/doc-kurento/blob/master/static/kms_development_guide.md#launch-kms, there should be 2 lines of log output: "Process SDP Offer:" and "Generated SDP Answer:".
That means the WebRTC peer (endpoint) on the media server side, created by startPresenter in server.js, has received the offer from the app server (the running "kurento-one2many-call" Node.js program).
According to your description, you thought that the media server was blocked at processOffer. So, could you confirm that the function "kms_sdp_session_process_offer" was called but didn't run to completion? Then we could move the debugging on to the next step.
Thanks,
Hi, sorry it's been a while; I've not been able to revisit the Kurento work I was doing for some months due to time constraints. I did manage to skirt around the issue in the end (to the best of my knowledge) by using the Node http libraries directly, via callbacks (a rough sketch follows the links below). I wonder if there is something odd with Promises causing issues here, given that the callback-based approach worked for me, and I happened to notice these issues:
https://github.com/Kurento/bugtracker/issues/133 https://github.com/Kurento/bugtracker/issues/181 https://github.com/Kurento/bugtracker/issues/71
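How I worked around it, roughly (a sketch using Node's built-in https module with plain callbacks instead of axios/node-fetch; the presenterResponse handling again follows the tutorial's server.js):

```javascript
var https = require('https');

https.get('https://google.com', function (res) {
    res.on('data', function () { /* drain the response body */ });
    res.on('end', function () {
        // Only once the HTTP request has fully completed, start the Kurento side.
        startPresenter(sessionId, ws, message.sdpOffer, function (error, sdpAnswer) {
            if (error) {
                return ws.send(JSON.stringify({
                    id: 'presenterResponse',
                    response: 'rejected',
                    message: error
                }));
            }
            ws.send(JSON.stringify({
                id: 'presenterResponse',
                response: 'accepted',
                sdpAnswer: sdpAnswer
            }));
        });
    });
}).on('error', function (err) {
    console.error('HTTP request failed:', err);
});
```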
I'll try and have a stab later to see whether "kurento-client": "https://github.com/Kurento/kurento-client-js#master"
works, as suggested on 16 Feb last year by tgabi333 (I hadn't seen that before).
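For reference, that just means declaring the dependency against the master branch in package.json (standard npm git/GitHub URL syntax; other fields omitted):

```json
{
  "dependencies": {
    "kurento-client": "https://github.com/Kurento/kurento-client-js#master"
  }
}
```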
If that does fix it, can someone push new versions to NPM, as it'll be massively helpful for others. It looks like there's been a bit of activity in the various Kurento repos recently, so is this a sign that we'll have people maintaining stuff? The lack of community makes me nervous and has me looking at things like MediaSoup!... something I'd rather avoid ;-)
Hi @virtualfunction,
KurentoClient JS is not very well maintained by our small team. We are focused on Kurento Media Server and KurentoUtils.js.
But if there is a PR that fixes some bug or issue, we will accept it without problem.