CometD: stalled subscription notifications
Testing CometD subscriptions to server status and player status notifications using libcometd.js from the material-skin plugin.
When this code path is followed on the server, long-polled notifications are stalled until the next long-polling timeout (60s) occurs. This is due to the `remove_connection` call in `sendHTTPResponse`.
Reproducing the problem is a bit tricky. I've managed to do it reasonably reliably using the attached test web page as follows:
- copy the attached files to a `test-cometd` folder under the `HTML/Default/html` folder on the server
- run the page in a browser with a URL like: `http://localhost:9000/html/test-cometd/`
- click on `Connect` (the first player registered on the server will be selected) and then `Subscribe Player`
- power on/off the selected player to generate server and player status notifications, which should show up in the page's `Log` widget
- refresh the page in the browser and repeat the previous two steps. On the 2nd or 3rd attempt (page refresh + previous two steps) you should see that the status notifications are stalled until the long-polling timeout occurs.
NB the page refreshing is crucial to getting `transport('long-polling')` to be set on the `$httpClient` used to service the `/slim/subscribe` player status call.
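For reference, the test page drives a fairly standard Bayeux exchange against `/cometd`. The sketch below shows the shape of the messages involved; the channel names come from the Bayeux protocol, but the exact `/slim/subscribe` payload (the `response` channel and `request` array) is an assumption for illustration, not copied from the attached files:

```javascript
// Sketch of the Bayeux messages a CometD client sends to LMS.
// The /slim/subscribe payload shape is an assumption for illustration.
function handshakeMessage() {
  return {
    channel: '/meta/handshake',
    version: '1.0',
    supportedConnectionTypes: ['long-polling'],
  };
}

function connectMessage(clientId) {
  // Long-polled: the server holds this request open until an event
  // arrives or the 60s long-polling timeout fires.
  return {
    channel: '/meta/connect',
    clientId: clientId,
    connectionType: 'long-polling',
  };
}

function subscribePlayerStatus(clientId, playerId) {
  // Hypothetical payload: ask LMS to push player status notifications
  // back on a per-client response channel.
  return {
    channel: '/slim/subscribe',
    clientId: clientId,
    data: {
      response: '/' + clientId + '/slim/status',
      request: [playerId, ['status', '-', 1, 'subscribe:30']],
    },
  };
}

console.log(connectMessage('abc123').connectionType); // 'long-polling'
```

The stall reported here shows up on the `/meta/connect` leg: events queued for the client are only flushed when that held request completes, so if the server wrongly drops the connection, nothing arrives until the next timeout.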
From what I can see, the problem could be fixed in a few different ways:
- suppressing the `remove_connection` call in this exact case
- suppressing the `remove_connection` call more generally - e.g. only calling it in the `/meta/(re)connect` case?
- removing `$httpClient->transport('long-polling')` from the client once the response has been sent
- others???
The problem was reproduced with LMS running on Windows 10 (version 8.1) and Linux (version 7.9.1).
Is this something @CDrummond has seen, too? Are you able to reproduce this with any browser?
Can't say I have noticed this issue, and I do refresh the page quite often to ensure the skin reloads. I clear cache and do a full refresh/reload, so perhaps this is why I do not see it?
I have noticed, but probably not related to this issue, that notifications can be 'slow' (i.e. take a few seconds to show), which is why Material manually requests a status update after most actions.
Reproduced with Chrome + Firefox on Windows and Chrome on Android. I'm pretty sure it's not a client issue - the page refreshing just seems to be a reasonably reliable way to get the `$delayedResponse = 0` code path to be followed on the server.
What I think is happening is that the `$httpClient` instances persist across requests and each one is associated with a particular socket. (I could be way off here - new to Perl.) When the page is opened we get, say, socket 1 used for `/meta/connect` long-polling requests and socket 2 used for other requests (`/slim/subscribe`). Refreshing the page and connecting again can lead to socket 2 being used for `/meta/connect` long-polling requests and socket 1 being used for `/slim/subscribe` requests.

So, the `$httpClient` that was initially being used for the long-polling requests (and so has `transport('long-polling')` set) is now being used for the `/slim/subscribe` requests. And that's how we get to execute the `$delayedResponse = 0` code path. (There could be other ways to get there too.)
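The socket-swap theory above can be illustrated with a toy model (this is an illustrative sketch, not LMS code; all names are made up): one client object per socket, with a `transport` flag that persists across HTTP requests on that socket.

```javascript
// Toy model (not LMS code): the server keeps one client object per
// socket, and its 'transport' flag persists across requests.
const clients = new Map();

function clientFor(socket) {
  if (!clients.has(socket)) {
    clients.set(socket, { socket: socket, transport: null });
  }
  return clients.get(socket);
}

// First page load: socket 1 carries the /meta/connect long poll...
clientFor(1).transport = 'long-polling';
// ...and socket 2 carries the /slim/subscribe requests.
clientFor(2);

// After a refresh the browser may swap the sockets around: socket 1 now
// carries a /slim/subscribe request, but its client object still says
// 'long-polling', which is what steers the handler into the
// delayed-response code path.
console.log(clientFor(1).transport); // 'long-polling'
```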
Thanks for your test files, @timbell. I believe I have been able to reproduce the issue: sometimes after reloading a web UI using long-polling, it would take a timeout on the long-poll before the connection re-established. So after a page reload it can take up to 60 seconds before the notifications reach the web UI. Is this correct? Are there any other symptoms I haven't been able to observe?
Yes, @michaelherger, that's right. (There may be other ways that can cause the `$delayedResponse = 0` code path in `Slim::Web::Cometd::handler` to be followed, however.)
A simple fix might be to clear the transport in `Slim::Web::Cometd::handler` like this:

```perl
# reset transport (especially for the long-polling case)
$conn->[HTTP_CLIENT]->transport( ' ' );

my $clid;
my $events = [];
...
```