slimserver CometD: stalled subscription notifications

Testing CometD subcriptions to server status and player status notifications using libcometd.js from the material-skin plugin.

When this code path is followed on the server long-polled notifications are stalled until the next long polling timeout (60s) occurs. This is due to the remove_connection call in sendHTTPResponse.

Reproducing the problem is a bit tricky. I've managed to do it reasonably reliably using the attached test web page as follows:

copy the attached files to a test-cometd folder under HTML/Default/html folder on the server
run the page in a browser with a URL like: http://localhost:9000/html/test-cometd/
click on Connect (the first player registered on the server will be selected) and then Subscribe Player
power on/off the selected player to generate server and player status notifications which should show up in the page's Log widget
refresh the page in the browser and repeat the previous two steps. On the 2nd or 3rd attempt (page refresh + previous 2 steps) you should see that the status notifications are stalled until the long-polling timeout occurs.

NB the page refreshing is crucial to getting transport('long-polling') to be set on the $httpClient used to service the /slim/subscribe player status call.

From what I can see, the problem could be fixed in a few different ways:

suppressing the remove_connection call in this exact case
suppressing the remove_connection call more generally - eg only calling it in the /meta/(re)connect case?
removing $httpClient->transport('long-polling') from the client once the response has been sent
others???

The problem was reproduced with LMS running on Windows 10 (version 8.1) and Linux (version 7.9.1).

Feb 05 '21 18:02 timbell

Is this something @CDrummond has seen, too? Are you able to reproduce this with any browser?

Feb 07 '21 05:02 michaelherger

Can't say I have noticed this issue, and I do refresh the page quite often to ensure the skin reloads. I clear cache and do a full refresh/reload, so perhaps this is why I do not see it?

I have noticed, but probably not related to this issue, that notifications can be 'slow' (i.e. take a few seconds to show), which is why Material manually requests a status update after most actions.

Feb 07 '21 07:02 CDrummond

Reproduced with Chrome + Firefox on Windows and Chrome on Android. I'm pretty sure it's not a client issue - the page refreshing just seems to be a reasonably reliable way to get the $delayedResponse = 0 code path to be followed on the server.

What I think is happening is that the $httpClient instances persist across requests and each one is associated with a particular socket. (I could be way off here - new to Perl.) When the page is opened we get, say, socket 1 used for /meta/connect long polling requests and socket 2 used for other requests (/slim/subscribe). Refreshing the page and connecting again can lead to socket 2 being used for /meta/connect long polling requests and socket 1 being used for /slim/subscribe requests.

So, the $httpClient that was initially being used for the long-polling requests (and so has transport('long-polling') set) is now being used for the /slim/subscribe requests. And that's how we get to execute the $delayedResponse = 0 code path. (There could be other ways to get there too.)

Feb 07 '21 10:02 timbell

Thanks for your test files, @timbell. I believe I have been able to reproduce the issue: sometimes after reloading a web UI using long-polling, it would take a time out on the long-poll before the connection would re-establish. Therefore after a page reload we can have up to 60 seconds before the notifications are reaching the web UI. Is this correct? Are there any other limitations I haven't been able to experience?

Feb 07 '21 18:02 michaelherger

Yes, @michaelherger, that's right. (There may be other ways that can cause the $delayedResponse = 0 code path in Slim::Web::Cometd::handler to be followed, however.)

A simple fix might be to clear the transport in Slim::Web::Cometd::handler like this:

    # reset transport (especially for the long-polling case)
    $conn->[HTTP_CLIENT]->transport(  ' '  );
    
    my $clid;
    my $events = [];
    ...

Feb 07 '21 19:02 timbell