slimserver icon indicating copy to clipboard operation
slimserver copied to clipboard

CometD: stalled subscription notifications

Open timbell opened this issue 5 years ago • 5 comments

Testing CometD subcriptions to server status and player status notifications using libcometd.js from the material-skin plugin.

When this code path is followed on the server long-polled notifications are stalled until the next long polling timeout (60s) occurs. This is due to the remove_connection call in sendHTTPResponse.

Reproducing the problem is a bit tricky. I've managed to do it reasonably reliably using the attached test web page as follows:

  • copy the attached files to a test-cometd folder under HTML/Default/html folder on the server
  • run the page in a browser with a URL like: http://localhost:9000/html/test-cometd/
  • click on Connect (the first player registered on the server will be selected) and then Subscribe Player
  • power on/off the selected player to generate server and player status notifications which should show up in the page's Log widget
  • refresh the page in the browser and repeat the previous two steps. On the 2nd or 3rd attempt (page refresh + previous 2 steps) you should see that the status notifications are stalled until the long-polling timeout occurs.

NB the page refreshing is crucial to getting transport('long-polling') to be set on the $httpClient used to service the /slim/subscribe player status call.

From what I can see, the problem could be fixed in a few different ways:

  • suppressing the remove_connection call in this exact case
  • suppressing the remove_connection call more generally - eg only calling it in the /meta/(re)connect case?
  • removing $httpClient->transport('long-polling') from the client once the response has been sent
  • others???

The problem was reproduced with LMS running on Windows 10 (version 8.1) and Linux (version 7.9.1).

timbell avatar Feb 05 '21 18:02 timbell

Is this something @CDrummond has seen, too? Are you able to reproduce this with any browser?

michaelherger avatar Feb 07 '21 05:02 michaelherger

Can't say I have noticed this issue, and I do refresh the page quite often to ensure the skin reloads. I clear cache and do a full refresh/reload, so perhaps this is why I do not see it?

I have noticed, but probably not related to this issue, that notifications can be 'slow' (i.e. take a few seconds to show), which is why Material manually requests a status update after most actions.

CDrummond avatar Feb 07 '21 07:02 CDrummond

Reproduced with Chrome + Firefox on Windows and Chrome on Android. I'm pretty sure it's not a client issue - the page refreshing just seems to be a reasonably reliable way to get the $delayedResponse = 0 code path to be followed on the server.

What I think is happening is that the $httpClient instances persist across requests and each one is associated with a particular socket. (I could be way off here - new to Perl.) When the page is opened we get, say, socket 1 used for /meta/connect long polling requests and socket 2 used for other requests (/slim/subscribe). Refreshing the page and connecting again can lead to socket 2 being used for /meta/connect long polling requests and socket 1 being used for /slim/subscribe requests.

So, the $httpClient that was initially being used for the long-polling requests (and so has transport('long-polling') set) is now being used for the /slim/subscribe requests. And that's how we get to execute the $delayedResponse = 0 code path. (There could be other ways to get there too.)

timbell avatar Feb 07 '21 10:02 timbell

Thanks for your test files, @timbell. I believe I have been able to reproduce the issue: sometimes after reloading a web UI using long-polling, it would take a time out on the long-poll before the connection would re-establish. Therefore after a page reload we can have up to 60 seconds before the notifications are reaching the web UI. Is this correct? Are there any other limitations I haven't been able to experience?

michaelherger avatar Feb 07 '21 18:02 michaelherger

Yes, @michaelherger, that's right. (There may be other ways that can cause the $delayedResponse = 0 code path in Slim::Web::Cometd::handler to be followed, however.)

A simple fix might be to clear the transport in Slim::Web::Cometd::handler like this:

    # reset transport (especially for the long-polling case)
    $conn->[HTTP_CLIENT]->transport(  ' '  );
    
    my $clid;
    my $events = [];
    ...

timbell avatar Feb 07 '21 19:02 timbell