Connection to backend lost after network interruption when using websockets
Description of the bug
I used the Vaadin Gradle starter skeleton as a minimal reproducible example and changed AppShell.java to add a @Push annotation:
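import com.vaadin.flow.component.page.AppShellConfigurator;
import com.vaadin.flow.component.page.Push;
import com.vaadin.flow.server.PWA;
import com.vaadin.flow.shared.communication.PushMode;
import com.vaadin.flow.shared.ui.Transport;
import com.vaadin.flow.theme.Theme;

/**
* Use the @PWA annotation to make the application installable on phones, tablets
* and some desktop browsers.
*/
@PWA(name = "Project Base for Vaadin", shortName = "Project Base")
@Theme("my-theme")
// Added for this report: push over web socket only
@Push(transport = Transport.WEBSOCKET, value = PushMode.AUTOMATIC)
public class AppShell implements AppShellConfigurator {
}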
After starting the app with jettyRun, I open it in a browser in a virtual machine (I used VirtualBox and libvirt).
I type in something and verify I get a result. I change the textbox content.
Then I virtually pull the cable.
In libvirt this can be done as follows: start virsh, find the machine name (domain <dom>) using list, find the interface using domiflist <dom>, then 'pull the cable' using domif-setlink <dom> <iface> down.
It is not sufficient to use the developer tools. That does not interrupt the websocket connection.
Press 'Say hello' again to send something and wait some time (best reproduced when the browser prints something like 'Websocket closed, reason: Connection was closed abnormally (that is, with no close frame being sent). - wasClean: false').
Then reconnect the network (domif-setlink <dom> <iface> up).
Observe that nothing is sent any more.
Expected behavior
The connection is reestablished and interaction is possible again.
I do not know if the last update should still be pushed or if resynchronization is better.
Minimal reproducible example
I downloaded https://github.com/vaadin/base-starter-gradle and applied the change above. I copied the code when the Vaadin version was 24.4.10.
Versions
- Vaadin / Flow version: 24.4.10 (probably also the most current)
- Java version: 21
- OS version: Client Windows 10, Server Linux Ubuntu Jammy
- Browser version: reproduced with Tatu Lund on Microsoft Edge Version 129.0.2792.79, Firefox 131
- Application Server: jetty as of starter
By default, Atmosphere does not call onOpen or onReopen when the web-socket connection is re-established.
As a result, if the fallback transport is also WEBSOCKET, the application will not recover from the disconnected state.
If Atmosphere falls back to LONG_POLLING (the default), then onReopen is called and the app returns to the connected state.
However, in both cases, the RequestResponseTracker.hasActiveRequest() is never reset to false, thus preventing new UIDL messages from being sent to the server.
Just resetting the RequestResponseTracker.hasActiveRequest flag is not enough for the web socket transport.
The problem is that the message that is not delivered because of network loss is completely lost, and after reconnection a resynchronization is triggered, causing a page reload. This seems restricted to web socket transport because with WEBSOCKET_XHR or LONG_POLLING, the payload of the undelivered message is sent as body of the reconnection request.
Another interesting thing is that, by default, Atmosphere will downgrade to the fallback transport if the first web socket reconnection attempt fails. Luckily, there's an Atmosphere configuration property (maxWebsocketErrorRetries, default 1) that can be tuned to retry the web socket reconnection several times before downgrading.
Most likely, Flow should set a sensible value for it, for example 12, given that reconnection attempts happen every 5 seconds.
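To make the arithmetic behind that suggestion concrete, here is a small standalone sketch. It is a simplified model of the behaviour described above, not Atmosphere's actual code: the class and method names are hypothetical, and the exact point at which Atmosphere switches transport is an assumption.

```java
/**
 * Simplified, hypothetical model of the downgrade behaviour described above.
 * This is NOT Atmosphere's implementation; names and the exact comparison
 * are assumptions made for illustration.
 */
class ReconnectModel {

    /** Transport assumed to be used for the next reconnection attempt. */
    static String transportForNextAttempt(int failedWebSocketAttempts,
            int maxWebsocketErrorRetries, String fallbackTransport) {
        // Keep trying web socket until the configured number of failures is reached
        return failedWebSocketAttempts < maxWebsocketErrorRetries
                ? "websocket"
                : fallbackTransport;
    }

    /** Rough time spent retrying over web socket before the downgrade. */
    static int secondsBeforeDowngrade(int maxWebsocketErrorRetries,
            int reconnectIntervalSeconds) {
        return maxWebsocketErrorRetries * reconnectIntervalSeconds;
    }
}

class ReconnectModelDemo {
    public static void main(String[] args) {
        // Default of 1: the first failed attempt already triggers the fallback
        System.out.println(
                ReconnectModel.transportForNextAttempt(1, 1, "long-polling"));
        // Suggested value of 12 with a 5 second interval
        System.out.println(ReconnectModel.secondsBeforeDowngrade(12, 5));
    }
}
```

Under these assumptions, a value of 12 keeps the client on web socket for roughly one minute of network outage before it downgrades.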
In addition, atmosphere-javascript has a couple of unreleased fixes related to web socket reconnection.
So potential fixes in the Flow client could be:
- MessageSender
  - send: for web socket, internally store a copy of the message that will be sent to the server.
  - setClientToServerMessageId: called after the server response. When invoked, remove the stored message if the server has seen it.
  - doSendInvocationsToServer: if there's a stored message, resend it, and postpone processing the rest of the queued messages.
- DefaultConnectionStateHandler
  - pushOk: on reconnect, reset the RequestResponseTracker active request flag and, only for web socket, immediately send the pending message (for other transports, the pending message is sent as part of the reconnection request).
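The proposed changes can be illustrated with a small standalone model. This is only a sketch of the idea, not Flow's client code: the class ResendingSender, its fields, the "id:payload" message format, and the simplified method signatures are all hypothetical, and the method names merely mirror the ones listed above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of the proposed fix; not Flow's actual code. */
class ResendingSender {
    /** Stand-in for the web socket: records what actually reached the server. */
    final List<String> wire = new ArrayList<>();
    boolean connected = true;

    private final Deque<String> queue = new ArrayDeque<>(); // not yet sent
    private String pendingPayload; // copy of the last message sent
    private int pendingId = -1;    // client-to-server id of that message
    private int nextId = 0;
    private boolean activeRequest; // models the RequestResponseTracker flag

    void enqueue(String invocation) {
        queue.add(invocation);
        doSendInvocationsToServer();
    }

    void doSendInvocationsToServer() {
        if (activeRequest) {
            return; // a request is still considered in flight
        }
        if (pendingPayload != null) {
            transmit(pendingPayload); // resend first, leave the queue untouched
            return;
        }
        String next = queue.poll();
        if (next == null) {
            return;
        }
        pendingId = nextId++;
        pendingPayload = pendingId + ":" + next; // keep a copy until acknowledged
        transmit(pendingPayload);
    }

    private void transmit(String payload) {
        activeRequest = true;
        if (connected) {
            wire.add(payload);
        } // otherwise the message is silently lost, as in the bug report
    }

    /** Called after a server response with the next id the server expects. */
    void setClientToServerMessageId(int nextExpectedId) {
        if (pendingPayload != null && nextExpectedId > pendingId) {
            pendingPayload = null; // the server has seen the stored message
        }
        activeRequest = false;
        doSendInvocationsToServer();
    }

    /** Called when the push connection is re-established. */
    void pushOk() {
        connected = true;
        activeRequest = false; // reset the stuck flag
        doSendInvocationsToServer(); // resends the pending message, if any
    }
}

class ResendingSenderDemo {
    public static void main(String[] args) {
        ResendingSender sender = new ResendingSender();
        sender.enqueue("a");
        sender.setClientToServerMessageId(1); // server acknowledged message 0
        sender.connected = false;             // cable pulled
        sender.enqueue("b");                  // lost on the wire, copy kept
        sender.enqueue("c");                  // stays queued
        sender.pushOk();                      // reconnect: "b" is resent
        sender.setClientToServerMessageId(2); // ack for "b", then "c" is sent
        System.out.println(sender.wire);      // prints [0:a, 1:b, 2:c]
    }
}
```

In this model, the message sent while the cable was pulled is not lost, and queued messages are held back until the resent one has been acknowledged, so the server sees the ids in order and no resynchronization is needed.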
The above listed fixes look slightly related to a bigger mechanism described in https://github.com/vaadin/flow/issues/20348.
This ticket/PR has been released with Vaadin 24.5.4.
This ticket/PR has been released with Vaadin 24.4.17.
I can confirm that the issue is fixed. Thank you very much.