Resynchronizing UI by client's request

Open jonasrotilli opened this issue 2 years ago • 16 comments

Description of the bug

I often see the message "Resynchronizing UI by client's request" in the log.

The full message is: Resynchronizing UI by client's request. The network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout

It does not look like a connection problem; apparently it is session related.

On the same Linux server I am also running another application, on Vaadin 8. The two applications have different names, use different ports, live in different folders, and are mapped by NGINX to different subdomains.

I have already gone through the other issues related to this topic. In #12640 there was no clear solution; someone raised some possibilities:

  • Browser: not in my case, otherwise it wouldn't happen so often.
  • HTTP proxy: I use NGINX for the subdomains, but I also use it for other services such as Node and static HTML and never had a problem. NGINX runs with the default configuration, without anything added; it just forwards the subdomain to port X.
  • Possibility of mixing sessions: this makes no sense, since I have very little load and it happens even with only one user logged in.

#12173 In this issue the user uses long-duration push. Not my case; I use the simple, default @Push.

#11645 As I understand it, the cause in this issue was a slow connection. Not my case; everything is fast here.

#10096 This is a very similar scenario, but there was no conclusion: the user closed the issue without saying how, or whether, it was resolved.

#9399 Here the user solved it by changing the server, so it is difficult to assess what the problem was.

Anyway, this problem is quite recurrent and should be better explained in the documentation. I started from the example available on the site: nothing out of the ordinary, little or no extra configuration.

Expected behavior

The expectation is that the user's screen does not lock up. It is terrible to have to ask users to refresh the page, because once it breaks it does not recover on its own.

Minimal reproducible example

It's hard to simulate, because it doesn't always happen. The impression is that it happens after a while without changes on the page, but sometimes it happens right after logging in or during some slower operation.

Versions

  • Vaadin / Flow version: 23.2.0.alpha1

  • Java version: openjdk version "11.0.3" 2019-04-16 OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.10.1) OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.10.1, mixed mode, sharing)

  • OS version: -Ubuntu

  • Browser version (if applicable): Chrome

jonasrotilli avatar Jul 28 '22 19:07 jonasrotilli

Difficult. I increased the timeout on NGINX to 600 seconds and the problem continues.

jonasrotilli avatar Jul 29 '22 12:07 jonasrotilli

Same on Vaadin version 14.8.14

WARN com.vaadin.flow.server.communication.ServerRpcHandler [http-nio-8080-exec-3] Resynchronizing UI by client's request. Under normal operations this should not happen and may indicate a bug in Vaadin platform. If you see this message regularly please open a bug report at https://github.com/vaadin/flow/issues

tiagomartins91 avatar Jul 29 '22 13:07 tiagomartins91

tiagomartins91

Apparently no one from Vaadin is watching here. Let's try to find the problem ourselves and see what we have in common.

1 - Do you have any other Vaadin application running on the same server? A: I do, but it's in another folder, on another version, with nothing shared.

2 - Does it happen in development, when running with IntelliJ or similar? A: No.

3 - Does it happen in production? A: Yes. When I stop the service and deploy a new version, it happens on the first login most of the time, which disproves the theory that it is caused by idle time.

jonasrotilli avatar Jul 29 '22 18:07 jonasrotilli

I was hoping it was a conflict between the two applications, but it is not. I completely stopped the other application and started only the new one, on Vaadin 23.

And the same problem happened: Resynchronizing UI by client's request. The network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout

This is very bad, it is a very serious problem.

jonasrotilli avatar Jul 30 '22 12:07 jonasrotilli

I would suggest to post your nginx and push configuration.

knoobie avatar Jul 30 '22 13:07 knoobie

I would suggest to post your nginx and push configuration.

Push default:

@Theme(value = "myapp")
@PWA(name = "upCampo", shortName = "upCampo")
@NpmPackage(value = "line-awesome", version = "1.3.0")
@Push

NGINX file:

server {

    server_name  novoportal.MYWEBSITE;

    location / {
        proxy_pass  http://127.0.0.1:1628;
    }

    listen [::]:443 ssl; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/novoportal.MYWEBSITE/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/novoportal.MYWEBSITE/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot


}
server {
    if ($host = novoportal.MYWEBSITE) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


    listen 80;
    listen [::]:80;

    server_name  novoportal.MYWEBSITE;
    return 404; # managed by Certbot

   proxy_read_timeout 600;
   proxy_connect_timeout 600;
   proxy_send_timeout 600;


}

About the timeouts in NGINX: I increased them, but it made no difference with or without this part of the configuration:


   proxy_read_timeout 600;
   proxy_connect_timeout 600;
   proxy_send_timeout 600;

jonasrotilli avatar Jul 30 '22 13:07 jonasrotilli

I don't see anything related to push in the configuration. Cuba has an example for nginx that you could try: https://doc.cuba-platform.com/manual-latest/server_push_settings.html#server_push_settings_using_proxy - the important part is the one about upgrade

I'm more experienced with apache httpd, and there it is a must to configure websockets correctly for them to work in corporate networks.

knoobie avatar Jul 30 '22 13:07 knoobie

I don't see anything related to push in the configuration. Cuba has an example for nginx that you could try: https://doc.cuba-platform.com/manual-latest/server_push_settings.html#server_push_settings_using_proxy - the important part is the one about upgrade

I'm more experienced with apache httpd, and there it is a must to configure websockets correctly for them to work in corporate networks.

I added your suggestion in NGINX. Good news: so far, the problem hasn't happened yet. I'll leave it running during the day and come back here to confirm if it worked or not.

Thank you very much!

jonasrotilli avatar Jul 30 '22 14:07 jonasrotilli

I don't see anything related to push in the configuration. Cuba has an example for nginx that you could try: https://doc.cuba-platform.com/manual-latest/server_push_settings.html#server_push_settings_using_proxy - the important part is the one about upgrade

I'm more experienced with apache httpd, and there it is a must to configure websockets correctly for them to work in corporate networks.

Sorted out! Big help. I opened the CUBA link and used the "location" part; it seems to be related to WebSocket support. Since adding it, I haven't had any more problems. @knoobie Thank you so much for your help, it saved me several sleepless nights!

I'm sharing the NGINX configuration I'm using here for other users, and I won't close the ticket so that someone from Vaadin can evaluate whether any improvement is needed on this topic. I believe this could be in the documentation.

Here's the complete file:

server {

server_name  mywebsite.com;

 location / {
     proxy_set_header X-Forwarded-Host $host;
     proxy_set_header X-Forwarded-Server $host;
     proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
     proxy_read_timeout     3600;
     proxy_connect_timeout  240;
     proxy_set_header Host $host;
     proxy_set_header X-RealIP $remote_addr;

     proxy_pass  http://127.0.0.1:PORT_EXIT_DO_YOUR_SPRINGBOOT;

     proxy_set_header X-Forwarded-Proto $scheme;

     proxy_set_header Upgrade $http_upgrade;
     proxy_set_header Connection "upgrade";
}

 listen [::]:443 ssl; # managed by Certbot (Certificate SSL)
 listen 443 ssl; # managed by Certbot (Certificate SSL)
 ssl_certificate /etc/letsencrypt/live/mywebsite.com/fullchain.pem; # managed by Certbot (Certificate SSL)
 ssl_certificate_key /etc/letsencrypt/live/mywebsite.com/privkey.pem; # managed by Certbot (Certificate SSL)
 include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot (Certificate SSL)
 ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot (Certificate SSL)


}
server {
 if ($host = mywebsite.com) {
     return 301 https://$host$request_uri;
 } # managed by Certbot


 listen 80;
 listen [::]:80;

 server_name  mywebsite.com;
 return 404; # managed by Certbot (Certificate SSL)

proxy_read_timeout 600;
proxy_connect_timeout 600;
proxy_send_timeout 600;

}
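
One detail worth checking against the nginx WebSocket proxying documentation: its example also sets proxy_http_version 1.1 in the proxied location, which the file above does not. The fragment below is a sketch of that variant, not a verified fix for this issue. The map block and the $connection_upgrade variable follow the nginx documentation; PORT is a placeholder, and the timeout value is simply the one used above. Note also that the three 600-second timeouts in the port 80 server block apply only to that block, which does nothing but redirect, so they should have no effect on the proxied HTTPS traffic.

```nginx
# Goes in the http context, for example at the top of the site file,
# outside any server block.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    server_name  mywebsite.com;

    location / {
        # PORT is a placeholder for the application's port.
        proxy_pass  http://127.0.0.1:PORT;

        # The WebSocket upgrade handshake needs HTTP/1.1 towards the backend.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # A proxied connection is closed if the backend sends nothing
        # for this many seconds (nginx default: 60s).
        proxy_read_timeout 3600;
    }

    # listen / ssl_* directives as in the file above
}
```

With the map in place, requests that carry no Upgrade header get "Connection: close" instead of "Connection: upgrade", so ordinary HTTP requests through the same location are not marked as upgrades.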

jonasrotilli avatar Aug 01 '22 12:08 jonasrotilli

I believe this could be in the documentation.

How to configure a reverse proxy should be an important topic inside the docs (cc: @tarekoraby)

knoobie avatar Aug 01 '22 12:08 knoobie

I have bad news. I have been monitoring since that day, and it happened again: less often than before the NGINX change, but it is still happening. Are there any other possible causes?

jonasrotilli avatar Aug 08 '22 17:08 jonasrotilli

Same here

tiagomartins91 avatar Aug 09 '22 08:08 tiagomartins91

That is terrible. We never had problems like this in the old Vaadin 8 application. Imagine the inconvenience this is causing the customer. This image makes me shiver every time I see it:

[Image: screenshot, 2022-08-09 at 09:29:42]

Please someone help us!

Please don't treat this only as a documentation topic; it is something very serious that needs to be investigated.

jonasrotilli avatar Aug 09 '22 13:08 jonasrotilli

I would personally try to obtain a debug log of nginx to try to understand what's happening.

tarekoraby avatar Aug 09 '22 18:08 tarekoraby

I would personally try to obtain a debug log of nginx to try to understand what's happening.

I am fully available to collect the data needed to resolve this issue.

Help me, how do I do this?

jonasrotilli avatar Aug 09 '22 18:08 jonasrotilli

I'm not an nginx expert, but I'd check the docs for instructions on that: https://nginx.org/en/docs/debugging_log.html.
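
As a rough sketch of what that page describes: debug logging needs an nginx binary built with --with-debug, and is then switched on through the level given to error_log. The log path and the client address below are placeholders, not values from this thread.

```nginx
# Check first that the binary supports debug logging:
#   nginx -V 2>&1 | grep -- --with-debug

# Option A (main context): log everything at debug level. Very verbose.
error_log /var/log/nginx/error-debug.log debug;

# Option B: keep error_log at its normal level and enable debug output
# only for connections from one client address (placeholder IP).
# events {
#     debug_connection 203.0.113.10;
# }
```

Option B is usually easier to read on a busy server, since only the connections of the affected client end up in the debug output.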

tarekoraby avatar Aug 09 '22 19:08 tarekoraby

Any update? It still happens sometimes, and the page has to be refreshed for the application to work.

tiagomartins91 avatar Aug 21 '22 22:08 tiagomartins91

We are going to investigate it more closely in the upcoming development iteration.

mshabarov avatar Aug 23 '22 06:08 mshabarov

I've managed to reproduce this issue almost consistently. There are a couple of interesting factors that lead into this. I have a pcap and a google chrome debug output and the code that generates the issue.

java.lang.UnsupportedOperationException: Unexpected message id from the client. Expected sync id: 9, got 10. more details logged on DEBUG level.

For the error above, I noticed in both the pcap and the Google Chrome developer tools network tab that the server only generated sync messages for ids 8 and 10.

In the pcap I can see that the WebSocket port actually generates a FIN packet between the id 8 packet and the id 10 packet. Further data is still sent on the socket afterwards, which is technically fine. I am unclear why the server thinks that id 9 was generated, but in the specific instance I looked at, the FIN was generated and might be related. As a general note, other reports of this issue mention network quality, and I agree that it plays a part: we have around 300 ms latency between server and client and fairly high packet loss (~30%). As this is TCP, however, that really shouldn't affect the order or number of packets finally received by server and client, since retransmissions should end up succeeding.

For most of my code I already run UI updates through ui.access(command). I was also using @Push(PushMode.AUTOMATIC). I am unclear if this is related.

I moved to @Push(PushMode.MANUAL) with ui.push() after the ui.access, and I have not been able to reproduce the problem since.

I have a PCAP for this. I would prefer to hand it over to someone at Vaadin directly, as it likely contains confidential data.

My suspicion therefore is that the automatic push sync message logic has some server-side bug.

This was tested on version 22.0.2

kagian avatar Aug 23 '22 07:08 kagian

@kagian Please share it, if you wanna see steady progress in this topic.

knoobie avatar Aug 23 '22 07:08 knoobie

Our multi-year Vaadin 7->14 migration just went into production.

During the dev/test period we occasionally had this problem, mainly when working remotely on a bad network. But now that we have more users, with various network configurations, we realize that the problem is more serious than expected. NGINX comes up a lot in this thread, but my feeling is that this problem is not reverse-proxy related. We do have an NGINX in front of the app, but we also access the Tomcat port directly, and we see the problem at the same frequency. My short-term goal is to produce a small project that reproduces the problem for the Vaadin team. I'll probably use one of those browser extensions that simulate a bad network.

This problem is very critical and, as someone said, it wasn't observable with the old Vaadin 7/8 platform.

flefebure avatar Sep 08 '22 23:09 flefebure

We speak a lot of NGinx in this thread, my feeling is that this problem is not reverse-proxy related.

@flefebure, @kagian, @jonasrotilli The problem can be reverse-proxy related, but there are a number of other possible causes as well, e.g. slow VPN, flaky Wi-Fi / cellular network etc.

Framework corner-case bugs are also a possibility. It is not long ago that we introduced this fix, so I recommend using the latest Vaadin 14 or 23 version and observing whether it is more stable in your environment.

https://github.com/vaadin/flow/pull/13733

TatuLund avatar Sep 09 '22 07:09 TatuLund

Just to clarify, the protocol in use seemed to be TCP as far as I could see. There is also no proxy or additional rewrite component in the path. So for an end-to-end TCP session, there really should not be any possibility of loss without recovery, and as such it should not be possible to get out-of-sequence or dropped packets (unless this really is UDP, which it didn't seem to be).

It really does look like some kind of state-tracking issue on the Vaadin server side. I'll check the new version; however, our current change to manual push updates seems to work well, which has reduced our wish to change this again.

kagian avatar Sep 09 '22 08:09 kagian

@tatulund we just upgraded Vaadin 14.8.4->14.8.17 and Flow 2.7.11->2.7.20. Now we wait for user feedback [cross fingers]

flefebure avatar Sep 09 '22 13:09 flefebure

To add: it decreased a lot after I made the NGINX changes I posted at the beginning, but that didn't completely solve it. It has started to happen more often again, with no changes to the Vaadin version or to the server in question. I believe this needs further investigation.

jonasrotilli avatar Sep 14 '22 12:09 jonasrotilli

Any update about this?

It happens more often with the latest release. It's impossible to deliver or update a Vaadin application in production with this issue.

I have another application running Vaadin 8 in production and this doesn't happen. It uses the same NGINX reverse proxy and the same configuration, without any problem.

tiagomartins91 avatar Sep 30 '22 11:09 tiagomartins91

I have the same issue and my customers are complaining. This has occurred since I upgraded from Vaadin 8 to Vaadin 23 (it never happened on 8). I'm now running V23.2.2 and the problem is still there. The app is running on Tomcat 9 in Azure with an Azure AppGateway acting as a proxy. I see this in the log:

[http-nio-8080-exec-4] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout

I've just increased the request timeout on the proxy to see if it changes anything. I've also seen cases where, on the client side, the loading indicator keeps resetting as if the page were being reloaded while, on the server side, I do not see any incoming request ... so it's as if the client side was looping on itself ...

echarlus avatar Oct 03 '22 13:10 echarlus

In my case, the page refresh loop only stops if I force a refresh.

tiagomartins91 avatar Oct 03 '22 15:10 tiagomartins91

In my case, the page refresh loop only stops if I force a refresh.

Yes, this works most of the time, but asking a customer to do that is not an option ... I also encountered cases where I had to clear all the stored data (session etc.) and restart navigation before I could reach the site again :(

echarlus avatar Oct 03 '22 19:10 echarlus

Any update about this?

It happens more often with the latest release. It's impossible to deliver or update a Vaadin application in production with this issue.

I have another application running Vaadin 8 in production and this doesn't happen. It uses the same NGINX reverse proxy and the same configuration, without any problem.

Agreed. I never experienced this issue with Vaadin 8; with 23 it's happening very often and my customers are becoming angry. I hope the fix will be delivered quickly.

echarlus avatar Oct 03 '22 20:10 echarlus