Editor reports connection issues and blocks editing
Describe the bug: After being in an editing session for some time, the editor reports "The document could not be loaded. Please check your internet connection."
Looking at the network requests, all of them went through (no 4xx or 5xx), and there is no error logged in the console either.
It seems, though, that hasConnectionIssue in Editor.vue somehow gets set to true in onChange(), triggered by _fetchSteps() from the PollingBackend.
Server details:
- Nextcloud version: 33 dev and 32.0.1-rc2 at least
I have seen it once but was unable to trigger it again. For now I have a few windows open with a conditional breakpoint here:
I was also able to catch the error by adding a watcher on hasConnectionIssue in Editor.vue. The change on hasConnectionIssue seems to be triggered by this line: https://github.com/nextcloud/text/blob/177bbb9aef2df0277bac4aedaf680429e2753b80/src/components/Editor.vue#L531
After adding a conditional breakpoint just like Julius did above, it looks like this: (two screenshots, not preserved)
I wonder whether the debugger runs wild because it resolves document here as the global Window document property which is also accessible directly via document.
Maybe we indeed should rename local variables document if they're not instance variables (i.e. this.document).
Looks like a debugger / source map issue, yes.
Also possibly of interest is the trace showing where close is called for me:
This seems to be correlated with POST errors upon opening the file happening every few seconds.
Just now my breakpoint here stopped before hasConnectionIssue was altered. And indeed, the last sync requests before that happened only every 30 seconds. So they probably got throttled after no changes had happened for some time.
Since messageReconnectTimeout in y-websocket.js is 30 seconds as well, I suspect a timing issue. When the sync requests happen only every 30 seconds and one of them takes a bit longer, the linked code path is touched and closes the "websocket connection" (which is no real websocket connection in our case).
It's easy to reproduce by setting messageReconnectTimeout in y-websocket.js to 4000.
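To make the suspected timing issue concrete, here is a minimal sketch of a "no message for too long" watchdog. It is a simplified model, not the y-websocket source: the function name, the arrival times, and the assumption that the check runs at a tenth of the timeout are all illustrative.

```javascript
// Simplified model of a watchdog that closes a connection after too much
// silence. All names and numbers are illustrative assumptions.
function firstWatchdogClose(arrivals, timeoutMs, checkEveryMs, endMs) {
  let lastReceived = 0 // the connection is assumed to open at t = 0
  let next = 0 // index of the next response that has not arrived yet
  for (let now = checkEveryMs; now <= endMs; now += checkEveryMs) {
    // Account for every response that arrived up to this check.
    while (next < arrivals.length && arrivals[next] <= now) {
      lastReceived = arrivals[next]
      next += 1
    }
    // Close when the silence is strictly longer than the timeout.
    if (now - lastReceived > timeoutMs) {
      return now
    }
  }
  return null // the watchdog never fired within the simulated window
}

const TIMEOUT_MS = 30000 // same value as the timeout discussed above
const CHECK_MS = TIMEOUT_MS / 10 // assumed check granularity

// Sync responses arriving exactly every 30 s.
const onTime = [30000, 60000, 90000]
// Same polling, but the second response arrives 4 s late.
const oneSlow = [30000, 64000, 90000]

console.log(firstWatchdogClose(onTime, TIMEOUT_MS, CHECK_MS, 90000)) // → null
console.log(firstWatchdogClose(oneSlow, TIMEOUT_MS, CHECK_MS, 90000)) // → 63000
```

In this model a single late response is enough to trip the check once the poll interval equals the timeout. Lowering the timeout to 4000 while polling stays at 30 seconds makes the model fire almost immediately, which would match the easy reproduction.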
What comes to my mind:
- The websocket will close if it does not receive awareness updates in 30 seconds. y-websocket sends awareness updates itself every 15 seconds. So not seeing any for 30 makes it believe the connection is broken.
- #7702 Maybe this is a side effect? Before we did not respond with steps to pushes that only contained an awareness message. Now we send the steps that arrived in the meantime. This means that the steps will not be included in the next sync response. So the polling backend is more likely to not receive any steps and think there is no activity going on.
- Afaik we also do not send the latest awareness messages in a push response but only in a sync response. Otherwise we'd at least see our own awareness message bounce back like updates do.
@max-nextcloud I guess the problem is that FETCH_INTERVAL_INVISIBLE in PollingBackend and messageReconnectTimeout in y-websocket are both 30 seconds. So probably we should either lower the former or raise the latter by five seconds, what do you think?
@mejo- sounds good. I'd shorten the FETCH_INTERVAL_INVISIBLE - but both seem fine.
I opened a PR with minimal changes, both lowering FETCH_INTERVAL_INVISIBLE and increasing messageReconnectTimeout to harden it until a better solution is implemented. https://github.com/nextcloud/text/pull/7822
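As a rough illustration of why either change helps, the sketch below computes how much later than usual a single sync response may arrive before the silence exceeds the timeout. The helper name and the five-second adjustments are assumptions taken from the suggestion above, not the values used in the PR.

```javascript
// Rough headroom model: extra latency one sync response can absorb before
// the gap between two responses exceeds the reconnect timeout.
// Hypothetical helper; the numbers are illustrative, not taken from the PR.
function latencyHeadroomMs(fetchIntervalMs, reconnectTimeoutMs) {
  return reconnectTimeoutMs - fetchIntervalMs
}

const candidates = [
  { label: 'both at 30 s', fetchIntervalMs: 30000, reconnectTimeoutMs: 30000 },
  { label: 'lower fetch interval', fetchIntervalMs: 25000, reconnectTimeoutMs: 30000 },
  { label: 'raise timeout', fetchIntervalMs: 30000, reconnectTimeoutMs: 35000 },
  { label: 'both changes', fetchIntervalMs: 25000, reconnectTimeoutMs: 35000 },
]

for (const c of candidates) {
  const headroom = latencyHeadroomMs(c.fetchIntervalMs, c.reconnectTimeoutMs)
  console.log(`${c.label}: ${headroom} ms of headroom`)
}
```

With both values at 30 seconds there is no headroom at all, so any jitter can trip the check. Applying both changes doubles the margin compared to applying only one.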
@benjaminfrueh and I discussed a further improvement: automatically adapting the check interval to the interval that is actually used for fetching, so that browser throttling of background tabs does not cause further issues.
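One possible shape for such an adaptive check, purely as a sketch: track the gap between the last two fetches and derive the timeout from it, with a fixed minimum. The class name, the factor of 1.5, and the 60-second throttled gap are hypothetical and not part of the current code.

```javascript
// Hypothetical sketch: derive the "connection is dead" timeout from the
// fetch interval that is actually observed, instead of a fixed constant.
class AdaptiveTimeout {
  constructor(minTimeoutMs, factor) {
    this.minTimeoutMs = minTimeoutMs // never go below this timeout
    this.factor = factor // safety margin on top of the observed interval
    this.lastFetchAt = null
    this.observedIntervalMs = 0
  }

  // Call whenever a fetch is sent; nowMs is a timestamp in milliseconds.
  recordFetch(nowMs) {
    if (this.lastFetchAt !== null) {
      this.observedIntervalMs = nowMs - this.lastFetchAt
    }
    this.lastFetchAt = nowMs
  }

  // Timeout that tolerates the most recent gap between two fetches.
  timeoutMs() {
    return Math.max(this.minTimeoutMs, this.observedIntervalMs * this.factor)
  }
}

const adaptive = new AdaptiveTimeout(30000, 1.5)
adaptive.recordFetch(0)
adaptive.recordFetch(20000) // regular 20 s gap
console.log(adaptive.timeoutMs()) // → 30000
adaptive.recordFetch(80000) // gap stretched to 60 s, e.g. by tab throttling
console.log(adaptive.timeoutMs()) // → 90000
```

Because the timeout follows the observed gap, a throttled background tab would stretch the timeout along with its polling instead of tripping a fixed limit. The trade-off is that a genuinely broken connection is detected later while the tab is throttled.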
Can you detect / do you know the effect of browser throttling of background tabs? My understanding is that we intend to have a smaller interval (20 seconds) but are observing 1 minute. Do you know where this is coming from?
I see the same message in app/files//files after around 40 seconds without even editing a file. Could it be related?
@baby-gnu Do you happen to have a folder description or a Readme.md in that folder?
Does it also show in folders that do not have that?
@max-nextcloud yes there is a Readme.md inside.
Meanwhile, I fixed an issue with time synchronization: the server was 10 minutes behind. Things seem to have gotten much better since then. I have not triggered the error message for minutes, neither in app/files/files nor while editing a file.
@baby-gnu ohhh... that's an interesting issue you are bringing up. I don't think we account for out of sync server time.
https://github.com/nextcloud/text/pull/8005
I no longer experience these issues since #8005 got released. I'll close the issue for now, but feel free to reopen it if you think we need to track this further @max-nextcloud @benjaminfrueh.