
Editor reports connection issues and blocks editing

Open • mejo- opened this issue 2 months ago • 18 comments

Describe the bug

After being in an editing session for some time, the editor reports "The document could not be loaded. Please check your internet connection."

Looking at the network requests, all of them went through (no 4xx or 5xx responses), and nothing was logged in the console either.

It seems, though, that hasConnectionError in Editor.vue somehow gets set to true in onChange(), triggered by _fetchSteps() from the PollingBackend.

Screenshots

[Screenshot]

Server details:

  • Nextcloud version: 33 dev and 32.0.1-rc2 at least

mejo- (Oct 23 '25 08:10)

I had seen it once but have been unable to trigger it again. For now, I have a few windows open with a conditional breakpoint here:

[Screenshot]

juliusknorr (Oct 23 '25 13:10)

[Screenshot]

juliusknorr (Oct 23 '25 13:10)

I was also able to catch the error by adding a watcher on hasConnectionIssue in Editor.vue. The change to hasConnectionIssue seems to be triggered by this line: https://github.com/nextcloud/text/blob/177bbb9aef2df0277bac4aedaf680429e2753b80/src/components/Editor.vue#L531

After adding a conditional breakpoint just like Julius did above, it looks like this:

[Screenshot]

and

[Screenshot]

I wonder whether the debugger gets confused because it resolves document here to the global Window.document property, which is also accessible directly as document.

Maybe we should indeed rename local variables named document if they're not instance variables (i.e. this.document).

mejo- (Oct 23 '25 13:10)

Yes, looks like a debugger / source map issue.

Also possibly interesting: the trace where close is called for me:

[Screenshot]

juliusknorr (Oct 23 '25 13:10)

This seems to be correlated with POST errors that occur every few seconds after opening the file.

[Screenshot] [Screenshot]

theLockesmith (Oct 23 '25 13:10)

Just now, my breakpoint here stopped right before hasConnectionIssues was changed. And indeed, the last sync requests before that happened only every 30 seconds, so they were probably throttled after no changes had happened for some time.

Since messageReconnectTimeout in y-websocket.js is also 30 seconds, I suspect a timing issue: when the sync requests happen only every 30 seconds and one of them takes a bit longer, the linked code path is reached and closes the "websocket connection" (which is not a real websocket connection in our case).
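A small simulation can make this suspected race concrete. It is a sketch under stated assumptions, not code from text or y-websocket: it assumes sync requests start on a fixed schedule, that every answer counts as a received message, and that the watchdog checks every reconnectTimeout / 10 ms and closes once more than reconnectTimeout has passed since the last message. The function name and parameters are hypothetical.

```typescript
// Hypothetical model of the suspected race, not code from text or y-websocket.
// Assumptions (all times in milliseconds):
//  - sync request i starts at i * pollInterval and is answered latencies[i] later
//  - every answer counts as a received message for the watchdog
//  - the watchdog runs every reconnectTimeout / 10 and closes the connection
//    once more than reconnectTimeout has passed since the last message
function connectionGetsClosed(
  pollInterval: number,
  latencies: number[],
  reconnectTimeout: number,
): boolean {
  const checkEvery = reconnectTimeout / 10
  const arrivals = latencies.map((latency, i) => i * pollInterval + latency)
  const end = arrivals[arrivals.length - 1]

  let lastMessage = 0 // the connection counts as fresh at t = 0
  let delivered = 0
  for (let now = checkEvery; now <= end; now += checkEvery) {
    // Account for every answer that has arrived by the time the watchdog runs.
    while (delivered < arrivals.length && arrivals[delivered] <= now) {
      lastMessage = arrivals[delivered]
      delivered++
    }
    if (now - lastMessage > reconnectTimeout) {
      return true // the watchdog would close the "websocket connection" here
    }
  }
  return false
}

// Both at 30 s: one request that is 3 s slower than the others trips the watchdog.
console.log(connectionGetsClosed(30000, [100, 100, 3100, 100], 30000)) // true
// A shorter poll interval or a longer timeout leaves enough headroom.
console.log(connectionGetsClosed(25000, [100, 100, 3100, 100], 30000)) // false
console.log(connectionGetsClosed(30000, [100, 100, 3100, 100], 35000)) // false
```

Modelling discrete watchdog ticks matters here: a delay shorter than one tick may go unnoticed, which would fit the issue being hard to trigger.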

mejo- (Oct 23 '25 14:10)

It's easy to reproduce by setting messageReconnectTimeout in y-websocket.js to 4000.

mejo- (Oct 23 '25 15:10)

What comes to my mind:

  • The websocket will close if it does not receive awareness updates within 30 seconds. y-websocket sends awareness updates itself every 15 seconds, so not seeing any for 30 seconds makes it believe the connection is broken.
  • Maybe this is a side effect of #7702? Previously, we did not respond with steps to pushes that only contained an awareness message. Now we send the steps that arrived in the meantime, which means those steps will not be included in the next sync response. So the polling backend is more likely to receive no steps at all and conclude that there is no activity going on.
  • AFAIK we also do not send the latest awareness messages in a push response, only in a sync response. Otherwise we'd at least see our own awareness message bounce back, like updates do.
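To illustrate the second point, here is a hypothetical backoff policy in which the poller slows down whenever a sync response carries no steps. BackoffPoller, the constants and the doubling rule are assumptions for illustration, not PollingBackend's actual code.

```typescript
// Hypothetical backoff policy illustrating the concern above; the constants
// and names are assumptions, not PollingBackend's actual implementation.
const MIN_INTERVAL = 300
const MAX_INTERVAL = 5000
const INVISIBLE_INTERVAL = 30000

interface SyncResponse {
  steps: unknown[]
}

class BackoffPoller {
  private interval = MIN_INTERVAL
  visible: boolean

  constructor(visible: boolean) {
    this.visible = visible
  }

  // Returns the delay in ms before the next sync request is sent.
  nextDelay(response: SyncResponse): number {
    if (!this.visible) {
      return INVISIBLE_INTERVAL // background tabs poll at a fixed slow rate
    }
    if (response.steps.length > 0) {
      this.interval = MIN_INTERVAL // activity seen: poll quickly again
    } else {
      this.interval = Math.min(this.interval * 2, MAX_INTERVAL) // looks idle: back off
    }
    return this.interval
  }
}

const poller = new BackoffPoller(true)
console.log(poller.nextDelay({ steps: [] })) // 600
console.log(poller.nextDelay({ steps: [] })) // 1200
console.log(poller.nextDelay({ steps: ["step"] })) // 300
```

If the steps were already returned in a push response, the sync response arrives empty, so a policy like this backs off even though the document is being edited.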

max-nextcloud (Oct 23 '25 15:10)

@max-nextcloud I guess the problem is that FETCH_INTERVAL_INVISIBLE in PollingBackend and messageReconnectTimeout in y-websocket are both 30 seconds. So we should probably either lower the former or raise the latter by five seconds. What do you think?
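The proposal boils down to keeping some slack between the slowest polling interval and the watchdog timeout. A minimal check, assuming the five seconds mentioned here as the slack; hasHeadroom is a hypothetical helper, not part of either code base.

```typescript
// Hypothetical invariant: the slowest polling interval plus some slack for a
// slow request must stay at or below the watchdog timeout. The 5 s default
// mirrors the "five seconds" suggested here; it is not a measured value.
function hasHeadroom(
  fetchIntervalInvisible: number,
  messageReconnectTimeout: number,
  slack = 5000,
): boolean {
  return fetchIntervalInvisible + slack <= messageReconnectTimeout
}

console.log(hasHeadroom(30000, 30000)) // false: current values, no headroom
console.log(hasHeadroom(25000, 30000)) // true: lower FETCH_INTERVAL_INVISIBLE
console.log(hasHeadroom(30000, 35000)) // true: raise messageReconnectTimeout
```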

mejo- (Oct 23 '25 15:10)

what do you think?

@mejo- Sounds good. I'd shorten FETCH_INTERVAL_INVISIBLE, but both seem fine.

max-nextcloud (Oct 23 '25 15:10)

I opened a PR with minimal changes that both lowers FETCH_INTERVAL_INVISIBLE and increases messageReconnectTimeout, to harden this until a better solution is implemented: https://github.com/nextcloud/text/pull/7822

benjaminfrueh (Oct 23 '25 17:10)

@benjaminfrueh and I discussed a further improvement: automatically adapting the check interval to the interval used for fetching, so that browser throttling of background tabs cannot cause further issues.
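A hypothetical sketch of that idea: the watchdog derives its timeout from the interval actually observed between fetches, so a tab that the browser throttles to one fetch per minute also gets a longer timeout. AdaptiveWatchdog, the 30-second base and the factor of 2 are illustrative assumptions, not the design the PR ended up with.

```typescript
// Hypothetical watchdog whose timeout follows the observed fetch cadence;
// class name, base timeout and factor are illustrative assumptions.
class AdaptiveWatchdog {
  private lastFetch: number | null = null
  private observedInterval = 0
  private readonly baseTimeout: number
  private readonly factor: number

  constructor(baseTimeout = 30000, factor = 2) {
    this.baseTimeout = baseTimeout
    this.factor = factor
  }

  // Call whenever a sync request has been answered; `now` is a timestamp in ms.
  recordFetch(now: number): void {
    if (this.lastFetch !== null) {
      this.observedInterval = now - this.lastFetch
    }
    this.lastFetch = now
  }

  // Never below the base, but grows with the real interval, e.g. when the
  // browser throttles timers in a background tab.
  timeout(): number {
    return Math.max(this.baseTimeout, this.factor * this.observedInterval)
  }

  isStale(now: number): boolean {
    return this.lastFetch !== null && now - this.lastFetch > this.timeout()
  }
}

const watchdog = new AdaptiveWatchdog()
watchdog.recordFetch(0)
watchdog.recordFetch(60000) // tab throttled to one fetch per minute
console.log(watchdog.timeout()) // 120000
console.log(watchdog.isStale(150000)) // false
```

Using only the last interval keeps the sketch short; a real implementation might prefer the maximum over a few recent intervals, so that one quick fetch does not shrink the timeout too early.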

juliusknorr (Oct 24 '25 11:10)

Can you detect, or do you know, the effect of browser throttling on background tabs? My understanding is that we intend to use a shorter interval (20 seconds) but are observing 1 minute. Do you know where this comes from?
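One way to measure the effect is to record when each sync request is actually sent (for example with performance.now() in the browser) and compare the gaps with the intended interval. The helper below is a hypothetical diagnostic, not part of the app, and the 25 % tolerance is an arbitrary assumption. One candidate worth checking for the 1-minute spacing is Chrome's intensive wake-up throttling, which can limit timers in pages hidden for several minutes to about one wake-up per minute.

```typescript
// Hypothetical diagnostic: given the times (ms) at which sync requests were
// actually sent, return the gaps that overshoot the intended interval.
function throttledGaps(
  sentAt: number[],
  intendedInterval: number,
  tolerance = 0.25,
): number[] {
  const gaps: number[] = []
  for (let i = 1; i < sentAt.length; i++) {
    const gap = sentAt[i] - sentAt[i - 1]
    if (gap > intendedInterval * (1 + tolerance)) {
      gaps.push(gap)
    }
  }
  return gaps
}

// Intended 20 s, but after two requests the tab only wakes up once a minute.
console.log(throttledGaps([0, 20000, 40000, 100000, 160000], 20000))
```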

max-nextcloud (Oct 24 '25 15:10)

I get the same message in app/files//files after around 40 seconds, without even editing a file. Could it be related?

baby-gnu (Nov 17 '25 13:11)

@baby-gnu Do you happen to have a folder description or a Readme.md in that folder? Does it also show up in folders that don't have one?

max-nextcloud (Nov 17 '25 14:11)

@max-nextcloud Yes, there is a Readme.md inside.

Meanwhile, I fixed a time synchronization issue: the server clock was 10 minutes behind. Things seem much better since then; I haven't triggered the error message in app/files/files for minutes, nor while editing a file.

baby-gnu (Nov 18 '25 07:11)

@baby-gnu Ohhh... that's an interesting issue you're bringing up. I don't think we account for out-of-sync server time.
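If some part of the sync logic compares timestamps issued by the server with the client's clock (an assumption; the thread does not confirm where this happens), a server running 10 minutes behind makes fresh data look stale. A common mitigation is to estimate the clock offset from a request's round trip. estimateOffset and ageOnServerClock are hypothetical helpers, and the estimate assumes the server stamps its reply halfway through the round trip.

```typescript
// Hypothetical: estimate serverClock - clientClock from one request, assuming
// the server stamped its reply halfway between sending and receiving.
function estimateOffset(sentAt: number, receivedAt: number, serverTime: number): number {
  return serverTime - (sentAt + receivedAt) / 2
}

// Age of a server-issued timestamp, measured on the server's clock.
function ageOnServerClock(serverTimestamp: number, clientNow: number, offset: number): number {
  return clientNow + offset - serverTimestamp
}

// Server clock 10 minutes behind the client, as in the report above.
const offset = estimateOffset(1000000, 1000200, 400100)
console.log(offset) // -600000
// A timestamp the server just issued looks 10 minutes old on the client's clock...
console.log(1000200 - 400100) // 600100
// ...but is fresh once the offset is applied.
console.log(ageOnServerClock(400100, 1000200, offset)) // 100
```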

max-nextcloud (Nov 18 '25 08:11)

https://github.com/nextcloud/text/pull/8005

benjaminfrueh (Dec 03 '25 12:12)

I no longer experience these issues since #8005 was released. I'll close the issue for now, but feel free to reopen it if you think we need to track this further, @max-nextcloud @benjaminfrueh.

mejo- (Dec 16 '25 11:12)