realtime-js

Realtime websocket loses connection regularly when browser tab goes to background

Open GaryAustin1 opened this issue 3 years ago • 87 comments

Bug report

Describe the bug

With a simple subscription to a table (RLS or not), the websocket starts dropping and reconnecting every 3 minutes once the tab has been in the background for 5 minutes. Verified on Windows 10 with Chrome and Edge. Note that switching to a different tab in the browser puts the other tabs into background mode.

To Reproduce

    var supabase = supabase.createClient("https://url.supabase.co", "auth code")
    let mySubscription = supabase
      .from('status')
      .on('*', payload => {
        console.log('Change received!', payload)
      })
      .subscribe((status, e) => {
        console.log('subscribe status,error', status, e);
      })

Start a subscription to a table. Open the dev console and turn on networking and timestamps. Shrink the browser or hide the tab under others. Wait more than 10 minutes. The websocket will have disconnected and reconnected several times.

This is a pretty bad condition for real-time if it is to be relied on at all for more than the current active window time. During each disconnect changes to the table would be missed.

This is compounded by the fact that Supabase does not really document error and close handlers for realtime subscriptions at the supabase.js level or higher. Only here in the realtime.js description do you see error handlers. It took looking at the supabase.js code to see that one could do `.subscribe((status,error)=>{console.log(status,error)})` to see connection failures. OTHERWISE THIS IS A SILENT FAILURE, AS THE SUBSCRIPTION KEEPS RUNNING EVEN THOUGH IT IS DROPPING POTENTIAL UPDATES. Any time realtime drops the socket, if you want reliable data you have to reload from your tables to get all recent changes.

Expected behavior

The websocket should remain alive even when the tab/window is hidden and in background mode. Realtime.js must account for the different timer behavior that browsers apply when throttling background tabs, and keep the connection alive.

Screenshots

Below is about 15 minutes after leaving browser in background...

[two screenshots, not reproduced here]

System information

Windows 10 supabase.js 1.28.5 Chrome or Edge

Additional context

Out of time for a few days but I'll try and get more info on these additional issues. EDIT: I no longer believe there is an error with multiple connections for a single user. Further investigation posted in my next issue points to a more general realtime.js bug on losing the refreshed token on a disconnect/reconnect of the websocket for any reason (the above being one).

GaryAustin1 avatar Dec 08 '21 14:12 GaryAustin1

Just want to add that I understand, and it should be understood by or made clear to users of realtime, that you need to monitor the subscribe status and take appropriate action if the subscription fails. The reason I dug into how to watch the subscribe status is that I use it as part of my online/offline logic and reload message counts when coming back online. But my overhead is such that doing this with a background tab every 3 minutes is very costly. I've already started down the path of using the visibility hook to not reload if hidden, but it will still be painful if the user is just moving through tabs on their screen and goes away for 10 minutes.

GaryAustin1 avatar Dec 08 '21 22:12 GaryAustin1

Looking into this more I see realtime uses websocket-node. I'm hopeful there are some settings that can deal with the background tab throttle giving client more time to ping. I've found no discussions on that particular package but here are others that have had to deal with this: https://socket.io/blog/engine-io-4-release/#heartbeat-mechanism-reversal https://stackoverflow.com/questions/66496547/signalr-and-or-timer-issues-since-chrome-88 https://github.com/SignalR/SignalR/issues/4536

GaryAustin1 avatar Dec 09 '21 03:12 GaryAustin1

I see that you're using Phoenix. We're having similar issues with the websocket disconnecting after a while – mostly in an inactive browser tab.

fschoenfeldt avatar Dec 10 '21 09:12 fschoenfeldt

Adding this note here as well: Chrome is going to once-per-minute timer checks (which would break the 60 sec heartbeat).

OK, here's the new bit in Chrome 88. Intensive throttling happens to timers that are scheduled when none of the minimal throttling or throttling conditions apply, and all of the following conditions are true:

- The page has been hidden for more than 5 minutes.
- The chain count is 5 or greater.
- The page has been silent for at least 30 seconds.
- WebRTC is not in use.

In this case, the browser will check timers in this group once per minute. Similar to before, this means timers will batch together in these minute-by-minute checks.

https://developer.chrome.com/blog/timer-throttling-in-chrome-88/
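As a rough illustration of why once-per-minute checks are a problem for the 60 sec heartbeat mentioned above, here is a minimal sketch of a chained timer under that rule. The model is a simplification and an assumption (timers that come due between checks fire at the next one-minute boundary), not Chrome's actual scheduler, and the 30-second client interval and the function names are hypothetical.

```javascript
// Simplified model (an assumption, not Chrome's real scheduler): a chained
// timer that comes due between checks only fires at the next check boundary.
function throttledFireTimes(intervalMs, checkPeriodMs, count) {
  const fireTimes = [];
  let last = 0; // time at which the previous callback actually ran
  for (let i = 0; i < count; i++) {
    const due = last + intervalMs;
    // Round up to the next multiple of the check period.
    const fired = Math.ceil(due / checkPeriodMs) * checkPeriodMs;
    fireTimes.push(fired);
    last = fired;
  }
  return fireTimes;
}

// Differences between consecutive fire times.
function gaps(times) {
  return times.slice(1).map((t, i) => t - times[i]);
}

// Hypothetical 30 s client heartbeat under once-per-minute timer checks.
const times = throttledFireTimes(30_000, 60_000, 4);
console.log(times);       // [ 60000, 120000, 180000, 240000 ]
console.log(gaps(times)); // [ 60000, 60000, 60000 ]
```

In this model every heartbeat is stretched to a full 60 seconds, which leaves no margin against a 60-second timeout on the other side.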

GaryAustin1 avatar Dec 11 '21 15:12 GaryAustin1

Not directly related to the above issue (which is still there).

@w3b6x9 I'm not sure who at Supabase is the knowledge keeper of the phoenix realtime socket level stuff...

But I think it is important to understand what level of interruption a websocket can have and still have the server maintain an output queue of realtime changes to that socket. For instance if the socket drops one heartbeat and then the reconnect process starts is the "channel" queue on the server still there and taking in realtime changes that can then be fed back down to the client on reconnect?
If so how long can the websocket not have connection before the server drops the channel?

This is important in determining when to reload a dataset because changes could have been missed. Thanks

GaryAustin1 avatar Jan 19 '22 00:01 GaryAustin1

@GaryAustin1 if the server doesn't receive a heartbeat for 60 seconds (this is the default for all Supabase projects running Realtime RLS but can be customized) then it will sever the socket connection. When the socket connection is severed, all of the channel processes (every topic that a client was listening to via the socket conn) die. On reconnect, all of the topics that the client is listening to will join as new channel processes on the server.

w3b6x9 avatar Jan 20 '22 01:01 w3b6x9

I've been looking into this a bit. Unfortunately, without any sort of short-term queue, I believe Supabase needs to make clear that when a tab is not in focus you need to stop realtime, and plan on restarting completely on focus using the visibility event. A tab not in focus can be as simple as the user checking stocks on another tab and coming back. Based on further research, a queue would probably only benefit the desktop browsers Chrome/Edge. Firefox is the only browser that has no issues on desktop. I've not tested on mobile yet.

Firebase does not have these issues with their realtime as they maintain a constant copy of your query and restore behind the scenes when your tab comes back. I believe they might also use webworkers to do this, but not sure.

I was hoping for some way to at least use the 5 minutes Chrome/Edge allow before they throttle timers (which kills the heartbeat) to avoid reloading if someone just leaves your realtime app/tab for a minute to look at something else, but it appears that only works on desktop. On mobile devices, both Android with Chrome and iPad with Safari, the results range from a lost connection within a minute to 5 minutes, depending on whether the device is powered or on battery, and whether the tab is the only tab or a background tab. There appears to be no way to get a consistent time before loss of connection across multiple devices and browsers.

Here are some sample traces. Note anytime there is a connect error you could lose data and in some cases the device just stops running the tab and a few old realtime events get logged and then loss of data until device/tab comes back into focus. All of these have a program running generating a constant incrementing update count, so should always be +1 if no data loss.

Edge (Chrome is similar) on Windows desktop, showing a pretty constant visibility event + 5 minutes to failure:
[image: edge-windows background tab]

Edge (Chrome is similar) on Windows desktop, showing data loss:
[image: edge-windows background]

Android/Chrome, powered (Chrome has a freeze event in addition to visibility, but it does not seem reliable). Loss of data from freeze to visible:
[image]

iPad on battery, front tab going to sleep:
[image]

Then I have traces where "strange" things happen, like this Android/Chrome one... no idea why there are no retries here and the code gets to keep running.
[image]
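The "+1 if no data loss" check used for these traces can be sketched as follows. This is a minimal illustration, not the test program actually used; the function name and the sample counter values are hypothetical.

```javascript
// Given the counter values the client received, in arrival order, report
// every range of values that never arrived.
function findMissingRanges(received) {
  const missing = [];
  for (let i = 1; i < received.length; i++) {
    const expected = received[i - 1] + 1;
    if (received[i] > expected) {
      missing.push({ from: expected, to: received[i] - 1 });
    }
  }
  return missing;
}

// Hypothetical trace: updates 104 through 109 were lost while disconnected.
console.log(findMissingRanges([101, 102, 103, 110, 111]));
// [ { from: 104, to: 109 } ]
```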

GaryAustin1 avatar Feb 22 '22 02:02 GaryAustin1

@w3b6x9 Can you take a look at this discussion (in particular the flowchart and then the code in the bottom reply) when you get a chance (no hurry) and comment.
https://github.com/supabase/supabase/discussions/5641 I've been trying to come up with a straightforward way to handle realtime across all connection failures in a reasonably performant manner. So far limited testing is good on desktop; starting on mobile devices soon.

Unfortunately the code is somewhat custom for each case of a subscription depending on table size (do you want just last x entries or whole table), the id column for keeping the "table" array updated, inserts at beginning or end and filter. I tried to push that code into an update handler.

GaryAustin1 avatar Mar 04 '22 02:03 GaryAustin1

EDIT: Add link https://github.com/supabase/supabase/discussions/5641

GaryAustin1 avatar Mar 04 '22 02:03 GaryAustin1

This hasn't been solved yet?

xxholyChalicexx avatar Mar 29 '22 10:03 xxholyChalicexx

@xxholyChalicexx we have not visited this issue yet but will investigate in the near future. Is this currently blocking you? Have you come up with alternatives?

w3b6x9 avatar May 19 '22 23:05 w3b6x9

Well, it's not blocking, but it gets annoying at times. As of now I try to catch and reconnect, but that too is hit and miss. Thankfully it doesn't have any adverse effect, so for now I'm just making it work.

xxholyChalicexx avatar May 20 '22 06:05 xxholyChalicexx

I just wanted to check in and see if there was any movement here?!

zbennett avatar Aug 12 '22 18:08 zbennett

We'll be implementing a solution for this in the next few weeks. realtime-js heavily draws from phoenix-js and they have already implemented a solution for this: https://github.com/phoenixframework/phoenix/blob/bf1f2bfc9392c515081b1614df1b507f2c120fde/assets/js/phoenix/socket.js#L119. We'll be adopting that solution.

w3b6x9 avatar Sep 26 '22 23:09 w3b6x9

Has the solution been implemented?

zamorai avatar Nov 07 '22 04:11 zamorai

Interested in this as well because I'm also seeing disconnects on inactive tabs and it causes the client to not receive changes from the DB which in turn places the client out of sync with the rest of the participants

netgfx avatar Nov 28 '22 12:11 netgfx

@netgfx, This is just one of the reasons the connection can be dropped (loss of signal, mobile power savings are at least two others). You need to have code in place to capture the disconnects and restart the process (including loading any old data you need) on every disconnect. Although realtime will in many cases reconnect, any data changes during that process are lost.

The flow chart here might be a bit dated https://github.com/supabase/supabase/discussions/5641 but shows a general idea. Also another user has generated this (I've not used it): https://code.build/p/GZ6ioN6YzcpDwNwGNnDpEn/supabase-subscriptions-just-got-easier

GaryAustin1 avatar Nov 28 '22 14:11 GaryAustin1

I have the same problem as OP and want to express my support for fixing this issue. My current workaround is to reload data from scratch on every successful SUBSCRIBED event. But they happen constantly when tab is in background so my server gets overwhelmed with reload requests.
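One way to soften the reload storm described here, sketched under assumptions: skip the reload while the tab is hidden, remember that one is owed, and run it once when the tab becomes visible again. `createReloadGate`, `isHidden`, and `reload` are hypothetical names for illustration, not part of supabase-js.

```javascript
// Hypothetical helper: defers reloads while the tab is hidden.
// `isHidden` is injected so the logic can be exercised outside a browser;
// in a page it could be () => document.hidden.
function createReloadGate(isHidden) {
  let stale = false; // a reload was skipped while hidden and is still owed

  return {
    // Call from the subscribe callback when status === 'SUBSCRIBED'.
    onSubscribed(reload) {
      if (isHidden()) {
        stale = true;
        return false;
      }
      stale = false;
      reload();
      return true;
    },
    // Call from a 'visibilitychange' listener once the tab is visible again.
    onVisible(reload) {
      if (!stale) return false;
      stale = false;
      reload();
      return true;
    },
  };
}

// Example wiring in a page (commented out because it needs a browser):
// const gate = createReloadGate(() => document.hidden);
// document.addEventListener('visibilitychange', () => {
//   if (!document.hidden) gate.onVisible(loadEverything);
// });
```

With this gate, any number of background reconnects collapse into a single reload on return, at the cost of the hidden tab holding stale data in the meantime.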

lhermann avatar Feb 20 '23 04:02 lhermann

Struggling with the same issue. I thought this was a rather easy fix?

alex1s1 avatar Apr 03 '23 12:04 alex1s1

I'm currently having this issue. I'm using the Next.js clients and tried to play around with the realtime options, e.g. timeout and heartbeatIntervalMs, but they don't seem to have any effect.

eliasm307 avatar Aug 01 '23 23:08 eliasm307

The problem is that timers get throttled by the browser, and not much that relies on a timer works when the tab is in the background or a mobile device is in power-down mode. You pretty much need to shut it down, wait for visibility to come back, and restart everything, IMO. https://github.com/orgs/supabase/discussions/5641 and https://github.com/GaryAustin1/Realtime2 have some more info and ideas, but the first is a bit dated and the second is not complete.

GaryAustin1 avatar Aug 01 '23 23:08 GaryAustin1

Could the connection be monitored by a web worker? That would solve the throttle or backgrounded tab issue

netgfx avatar Aug 02 '23 05:08 netgfx

I have the same problem. With the latest version of Chrome on Win11, the channel connection dies and no more updates are received.

claudio-bizzotto-zupit avatar Aug 08 '23 10:08 claudio-bizzotto-zupit

same issue here

ioRekz avatar Sep 25 '23 14:09 ioRekz

@w3b6x9 Is this solution implemented? I am still seeing the disconnection issue when the tab is not used for a while unfortunately..

Thimows avatar Apr 08 '24 14:04 Thimows

We've been struggling with the same issue for a while and have found a solution that's working for us: disconnecting from the realtime channel whenever the tab is hidden and reconnecting when the tab is visible. This avoids the core problem of the heartbeat dying in the background and means we then only need to handle the connect/disconnect gracefully.

Here's how we are doing it (app is in Svelte):

```js
// `channel`, `channelState`, and `connected` are component-level variables
// declared elsewhere in the Svelte component.
const refreshSubscription = () => {
	const channelName = `channel.id`;
	channel = $supabaseClient
		.channel(channelName)
		.subscribe(async (status) => {
			switch (status) {
				case 'SUBSCRIBED':
					await channel.track({ user: $user });
					// If a notification was sent for a connection error, send a new
					// one to update the user. Nothing is sent if all was fine.
					if (channelState === 'error') {
						$createRealtimeNotification = {
							id: channelName,
							type: 'success',
							action: () => null
						};
					}
					channelState = 'connected';
					// Drives app logic that the connection is valid
					connected = true;
					break;
				case 'TIMED_OUT':
				case 'CHANNEL_ERROR':
					if (!connected) {
						// Update state to drive logic on reconnect
						channelState = 'error';
						// Send message notifying the channel is disconnected
						$createRealtimeNotification = {
							id: channelName,
							type: 'error',
							action: () => null
						};
					}
					connected = false;
					break;
				case 'CLOSED':
				default:
					connected = false;
			}

			if (!connected) {
				// If disconnected, reload all server functions
				invalidateAll();
			}
		});
};

// Unsubscribe while the tab is hidden, resubscribe when it is visible again.
function reconnectOnTabChange() {
	if (!document.hidden) {
		refreshSubscription();
	} else {
		channel.unsubscribe();
	}
}

onMount(() => {
	refreshSubscription();

	document.addEventListener('visibilitychange', reconnectOnTabChange);

	// Cleanup when the component is destroyed
	return () => {
		channel.unsubscribe();
		document.removeEventListener('visibilitychange', reconnectOnTabChange);
	};
});
```
 

vfatia avatar Apr 25 '24 05:04 vfatia

The above seems like a good solution. If the disconnection is due to the browser pausing the tab, then perhaps this operation (keeping the connection alive) should be offloaded to a web worker that is never paused by default 🤔

netgfx avatar Apr 25 '24 06:04 netgfx

> We'll be implementing a solution for this in the next few weeks. realtime-js heavily draws from phoenix-js and they have already implemented a solution for this: https://github.com/phoenixframework/phoenix/blob/bf1f2bfc9392c515081b1614df1b507f2c120fde/assets/js/phoenix/socket.js#L119. We'll be adopting that solution.

Is there any progress on this solution?

Nishchit14 avatar May 03 '24 15:05 Nishchit14

Hello! Any updates regarding this?

alexnechifor avatar Jun 06 '24 13:06 alexnechifor

not yet, will move this up in the list of priorities

filipecabaco avatar Jun 21 '24 13:06 filipecabaco