html
html copied to clipboard
Add steps to destroy documents that are ineligible for receiving message when posting message through broadcast channel
As discussed in https://github.com/whatwg/html/issues/7253, we should clearly spec that when posting message through broadcast channel, if the destination is not eligible for messaging, the associated document should be made unsalvageable. This PR add some more steps to perform this in the 9.5 Broadcasting to other browsing contexts section.
- [ ] At least two implementers are interested (and none opposed):
- Gecko: https://github.com/whatwg/html/issues/7253#issuecomment-1159096891
- Chromium: https://github.com/whatwg/html/issues/7253#issuecomment-1158634472
- [ ] Tests are written and can be reviewed and commented upon at:
- To be done with https://crbug.com/1420862
- [ ] Implementation bugs are filed:
- Chromium: https://crbug.com/1420862
cc @fergald @rakina @domenic @annevk @smaug---- @asutherland please help to take a look 🙇
/browsing-the-web.html ( diff ) /web-messaging.html ( diff )
@domenic could you take a look at this please and advise on how to land it?
@domenic could you take a look at this please and advise on how to land it?
Will do, but it will likely take a few days as my review backlog is quite large at the moment.
@annevk From my testing (see below) Mozilla and WebKit (desktop) both drop these if the receiver is in BFCache. Do you have a demontration page where they are queued and delivered?
- BroadcastChannel
- Chrome destroys (actually will not allow the page to enter BFCache but we want to relax this)
- FF/WebKit drops
My reading of the spec right now is that these should be dropped because they are only delivered to active documents. A big concern is that BroadcastChannel is used to log out all tabs when one tab logs out. Dropping these will result in logged-in state being restored from BFCache, which is a serious privacy problem
In general, an API that is reliable until BFCacheing occurs will lead to hard to reproduce bugs.
We're fine with queuing but it would be hard to spec limits on queue size. Leaving that implementation dependent is an option. @asutherland how do you feel about that?
If we queue them, I think the timing is a bit ambiguous. Should they run as soon as the document is active again, i.e. before visibilitychange and pageshow events? That's my fairly naive reading of the spec.
WebKit doesn't drop b/f cache entries based on BroadcastChannel (or any other Web APIs for that matter). We also do not prevent entering the b/f cache based on BroadcastChannel.
B/f cache entries are very valuable and I personally do not think it is a good idea to discard its entries based on something like BroadcastChannel messages being broadcasted.
Re: @fergald
We're fine with queuing but it would be https://github.com/whatwg/html/issues/7253#issuecomment-1159096891 limits on queue size. Leaving that implementation dependent is an option. @asutherland how do you feel about that?
I think we're eventually going to need to spec the size of structured serialization and storage actions, so I don't think we should view that as insurmountable (but I'm also unable to do the spec work at this time, so I won't presume we have that for this discussion).
A compromise if we are going to queue/buffer things is to set a minimum number of messages that will be buffered, like 3. @cdumez would webkit be interested in supporting buffering on a per-BroadcastChannel basis? (That is, each BroadcastChannel would have its own buffer as opposed to having a shared buffer across all BroadcastChannels associated with a global.) Buffering per-channel allows a page to partition the messages they really want reliable-ish transport for to happen on a rarely used channel, like for "log out" versus "spammy heartbeats".
The upper bound could be implementation dependent, which also makes sense because browsers are going to have implementation-dependent heuristics and underlying implementation costs that determine when a bfached thing is no longer worth its cost.
If we queue them, I think the timing is a bit ambiguous. Should they run as soon as the document is active again, i.e. before visibilitychange and pageshow events? That's my fairly naive reading of the spec.
It seems desirable that a page can unambiguously know if it's seeing buffered messages from BroadcastChannel (or other APIs; ex: some discussion re sessionStorage noting that storage APIs have it easier since they have a canonical backing store). It seems like there are some options for this:
- We add a flag to Events or their bufferable sub-classes that explicitly indicates if they're coming from buffer playback or not. It might be useful to note if the buffer overflowed or not, as one could imagine very complex pages could have a fast path and a slow path that varies on whether there was an overflow.
- We book-end the buffered events so that we schedule the "pageshow" event task, then all of the buffered tasks, then an "all-done-with-buffered-tasks" event task or a promise that we added to "pageshow" that resolves once all the tasks have been dispatched. This has the advantage of letting page logic defer actions until it knows it is caught up to realtime which seems important. I expect in the absence of this, pages may just use
setTimeout(0)which probably would work but limits implementation flexibility in the future. (Although I'm not sure of a situation where it would be desirable to let a freshly created timer interleave with the buffered event playback).
* FF/WebKit drops
Firefox doesn't drop messages, it removes the page from the BFCache if a message is sent to the open BroadcastChannel while in the BFCache.
From what I can tell by testing, WebKit keeps the page in the BFCache but drops the messages sent to the open BroadcastChannel while in the BFCache. @annevk, could you confirm that that's actually WebKit's behaviour? That seems quite weird as a behaviour to me. I think it makes more sense to either remove the page from the BFCache, or to queue the messages. We chose the first one because queueing is definitely more complicated (need to find a sensible limit, …).
Clarifications. When I said "drops" I meant drops the message. So the current state is
- Chrome: no bfcache with BC
- Mozilla: evicts if message is received while in BFCache
- WebKit: drops message
I thought I saw Mozilla behaving the same as WebKit but I cannot reproduce it.
To make life a little easier, I have created a WPT that sends a message to a page in BFCache. It is NOTRUN on Chrome and Mozilla (as expected). I don't have a way to test it on WebKit but I expect it will FAIL.
Can we agree on something before discussing details?
If we restore a page from BFCache it must either
- receive all BC messages or
- have opted-in to dropping messages and can find out when that happened
@cdumez @annevk That is not WebKit's current behaviour, so I would particularly like your response on this.
@petervanderbeken I think you already agree but an explicit response would be helpful.
@asutherland this may be slightly different to your position because it requires an explicit opt-in. This doesn't prevent queuing by default however it means that if the queue overflows, we evict, unless the page explicitly told us that it's OK to drop.
Can we agree on something before discussing details?
If we restore a page from BFCache it must either
- receive all BC messages or
- have opted-in to dropping messages and can find out when that happened
@cdumez @annevk That is not WebKit's current behaviour, so I would particularly like your response on this.
I am not opposed to: receive all BC messages when restoring a page from the b/f cache (probably up to a certain limit) I thought this was what we implemented but it wouldn't surprise me if we were dropping messages instead.
Either way, we haven't noticed yet any breakage from our current behavior.
@petervanderbeken I think you already agree but an explicit response would be helpful.
I discussed it a bit with @smaug----, we still prefer no queueing. Queueing is complicated to spec and implement, ideally you'd then want to specify the limits (taking into account message sizes too). We already have compat issues around limits of just the length of the session history itself for example.
So our position would be: no queueing, evict on message with an opt-in for dropping messages instead.
@petervanderbeken Do we need to spec the details of queueing?
I think you and I agree there should be only 2 outcomes:
- document is restored from BFCache without missing messages
- document is reloaded
To achieve that, we can just say that the messages are queued and delivered if the document becomes active again. If a browser wants to evict with no queueing they are in spec because the document never becomes active again.
Pinging this. Does anyone object to speccing the following?
- documents with an open BroadcastChannel can go into BFCace
- documents reactivated from BFCache should receive all the messages that would have arrived
- (doesn't really need to be said but for clarity here) browsers are free to destroy the document and evict the BFCache entry if they want to e.g. if the queue of messages gets too big
We're ok with that proposal. We will probably keep our current behaviour at first, which is essentially a message queue length of 0, but the third bullet point allows for that.
I think the main question is if WebKit is interested in changing their behaviour of dropping messages.
That depends on how often it would mean a cache eviction where currently there is none. We don't really want to regress on that as we haven't seen issues with our current behavior.
@annevk
That depends on how often it would mean a cache eviction where currently there is none. We don't really want to regress on that as we haven't seen issues with our current behavior.
Less than 0.4% of history navigations in Chrome have a BC open. I do not know what fraction of that would receive messages.
I can't find any discussion or documentation of BC being unreliable beyond this thread. Before #7296, the spec seemed to say that messages should always be delivered. I don't think devs are aware that it is not reliable, especially as it is reliable on Firefox and Chrome. Pages using BC as a heartbeat will be fine but sites using BC to send incremental updates are definitely broken by this probably in ways that are very hard to trace back to BC+BFCache.
I also don't know what guidance we could give to sites that want a reliable BC. Rebuilding it from other APIs seems hard. If you don't care about reliability but do care about BFCache, closing your BCs in pagehide is easy.
We discussed this in the triage call (#9308) today. Summary:
- WebKit would be open to a queue with an implementation-defined limit while in bfcache, with implementation-defined behavior when the limit is reached. Others on the call were not excited about the implementation-defined post-limit behavior.
- To reach our standard of "2+ implementers support, no strong implementer objections" on requiring eviction when the limit is reached, @annevk would need to check with the rest of the WebKit team as to whether they would strongly object to evicting on exceeding a limit.
- What would help here is more-specific numbers, which @fergald could help collect, e.g.: what percent of back navs have a BC channel open and receive 1+ messages, 2+ messages, N+ messages, etc. (Or maybe in KiB...)
- I pointed out that if WebKit sets a limit on the same order of magnitude as "a bfcached web page", it's very unlikely that limit will ever be reached.
- There was some discussion of how this is similar to, or not similar to, storage events. It seems like previously the decision was to not fire any storage events for bfcached pages: https://github.com/whatwg/storage/issues/119#issuecomment-1115844532 although this was never reflected into the spec. @annevk would be comfortable with dropping all BroadcastChannel messages for bfcached pages in this same manner, but others on the call were not excited about making
BroadcastChannela lossy communications channel. - In general @annevk was skeptical that sites using
BroadcastChannelwould correctly respond to queued-and-replayed messages after being taken out of bfcache. Examples of such sites, which would break their functionality with dropped messages but work correctly with replayed messages (especially 2+ replayed messages) would be helpful.
I don't have data on the fraction of restores that would be impacted by a queue size of N. Gathering that would require implementing this change. I also can't test that nothing fails with queueing because that requires implementing queueing, I can only look at the code.
In general in the examples below, I cannot see a reason why anything would end up in an incorrect state if messages were delivered after restore. I can see how some cases may result in a lot of work being performed if multiple events are delivered.
There is no timing guarantee on BC messages or guarantees of ordering with respect to other message channels. So I would say that if an app cannot correctly receive 2+ messages on restore, then it was already buggy without BFCache. Although for some apps, the day-to-day likelihood of hitting such a bug could be zero.
If we believe that 2+ messages is commonly problematic, that seems like an argument for a max queue size of 1.
Examples
TL;DR
- it's easy to find things that will break with dropped messages, including some widely used code (Firebase)
- there are some examples that looked like they would be OK with dropped messages
- I couldn't find any that would obviously break with a queue
- replaying a large queue on restore will be a performance problem for some pages
Sources
I got most of these from https://snyk.io/advisor/npm-package/broadcast-channel/example. This is basically a polyfill for BroadcastChannel, defaulting to native BroadcastChannel. In the first 5 examples, 4 look like a problem and 1 looked like it was using BroadcastChannel to broadcast within the same document, so would not be a problem. I don't know how popular these apps/packages are.
The Firebase example comes from searching httparchive, it is very widely used. I found it very hard to uncover other understandable usages of BC via httparchive, so I stopped.
I also searched Google's internal source but it's hard for me to know what's launched/public and everything gets minified and finding the public version of the JS is a chore, so I'm not sharing any examples from there but there were a couple that looked problematic.
Telegram
TDWeb is telegram.org's library for building web clients. BroadcastChannel is used in the closeOtherClients method. It appears that this is called whenever a client is constructed, I suppose it ensures that you only have 1 live client at a time. This would fail if there was a client in BFCache and the message was dropped. It seems like it would be fine to deliver 2+ of these.
Firebase
https://cs.opensource.google/firebase-sdk/firebase-js-sdk/+/master:packages/installations/src/helpers/fid-changed.ts;l=97?q=%22new%20BroadcastChannel%22 broadcasts changes to the FID which I believe is the user-id. A tab that misses this broadcast will remain on the old user ID. This sounds bad if all tabs are supposed to be on the same FID. Again, I don't see why delivering 2+ of these would break anything but if dealing with them is costly, it could result in jank.
RXDB
RXDB is a JS database that uses BroadcastChannel to make sure all tabs using the DB are in sync. That's going to fail if messages are dropped. I don't see any reason for it to fail if messages are delivered after BFCache restore.
Lektor
Lektor is some kind of CMS. It sends a signal to reload all open tabs when changes are made. This signal comes to a worker which then sends it via BroadcastChannel. A tab coming back from BFCache would stay out of sync. Delivering 2+ of these seems like it would not break, just reload excessively.
Hi @annevk , from https://github.com/whatwg/html/pull/8972#issuecomment-1571607739 , it looks like we have reached an agreement that mentioning the queueing behavior with a implementation defined capacity is acceptable. We have made some updates ub this pull request, could you help to take a look and confirm if there is any objection from WebKit side?
Also does @fergald 's examples in https://github.com/whatwg/html/pull/8972#issuecomment-1592234471 resolved your concern about if sites are able to handle dropped messages? Do you need us to collect any more data to support the proposal?
Thanks.
I feel like there's still no significant data either way that WebKit's existing behavior is problematic. And while we're open to adjusting it in terms of attempting to replay messages, we'd probably still err on the side of wanting to revive a document over replaying its messages. When forced to make a choice between those that is.
Getting back to this as we have someone working on it again.
I feel like there's still no significant data either way that WebKit's existing behavior is problematic. And while we're open to adjusting it in terms of attempting to replay messages, we'd probably still err on the side of wanting to revive a document over replaying its messages. When forced to make a choice between those that is.
I've linked to code that definitely breaks, the firebase code will leave a page in the logged-in state when it should be logged out. Nobody has ever debugged this hard-to-repro problem to the point of filing a bug against Safari but the problem is still there.
If we adopt this behaviour then we need to document that BroadcastChannel is not a reliable communication channel.
I am curious what else Safari does. Does it drop or deliver storage events that occur while in BFCache?
I modified the description and the links has been updated by the bot.
I was asked to review this but I'm unclear if it reflects (my understanding of) the current cross-browser consensus.
The spec PR evicts immediately on any messages being delivered to a bfcached page.
I guess we can assume this is Chromium's desired behavior?
For Gecko, https://github.com/whatwg/html/pull/8972#issuecomment-1459876348 says
So our position would be: no queueing, evict on message with an opt-in for dropping messages instead.
I don't see any opt-in for dropping messages in this PR, but, maybe the idea is to add that later?
But then the discussion moved to @fergald's proposal at https://github.com/whatwg/html/pull/8972#issuecomment-1512292547, reiterated by me in https://github.com/whatwg/html/pull/8972#issuecomment-1571607739, which includes an implementation-defined queue size limit. And @lozy219 says in https://github.com/whatwg/html/pull/8972#issuecomment-1730905420 that he seems to be working on an implementaiton-defined capacity.
I don't see any such capacity limit in this PR. Instead I see what is effectively a capacity limit of 0.
Maybe Gecko and Chromium both agree to a capacity limit of 0? If so I'd like to get Gecko to reaffirm that explicitly. @petervanderbeken @smaug----
With regards to WebKit's position, my read is that their behavior (silently dropping messages while in bfcache) is not spec-conformant either before or after this change. Before this change, the spec says to queue all messages or (implicitly) evict; after this change the spec says to explicitly evict.
So, I think it would be reasonable to merge this if Gecko + Chromium are in favor of it, and WebKit can continue to advocate for a different behavior if they would like the spec to match their implementation where BroadcastChannel becomes unreliable. (Ideally by addressing the data @fergald gathered.)
With regards to WebKit's position, my read is that their behavior (silently dropping messages while in bfcache) is not spec-conformant either before or after this change. Before this change, the spec says to queue all messages or (implicitly) evict; after this change the spec says to explicitly evict.
So, I think it would be reasonable to merge this if Gecko + Chromium are in favor of it, and WebKit can continue to advocate for a different behavior if they would like the spec to match their implementation where
BroadcastChannelbecomes unreliable. (Ideally by addressing the data @fergald gathered.)
Sorry, this is wrong. I missed a line in the diff. WebKit's behavior matches the current spec, which says to drop the messages and makes BroadcastChannel unreliable.
So, if Gecko also prefers the evict-with-0-buffer behavior, then we have a more clear conflict of Chromium + Gecko preferring one behavior, and the existing spec + WebKit preferring another.
In that case, I don't think it's reasonable to merge this without getting clearer signals from WebKit, per the considerations in https://github.com/whatwg/html/pull/8972#issuecomment-1571607739.
For the record, we did not try to measure what would happen if we queued. Implementing the non-queueing version was easy and results in a negligible amount of blocking. We do not plan to do any further work to measure what level of blocking would occur if we queued one message. We could re-propose the queuing version of the spec and state that our queue has a length of zero but if no implementor has an appetite for any value greater than zero, it doesn't make sense.