webdriver Make `list of known elements` per Window, not per browsing context

As discussed with @jgraham the support for stale element reference in WebDriver HTTP should be considered to be removed. There is basically no strong reason why no such element cannot handle such situations eg. after a navigation. For WebDriver BiDi it looks like not be used at all.

We are aware that the removal of this error type is backward incompatible. As such we would like to get a discussion started to know:

if the removal as suggested is acceptable for everyone
how often this error type might actually be used in the wild
The strategy to get it removed

Maybe it can be discussed in the next monthly meeting.

CC'ing @AutomatedTester, @shs96c, @JohnChen0, @burg, @christian-bromann

May 27 '21 18:05 whimboo

Some background: We current have to produce the Stale Element Reference error if an element is in a different document, or if it's not connected. But that's annoying to implement. It means that you can't just have a per-window weak map between element ids and elements, because the spec requires you to figure out that the reference is valid for a DOM node that's in another document, or for a DOM node that's been removed from the DOM (and might only be kept alive by the List of Known Elements in WebDriver).

Particuarly with heavily multiprocess architectures in modern browsers, the stale element reference distinction alone requires an implementation to keep at least a list of all known element references in the main process. Without this implementations could be significantly simplified. It seems unlikely that users are strongly depending on the distinction between no such element and stale element reference, so it seems reasonable to just always return no such element. If we can't change HTTP WebDriver here, I at least plan to make BiDi not have this distinction and force HTTP-based WebDriver to do the bookkeeping on which element references have ben used.

May 27 '21 18:05 jgraham

We don't need to keep a map of all element references, just ones that have been previously searched for in the past. This is why it is described as get known elements. Historically, Marionette used to have this at a frame level which should work with multiprocess architectures, assuming that each frame gets a new process.

How are we we planning on handling the situation where someone has a reference to a known element and it is no longer attached to the DOM? Are we going to say the element you found a while back is now NoSuchElement?

May 27 '21 20:05 AutomatedTester

Hmm, so I re-read the relevant spec bits, and I think maybe we'd misinterpreted the scope of the staleness checks. As long as is stale is always called after you've already failed with no such element for elements from a different browsing context then it's probably true that the cache can be per-process. But the spec here is a real mess; as far as I can tell it's always wrong to check for staleness except in "get a known element", and I'm not seeing an observable difference between "get a known element" and "get a known connected element".

In any case the staleness check at least requires that you keep a record of all the element ids you ever handed out for a specific browsing context (assuming the intent is that you get stale element reference for an element that has been gc'd; the list of known elements construct at least implies that).

Maybe it would be enough to change the spec to not hold a strong reference to elements, so that if the element is gc'd you get no such element and stale element reference is only for the case that the element is disconnected or the page navigated. Even better it could just be for the disconnected case, so after navigation you always get no such element, even if the browsing context hasn't been discarded. Then the implementation could just be a weakmap of element id to element, and the staleness check would just be checking the document of the element. That seems a lot closer to the way that BiDi is likely to work (given we will have all sorts of remote objects there, not just elements, and we don't want to extend the lifetime of the object ids past the lifetime of the global).

May 27 '21 20:05 jgraham

From the framework perspective it really depends if the client keeps information about whether it has fetched an element id before. For example WebdriverIO has an element instance for users to call element commands on, e.g.:

const elem = $('#someId') // fetch element
elem.click()
// arbitrary re-render happened
elem.click() // could be stale element now, but would be refetched if so

This element (elem) instance keeps various information like the original selector to ensure it can be refetched. There is also an implicit check for stale element errors to refetch the element with the same selector.

For implementations that don't have such safe guard it could be more confusing for users who interact with an element first and then suddenly receive a no such element error, particularly given the fact that staleness can sometimes be triggered based of arbitrary re-render events of the underlying frontend framework. So e.g. if we take the webdriver package that WebdriverIO uses for making low level WebDriver requests, it would look different given no safe guards are in place:

const searchInput = await client.findElement('css selector', '#lst-ib')
await client.elementClick(searchInput['element-6066-11e4-a52e-4f735466cecf'])
// arbitrary re-render happened
await client.elementClick(searchInput['element-6066-11e4-a52e-4f735466cecf']) // suddenly "no such element"

Overall I believe the biggest advantage to have a stale element error was to give the devil a name. Such errors are still happening and are reasons for non deterministic test results. For implementations that work with safe guards it probably won't make a difference but for the ones that don't have such functionality such non deterministic behavior will be even harder to explain.

May 28 '21 07:05 christian-bromann

Hmm, so I re-read the relevant spec bits, and I think maybe we'd misinterpreted the scope of the staleness checks. As long as is stale is always called after you've already failed with no such element for elements from a different browsing context then it's probably true that the cache can be per-process. But the spec here is a real mess; as far as I can tell it's always wrong to check for staleness except in "get a known element", and I'm not seeing an observable difference between "get a known element" and "get a known connected element".

There probably isn't an observable difference.

In any case the staleness check at least requires that you keep a record of all the element ids you ever handed out for a specific browsing context (assuming the intent is that you get stale element reference for an element that has been gc'd; the list of known elements construct at least implies that).

In theory we don't need to have a map in the browser for this. If a user passes us an element ID and we are not aware of it or can't find it on the page we are always going to pass back a StaleElementReferenceException

Maybe it would be enough to change the spec to not hold a strong reference to elements, so that if the element is gc'd you get no such element and stale element reference is only for the case that the element is disconnected or the page navigated. Even better it could just be for the disconnected case, so after navigation you always get no such element, even if the browsing context hasn't been discarded. Then the implementation could just be a weakmap of element id to element, and the staleness check would just be checking the document of the element. That seems a lot closer to the way that BiDi is likely to work (given we will have all sorts of remote objects there, not just elements, and we don't want to extend the lifetime of the object ids past the lifetime of the global).

The thing that I don't want, from my first reply and echo'ed by @christian-bromann, is that we don't have a situation where we pass in an element reference and get "NoSuchElementException" as there is already "reflexive" knowledge here.

May 28 '21 10:05 AutomatedTester

Let's consider some examples. What do people expect in the following scenarios, assuming a page like <div id=test></div>? In the navigation cases, do your answers depend on whether the page is in the bfcache?

elem = webdriver.findElement("#test")
webdriver.executeScript("""
  let elem = document.querySelector("#test");
  window.strongReference = elem;
  elem.parentNode.removeChild(elem);
""");
webdriver.getElementProperty(elem, "id");

elem = webdriver.findElement("#test")
webdriver.executeScript("""
  let elem = document.querySelector("#test");
  elem.parentNode.removeChild(elem);
""");
// Assume a gc has run in the browser
webdriver.getElementProperty(elem, "id");

elem = webdriver.findElement("#test")
webdriver.go("about:blank")
webdriver.back()
webdriver.getElementProperty(elem, "id");

elem = webdriver.findElement("#test")
webdriver.executeScript("""
  let elem = document.querySelector("#test");
  window.strongReference = elem;
  elem.parentNode.removeChild(elem);
""");
webdriver.go("about:blank")
webdriver.back()
webdriver.getElementProperty(elem, "id");

elem = webdriver.findElement("#test")
webdriver.executeScript("""
  let elem = document.querySelector("#test");
  elem.parentNode.removeChild(elem);
""");
// Assume a gc has run in the browser
webdriver.go("about:blank")
webdriver.back()
webdriver.getElementProperty(elem, "id");

elem = webdriver.findElement("#test")
webdriver.executeScript("""
  let elem = document.querySelector("#test");
  elem.parentNode.removeChild(elem);
""");
// Assume a gc has run in the browser
webdriver.go("about:blank")
webdriver.back()
webdriver.setWindow(otherValidWindowHandle)
webdriver.getElementProperty(elem, "id");

May 28 '21 11:05 jgraham

Local ends use this exception to allow them to tell the difference between "the user has just searched for something that doesn't exist" and "this element used to exist, and now it is gone". The two cases are different, and being able to differentiate them is useful.

The thing that I don't want, from my first reply and echo'ed by @christian-bromann, is that we don't have a situation where we pass in an element reference and get "NoSuchElementException" as there is already "reflexive" knowledge here.

+1 to this.

Jun 08 '21 15:06 shs96c

Sure, although in theory at least that's something that local ends can know themselves (they would keep a set of all the element references that have previously been returned and check if the element reference is a member on failure). Of course there can be advantages in putting this in the driver, but I'm loathe to reintroduce the complexity into BiDi; it might be a thing that's (spec-wise) handled in the HTTP frontend. But, to understand what the complexity is supposed to be it would really help to get answers to the lifetime questions above.

Jun 08 '21 19:06 jgraham

The Browser Testing and Tools Working Group just discussed Stale Element Reference.

The full IRC log of that discussion

<AutomatedTester> topic: Stale Element Reference
<AutomatedTester> github: https://github.com/w3c/webdriver/issues/1594
<AutomatedTester> jgraham (IRC): this is a question that is for all webdriver
<AutomatedTester> ... I understand there was a distinction in the past
<simonstewart> q+
<AutomatedTester> ... the question is what is the expected lifetime...
<AutomatedTester> ack simonstewart
<AutomatedTester> simonstewart (IRC): The original plan was we kept a cache in the window global.
<AutomatedTester> ... if you navigate away and come back it would nuke our cache
<AutomatedTester> ... in watir and that they will do a lookup again but selenium doesnt do that and assumes the user knows what is the best wayt to handle this
<AutomatedTester> jgraham (IRC): i think that is good
<AutomatedTester> ... I think we need to read the webdriver spec and make sure it does what you have just said
<AutomatedTester> ... my inclination is to not have something similar in bidi
<simonstewart> q+
<AutomatedTester> ... and that in bidi we have more complex objects that are returned
<AutomatedTester> ... and they can have large amounts of data that may get stored in a cache
<AutomatedTester> ack simonstewart
<AutomatedTester> simonstewart (IRC): I think it gets very interested... in webdriver http we required that they keep it instead of the client
<AutomatedTester> ... as the browser would know the lifecycles better than a client
<AutomatedTester> jgraham (IRC): my concern is if we return a handle and then it is gc'ed and someone tries to use it and it could return a differrent error
<AutomatedTester> simonstewart (IRC): is this due to JS serialisation
<AutomatedTester> jgraham (IRC): maybe... [discusses]
<AutomatedTester> simonstewart (IRC): in webdriver http we essentially wanted something that could be returned that passed going through `json.stringify`
<AutomatedTester> jgraham (IRC): [discusses JS execution and handles]
<AutomatedTester> simonstewart (IRC): the original plan was due to things that lives in the DOM
<AutomatedTester> jgraham (IRC): that leads to the question about what type of references we need to keep
<AutomatedTester> ... one thing that I think is true in http spec but not bidi is...
<AutomatedTester> ... if you disconnect an element from the DOM but it's still alive then we would return staleelement reference
<AutomatedTester> ... but in JS the handle would still be held
<AutomatedTester> ... [missed a few items around usage]
<AutomatedTester> action: Verify Webdriver HTTP spec matches the requirements and clarify where necessary

Jun 09 '21 16:06 css-meeting-bot

At first I was annoyed by the change in implementation for returning "Not Found" instead of "Stale" when working inside a different frame, because Watir removes the need to explicitly switch between frames. Except it ended up being really easy to manage.

From an implementation standpoint there isn't any difference as to what the error is.

If a user sends a command along with an element reference and it comes back "Not Found" instead of "Stale," we can just rescue it and throw a Stale error to stay backward compatible.

I'm not sure I understand what the reflexive knowledge concern is.

Jun 10 '21 22:06 titusfortner

The general tl;dr of the telecon discussion is that the element reference cache only needs to live as long as the page (from the point of view of modern browser architecture, this means it needs to live in the content/render process, not the UI/parent process). I think this removes the main technical concern from our side, but we need to audit the spec to ensure that it does in fact have the behaviour we want.

Jun 11 '21 12:06 jgraham

Based on my understanding of where we're at with this issue, I've updated the issue summary.

If we make list of known elements per Window rather than per browsing context, it means that once you navigate away from a page trying to use an element reference you got from that page will give no such element rather than stale element reference. This allows implementations to tie the element id cache to the Window lifetime, and avoids having to store all previous element ids indefinitely, including after navigation. If you go back() to the original page, and it was in the bfcache, the original element references will still work (as before), and any element that's been removed will give stale element reference (as before). If you back() and the page is reloaded (i.e it wasn't in bfcache) you'd now get no such element if you try to reuse an existing reference, whereas per the current spec you'd get stale element reference.

In the case that the navigation creates a new browsing context group (i.e. there's a COOP header set) the old and new behaviours will be the same.

This never affects whether a reference is valid, it only affects the specific exception you get in cases involving navigation, and ensures that the COOP and non-COOP cases always behave identically.

Jun 08 '22 14:06 jgraham

PR #1667 will fix this issue.

Jun 16 '22 12:06 whimboo

The Browser Testing and Tools Working Group just discussed Shared element references.

The full IRC log of that discussion

<AutomatedTester> topic: Shared element references
<AutomatedTester> github: https://github.com/w3c/webdriver/issues/1594
<AutomatedTester> jgraham (IRC): the title of this issue is very misleading
<AutomatedTester> ... for webdriver when we return an element or send an element
<AutomatedTester> ... we check is the element in the DOM and attached
<AutomatedTester> ... if not we return `StaleElementReference`
<AutomatedTester> ... when working on trying to make sure that the cache of elements is shared between classic and bidi
<AutomatedTester> ... for bidi we have decided that we don't want to add this
<AutomatedTester> ... [explains example of across windows]
<AutomatedTester> ... [explains that we shouldn't be allowed to share things across origins]
<AutomatedTester> ... the question is how do we explain this in the spec
<AutomatedTester> automatedtester: what does this mean in 'accessible'?
<AutomatedTester> jgraham (IRC): this means browser context group I think
<AutomatedTester> ... the questions are Do we disagree with assumptions in there?
<AutomatedTester> ... and "how do this relate to PR 180 in the bidi spec"
<AutomatedTester> q?
<AutomatedTester> q+
<jgraham> https://github.com/w3c/webdriver-bidi/pull/180
<jgraham> AutomatedTester: If we remove StaleElementReference, and make it per browsing context group, what is the impact on compat?
<jgraham> ... this could impact Selenium users.
<AutomatedTester> jgraham (IRC): I forgot to explain, this wouldn't have any normative changes. WebDriver would still work as it does today
<AutomatedTester> q?
<AutomatedTester> ack next
<JimEvans> q+
<jgraham> But WebDriver-BiDi would allow a superset of the WebDriver behaviour
<AutomatedTester> ack next
<AutomatedTester> Jim Evans: as I am thinking about it and I will talk mostly from Selenium use case
<AutomatedTester> ... if we look at a SPA where doing an action on the page that causes a refresh
<AutomatedTester> ... people are using that stale element as a sychronism point
<AutomatedTester> ... in the classic we aren't changing things but how would we create this type of test in the bidi world
<AutomatedTester> jgraham (IRC): the first thing classic doesnt' change
<AutomatedTester> ... at a high level we should be encouraging people to look for specific events that they can use
<AutomatedTester> ... instead of using polling
<AutomatedTester> ... if people pass in a element id and it's been GC'ed then it wouldn't be able to deserialise and error
<AutomatedTester> ... and since we want to implement classic on top of bidi we need to make sure it works
<AutomatedTester> Sam Sneddon [:gsnedders]: so would the connected check be an extra bidi call?
<AutomatedTester> jgraham (IRC): yes as things are now
<AutomatedTester> Sam Sneddon [:gsnedders]: so it looks like we need to implement it but it's implementable
<AutomatedTester> ... [discusses a possible solution with tree walking]
<AutomatedTester> jgraham (IRC): I'm not concerned about our ability to support these use cases
<AutomatedTester> ... the question is what we want bidi to do and the state of where sadym (IRC) is at
<AutomatedTester> Jim Evans: This does answer my question. I wonder if there is mileage in having an event in sent when an element is removed from the DOM
<AutomatedTester> jgraham (IRC): in the spec there is seriealization of a node
<sadym> q+
<AutomatedTester> ... in terms of getting an event when a node is removed
<AutomatedTester> ... I would expect people to install a script to do mutation observers
<AutomatedTester> ... it might not be a primative but a combination of features
<AutomatedTester> q?
<AutomatedTester> ack next
<AutomatedTester> sadym (IRC): so I when I wrote the spec for shared ID
<AutomatedTester> ... I wrote it that it didn't care if the element was in the DOM but where it is now
<AutomatedTester> jgraham (IRC): my understanding of shared id is that as long it hasn't been GC'ed it will be able to send the element back
<AutomatedTester> ... so you can see if the element is there by trying to deserialise and can't it's not then we can error
<AutomatedTester> ... or if we return something we can see if the parent is a document to see if it is in the document
<AutomatedTester> q?
<gsnedders> https://developer.mozilla.org/en-US/docs/Web/API/Node/isConnected
<jgraham> AutomatedTester: A question I have is to get to the same state as StaleReference might require 3 wire calls. That's a lot of round trips for cloud providers. Is that correct?
<sadym> q+
<AutomatedTester> jgraham (IRC): if you are implement element selected? In this case you could do this as execute
<AutomatedTester> ... but if it's returned from a different method classic would fail quickly but bidi would need to do an extra call
<AutomatedTester> ... but in the common case it should hopefully be equal between the two modes
<AutomatedTester> ack next
<AutomatedTester> sadym (IRC): what is the scenario when the user wants to get the stale state in bidi
<AutomatedTester> ... to me it seems exotic
<gsnedders> q+ to mention why we have the connectedness test originally
<AutomatedTester> ... there is a web api
<AutomatedTester> jgraham (IRC): [repeats Jim Evans example and explains]
<AutomatedTester> ... there is a DOM api that people can use to check the connectedness
<AutomatedTester> ... the observable difference is in classic, if you find an element refresh it would error
<AutomatedTester> ... in bidi it wouldn't error without doing a number of checks
<AutomatedTester> ... we could not implement this info but it makes classic on top of bidi slightly more complex
<JimEvans> q+
<AutomatedTester> ... and it could be racy
<AutomatedTester> ack next
<Zakim> gsnedders, you wanted to mention why we have the connectedness test originally
<AutomatedTester> Sam Sneddon [:gsnedders]: why do we have the connectedness tests?
<AutomatedTester> ... provided we aren't concerned with implementing this without weakmaps then we don't need to worry
<AutomatedTester> ... [explains example web component in a SPA]
<AutomatedTester> ... can we change the semantics of stale reference/ I expect no
<AutomatedTester> automatedtester: no we cant
<AutomatedTester> q?
<AutomatedTester> ack next
<jgraham> gsnedders' use case of a SPA implementing a component cache is useful.
<AutomatedTester> Jim Evans: for clarification I will reiterate
<AutomatedTester> ... there are a number of APIs for element state and element interaction that will require an extra round trip
<AutomatedTester> jgraham (IRC): not necessaryly
<AutomatedTester> Jim Evans: in the case of element rect example, if people do that and it's not attached they will get a stale reference and then people do that extra test
<gsnedders> q?
<AutomatedTester> ... specifically for webdriver classic on bidi
<AutomatedTester> Sam Sneddon [:gsnedders]: if we need to walk the tree then in bidi it would be very expensive
<AutomatedTester> jgraham (IRC): in the script that we can just add 1 more line for connectedness
<AutomatedTester> ... but in the executescript case it will be a lot more that needs to be done
<sadym> q+
<AutomatedTester> ... [discussion around executeScript and how this could impact networks etc]
<jgraham> Question is "can we ever end up running user code during JSON serialization in classic that would allow us to observe teh difference between bailing early with a stale element reference and doing the full serialization and then later checking for stale elements?"
<AutomatedTester> q?
<AutomatedTester> ack next
<AutomatedTester> sadym (IRC): you can get a stale reference when you try access that element is that correct?
<gsnedders> jgraham: Oh, no, we do serialise accessor properties, not just data properties, so "yes".
<AutomatedTester> jgraham (IRC): I think it might also error when doing a serialisation
<AutomatedTester> sadym (IRC): so chromedriver keeps a map of the guid and tries to access it on cdp and if its not there it throws stale element
<AutomatedTester> jgraham (IRC): if chromedriver isn't following the spec exactly here we might be able to change the sematics slightly here and get away with it
<AutomatedTester> q?
<jgraham> https://w3c.github.io/webdriver/#dfn-json-clone is the serialization algorithm from classic and it does check if the element is stale
<AutomatedTester> jgraham (IRC): sadym (IRC) do you have a status update on PR 180
<jgraham> https://github.com/w3c/webdriver-bidi/pull/180
<AutomatedTester> sadym (IRC): I have an action to look at how chromedriver should would
<AutomatedTester> s/would/work
<AutomatedTester> ... I have changed priorities since then so would need to check on it
<AutomatedTester> jgraham (IRC): do you have any outstanding items on that pr?
<AutomatedTester> sadym (IRC): I can't remember the state of this
<AutomatedTester> ... I think there are actionable items there that I need to work on

Sep 15 '22 22:09 css-meeting-bot

@jgraham I assume that we can now close this issue based on your work on pull #1705?

Feb 09 '23 07:02 whimboo

webdriver webdriver copied to clipboard

Make `list of known elements` per Window, not per browsing context

webdriver
webdriver copied to clipboard