webdriver Input events dispatch to top-level frame

As discussed in https://github.com/w3c/webdriver-bidi/issues/795, the actions should be dispatched from the top-level browsing context.

Preview | Diff

Oct 22 '24 15:10 sadym-chromium

@whimboo WDYT?

Oct 23 '24 09:10 sadym-chromium

The following two things seem to be still missing:

steps to convert the coordinates to the top-level browsing context coordinate space if they are frame scoped. Edit: unless this is covered by the implementation-defined bits?
apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Oct 23 '24 11:10 OrKoN

The following two things seem to be still missing: 1. steps to convert the coordinates to the top-level browsing context coordinate space if they are frame scoped. Edit: unless this is covered by the implementation-defined bits?

Yes, absolutely. By changing the way where we dispatch the events the coordinates should be exactly the same. Not taking care of offsets would cause quite a lot of regressions for those users who make use of actions a lot.

Therefore see:

https://github.com/web-platform-tests/wpt/pull/48147 https://github.com/web-platform-tests/wpt/pull/48123

Given that Chrome seems to already dispatch actions in the top-level browsing context the referenced tests are failing.

2. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Yes, we would have to do it for all the pointer and wheel input sources.

Keyboard actions are more challenging because they could allow users to trigger shortcuts that access restricted browser features. This would require additional checks to determine what actions are permitted and which ones should be blocked. Maybe we should have a follow-up issue for it?

Oct 23 '24 12:10 whimboo

apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Aren't all the actions end up in perform implementation-specific action dispatch steps?

Oct 23 '24 13:10 sadym-chromium

@OrKoN @whimboo I added calculation of the offset relative to the parent browsing context, but IDK how well it will work

Oct 23 '24 13:10 sadym-chromium

@OrKoN @whimboo I added calculation of the offset relative to the parent browsing context, but IDK how well it will work

I think that we should have more wdspec tests (similar to mine for mouse) so we cover at least each input source (including sub sources) and verify implementations against. Did you get Chrome working for my example?

Oct 23 '24 13:10 whimboo

@sadym-chromium it looks like each action type dispatches on the context parameter https://w3c.github.io/webdriver/#dfn-dispatch-a-keydown-action

apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Aren't all the actions end up in perform implementation-specific action dispatch steps?

that's right, overlooked that.

Oct 23 '24 13:10 OrKoN

Keyboard actions are more challenging because they could allow users to trigger shortcuts that access restricted browser features. This would require additional checks to determine what actions are permitted and which ones should be blocked. Maybe we should have a follow-up issue for it?

so actually I am not sure, even with this change it is not quite defined what it means that the actions are dispatched to the top-level browsing context. For example, in Chrome that is the case but still would not give you access to browser shortcuts. Perhaps we need a better definition of what the dispatch of an event means, perhaps hook into native event handles in https://www.w3.org/TR/uievents/#handle-native-mouse-move-id?

Oct 23 '24 13:10 OrKoN

so actually I am not sure, even with this change it is not quite defined what it means that the actions are dispatched to the top-level browsing context. For example, in Chrome that is the case but still would not give you access to browser shortcuts.

Yes you are right. I mixed it up with the native event dispatching that I'm working on as well right now. In those cases we would have that particular issue but not when dispatching it in the content process of the top-level browsing context.

Oct 23 '24 14:10 whimboo

We should as well clarify what should happen if a frame gets closed by an action. Following actions in the same chain should still be dispatched even through they will reach some other document and elements? See the following wpt test as example: https://github.com/web-platform-tests/wpt/blob/master/input-events/input-events-spin-button-click-on-number-input-delete-document.html

Dec 11 '24 16:12 whimboo

The Browser Testing and Tools Working Group just discussed Input events dispatch to top-level frame.

The full IRC log of that discussion

<AutomatedTester> Topic: Input events dispatch to top-level frame
<AutomatedTester> github: https://github.com/w3c/webdriver/pull/1847
<jgraham> q+
<AutomatedTester> sadym: There is already a discussion that is in the PR. I am a bit stuck with which approach we should be doing here
<AutomatedTester> ack next
<AutomatedTester> jgraham: At the moment the spec say we pick and iframe and send events to/from that
<AutomatedTester> ... but we may want other data from all frames. e.g. if an overlay is over an iframe and click. We want to have the envet fro the overlay and then the Iframe
<AutomatedTester> ... and there might also be a case with what happens when the iframe disappears
<AutomatedTester> ... we should handle cancelling when frame goes and stop the propagation from the frame but items can still go that way
<AutomatedTester> ... in the future we should probably have a way doing calculations based off the iframe
<AutomatedTester> sadym: my first question: do we want to specify the calc the coords or dispatch to
<jgraham> q+
<AutomatedTester> ... and do that on the <missed what was said>
<orkon> q+
<AutomatedTester> ... and then do the calculations more precise and then do htat to the top level
<AutomatedTester> ack jgraham
<AutomatedTester> jgraham: yes... we need to work with how browsers actually work and then do that from the top/parent and let that go down to the correct place
<AutomatedTester> ... I feel like we agree on the model here
<AutomatedTester> ... the main issue is what happens when the iframe disappears
<AutomatedTester> ... we can either keep going or can fail
<whimboo> q+
<AutomatedTester> q+
<AutomatedTester> ack next
<AutomatedTester> orkon: I think the issue if the iframe disappears. I thought that was solved with the PR from whimboo .
<AutomatedTester> ... I think that if the the iframe disappears we should still continue sending the actions
<AutomatedTester> ... e.g. mouse down removes the iframe we should continue
<jgraham> q+
<AutomatedTester> ... back to sadym if we change to to the top level then the calculations could be a lot harder to do where the current way is already working
<AutomatedTester> ack next
<AutomatedTester> whimboo: a follow to the PR, I haven't done this
<orkon> PR I meant https://github.com/w3c/webdriver/pull/1861
<AutomatedTester> ... I wanted to give a comment to jgraham if we have to continue then it might be good to handle both case (carry on and error)
<AutomatedTester> ... and we would have a default that is managed by an argument
<AutomatedTester> ack next
<tidoust> AutomatedTester: Initially, actions were "do as I say", not "do what I mean". actions.mousedown would assume that the element would be in the viewport. Little things like that. Actions should be above the glass. You would just be telling the coordinates and do the action. But you don't necessarily know what's underneath. If I do element.click,
<tidoust> behavior is different.
<tidoust> i/AutomatedTester:/scribe+ tidoust
<tidoust> AutomatedTester: For iframes, behavior has indeed always be different.
<AutomatedTester> ack next
<orkon> q+
<orkon> q-
<AutomatedTester> jgraham: Do we need to be precise in the spec? Yes definitely
<AutomatedTester> ... I know that there are parts that say browser specific but coordiniates is different and we all have the same on that
<AutomatedTester> ... my proposal for clients would send details from the top level traversible
<AutomatedTester> ... but clients could send them at iframe if they want but then handle the situation if it disappears
<orkon> q+
<AutomatedTester> ... I think we need to follow this up in the issues
<AutomatedTester> ack next
<AutomatedTester> orkon: I agree with the error if it doesn't still exist
<AutomatedTester> ... we could do the calculations at the beginning
<AutomatedTester> jgraham: I don't think we can beause if we have scroll then all the coords are out
<AutomatedTester> q?

Dec 11 '24 17:12 css-meeting-bot

webdriver webdriver copied to clipboard

Input events dispatch to top-level frame

webdriver
webdriver copied to clipboard