webdriver
Returning lone surrogates fails
wd.execute_script(u"return '\\uD800'")
fails (almost?) everywhere. (Sorry Jim, I don't have IE to hand.)
Chrome:
selenium.common.exceptions.WebDriverException: Message: unknown error: bad inspector message: {"id":18,"result":{"result":{"type":"object","value":{"status":0,"value":"\ud800"}}}}
(Session info: chrome=77.0.3865.90)
Firefox:
selenium.common.exceptions.InvalidArgumentException: Message: unexpected end of hex escape at line 1 column 27
Safari doesn't throw, just returns None.
Per spec, because we use the ES JSON.[[Parse]] and JSON.[[Stringify]] functions, we should deal with lone surrogates fine. Except this appears to be a lie.
This cropped up from https://github.com/web-platform-tests/wpt/issues/17577.
Is the spec actually wrong here? It seems to me these are implementation bugs and that we need a WPT test and bugs filed upstream.
As an aside, in the Firefox case it looks like maybe serde is failing to decode the JSON, which could point to a more serious underlying issue than just geckodriver/Marionette being wrong…
@andreastt Well, @jgraham said you were very unlikely to fix it, given it's fundamentally a difference between different JSON specs (and serde doesn't allow lone surrogates, partly because Rust's str can't represent them).
Yeah, it's a WONTFIX issue in serde to allow lone surrogates: https://github.com/serde-rs/json/issues/495. Of course, we could work around it, but at the performance cost of having to preparse the string.
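To make the "preparse" cost concrete, here is a rough Python sketch of what such a pass over the raw JSON text might look like. It is not what geckodriver does; the function name `has_lone_surrogate_escape` is made up for illustration, and it only looks at `\uXXXX` escapes (raw surrogates cannot appear in valid UTF-8 input anyway).

```python
import re

# Matches any JSON escape sequence. Group 1 holds the four hex digits of a
# \uXXXX escape and is None for every other escape (for example \\ or \n).
_ESCAPE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|.)', re.DOTALL)


def has_lone_surrogate_escape(json_text):
    """Return True if json_text contains an unpaired surrogate escape.

    Hypothetical helper: a strict parser could run this first and take a
    fallback path instead of failing on the whole message.
    """
    pending_high_end = None  # end offset of a high surrogate awaiting its low half
    for match in _ESCAPE.finditer(json_text):
        hex_digits = match.group(1)
        code = int(hex_digits, 16) if hex_digits else None
        is_low = code is not None and 0xDC00 <= code <= 0xDFFF
        if pending_high_end is not None:
            # A high surrogate is only paired if a low one follows immediately.
            paired = is_low and match.start() == pending_high_end
            pending_high_end = None
            if paired:
                continue
            return True
        if code is None:
            continue
        if 0xD800 <= code <= 0xDBFF:
            pending_high_end = match.end()
        elif is_low:
            return True  # low surrogate with no preceding high half
    # A high surrogate that was never followed by another escape is also lone.
    return pending_high_end is not None
```

Every response body has to be scanned once before the real parse, which is the extra cost mentioned above.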
Also note https://github.com/tc39/ecma262/pull/1396 has changed the definitions in ES, and has shipped in all major ES engines.
That doesn't actually help because you still end up with an escaped lone surrogate in the JSON. So libs like serde that deserialize to types that enforce string validity are still going to have a problem.
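A small Python illustration of that point, using only the standard `json` module as a stand-in for both ends of the wire (the helper names are invented for this sketch). The serialized text is plain ASCII and decodes fine with a lenient parser, but the resulting string still cannot be encoded as UTF-8, which is roughly the constraint Rust's str enforces up front.

```python
import json


def encode_response(data):
    # json.dumps escapes non-ASCII by default, so a lone surrogate becomes
    # the six ASCII characters \ud800 and the body is valid UTF-8.
    return json.dumps({"value": data}).encode("utf-8")


def decode_response(body):
    # Python's json module is lenient: it accepts an unpaired surrogate
    # escape and returns a str containing that code point.
    return json.loads(body)["value"]


def is_valid_utf8_text(text):
    # A str holding a surrogate code point cannot be encoded as UTF-8.
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


body = encode_response("\ud800")
value = decode_response(body)
print(body, is_valid_utf8_text(value))
```

So the JSON text itself is unproblematic; the trouble only shows up in the type the client decodes into.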
We could define the problem away at the protocol level by stating that you serialize with a replacer function that replaces lone surrogates with some escaped representation like U+D800. There is of course a degeneracy here in that people can write return "U+D800", but I'm not sure to what extent that's a problem.
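A minimal sketch of the replacer idea, written in Python rather than as an actual JSON.stringify replacer; `replace_lone_surrogates` is a hypothetical name and the `U+XXXX` format is just the one suggested above. Note one assumption: Python strings are sequences of code points, so any surrogate found in a str is unpaired, whereas a JavaScript replacer would also have to check for valid high/low pairs.

```python
import json
import re

# Any surrogate code point inside a Python str is by definition unpaired.
_SURROGATE = re.compile(r'[\ud800-\udfff]')


def replace_lone_surrogates(value):
    """Recursively replace lone surrogates in strings with 'U+XXXX' text."""
    if isinstance(value, str):
        return _SURROGATE.sub(lambda m: 'U+%04X' % ord(m.group()), value)
    if isinstance(value, list):
        return [replace_lone_surrogates(item) for item in value]
    if isinstance(value, dict):
        return {replace_lone_surrogates(key): replace_lone_surrogates(item)
                for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# After replacement the payload serializes to plain, strictly valid text.
safe = replace_lone_surrogates({'value': ['a\udc00b', '\ud800']})
print(json.dumps(safe))

# The degeneracy: a literal "U+D800" and a real lone surrogate collide.
print(replace_lone_surrogates('U+D800') == replace_lone_surrogates('\ud800'))
```

The last line shows the ambiguity: a client cannot tell whether the script returned the text "U+D800" or an actual lone surrogate.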
Given https://github.com/web-platform-tests/wpt/issues/17577 has come up again, it would be nice to have some agreement about what behaviour should be. As far as I can tell, we have a few options here:
- Do nothing, and have implementations not match the letter of the spec,
- Define behaviour for when 'Let response’s body be the UTF-8 encoded JSON serialization of a JSON Object with a key "value" set to data' fails (because, e.g., the JSON serialize function doesn't support lone surrogates), leaving fewer details of the JSON serialize function defined,
- Wait indefinitely for all implementations to change (which they're highly unlikely to).
When it comes to the second option, the tests I wrote before (https://github.com/web-platform-tests/wpt/pull/19694) show different error behaviours: ChromeDriver giving unknown error (500), GeckoDriver giving invalid argument (400), and SafariDriver returning successfully but returning null. The SafariDriver behaviour is almost certainly the one we don't want here, but we should probably agree on what error gets returned.
At the protocol level, we could make it so that the JSON matches the output generated by ECMAScript's JSON.stringify, which was not-so-recently changed to always produce well-formed output, even in the case of lone surrogates.
> That doesn't actually help because you still end up with an escaped lone surrogate in the JSON. So libs like serde that deserialize to types that enforce string validity are still going to have a problem.
@jgraham While this is true, with the above change that becomes a client problem as opposed to an issue at the spec level. Clients that somehow cannot represent lone surrogates should find a workaround, but we shouldn't work around this in the spec IMHO (other than enforcing well-formed JSON).
I don't know if it's clear that output containing lone surrogates, even encoded, is actually well-formed JSON. In any case pushing the problem onto clients is arguably a violation of the priority of constituencies; in practice the tooling that handles this correctly will be the tooling that happens to use a JSON library where it's possible to represent lone surrogates and the tooling that gets this wrong will be the tooling built using a language or libraries where lone surrogates can't be represented. Since in general users may be using a mix of tooling and they don't really have much choice in the matter, we're ultimately leaving a footgun in the spec where users will experience breakage in rare scenarios, and fixing the breakage will be hard.
> I don't know if it's clear that output containing lone surrogates, even encoded, is actually well-formed JSON.
AIUI, ECMA-404 allows it (but doesn't have any processing requirements, merely describes the format). STD-90 allows it, but allows parsers to disallow them.
FWIW, my view on this is that it's difficult for both GeckoDriver and SafariDriver to change here, and the only place where either vendor has ever had a problem reported with it is WPT, hence it's almost certainly not worth requiring lone surrogates to be handled.
https://wpt.fyi/results/webdriver/tests/execute_script/execute.py?label=pr_head&max-count=1&pr=19694 shows the status quo, where we have no interop.
So no one fixed it I guess.