# Testing Firefox in "headless" mode
In its default configuration, Firefox requires a display in order to run. This project currently executes Firefox using `xvfb`, a utility that creates virtual displays in the X windowing system. Recently, we have been struggling with a regression that appears to be related to the communication between Firefox and the virtual display (see gh-592).
As of version 55, Firefox implements a "headless" mode which allows the browser to be run in the absence of a display. In gh-592, @whimboo recommended enabling that feature instead of configuring Firefox to use a virtual display. In addition to side-stepping the regression, they mentioned that it would be faster overall.
There are implementation details which the feature's documentation does not describe but which may adversely influence conformance test results:
- the effective resolution (critical for "reftests," which use WebDriver's "screenshot" command)
- the effect on display-oriented web platform features (I'm thinking particularly of Fullscreen, but there may be more... Maybe CSS units like `vh` and `vh`)
More generally, I wonder about authenticity. Which is more true-to-life: Firefox with a virtual frame buffer or Firefox in headless mode?
@whimboo are you familiar with the feature? Can you speak to any of these concerns? Or do you know who I could contact to learn more?
Using Xvfb is more like a real user experience, certainly. Xvfb uses the normal Firefox code but renders to a framebuffer that is never drawn to the screen. Headless mode more or less stubs out the rendering code.
Headless mode may be faster since it does less. It could be more reliable if the X server does something asynchronously that is synchronous in the headless implementation.
At some point headless mode on Linux passed all web-platform tests that Linux+xvfb did on our CI (excluding disabled tests), without needing special metadata just for headless. But I don't think that configuration is actually run, so it's hard to know whether that level of compatibility is being maintained.
Overall, I would hesitate to move to using headless by default for a case where we are so interested in the precise behaviour of the browser.
Thanks, James!
Like @jgraham, I'd hesitate to move away from the codepaths that actually get run by users. Certainly headless is a useful performance optimisation for many things, but it isn't testing what users actually see, and I think that (more than performance) is what we should be focusing on. (Performance, after all, can be solved in other ways, like more parallelism.)
@jugglinmike have you looked at running Chrome headless?
I think there is an argument for exercising the headless codepaths, namely that this is what web developers setting up cross-browser testing are likely to do if they know how. It's certainly a risk that there are differences between headless and headful, but perhaps we could do a weekly or monthly run of headful to compare against, rather than always spending more cycles?
Aside: It would be great if `./wpt run` were headless by default, since it's not really possible to use your machine while windows are flying around.
A datapoint is that we have not had many deviations in results between Xvfb and headless mode in the Firefox CI, as https://searchfox.org/mozilla-central/search?q=headless&case=false&regexp=false&path=testing%2Fmarionette%2Fharness%2F**%2F*.py shows.
To the contrary, running Firefox in headless mode provides more stable results because Marionette does not have to interact with unpredictable window managers and the inherent asynchronous nature of the X11 protocol. Headless will also be faster because anything related to painting and window manipulation is stubbed out.
Headless mode is controlled through the `MOZ_HEADLESS` environment variable. `MOZ_HEADLESS_WIDTH` and `MOZ_HEADLESS_HEIGHT` can be used to control the default window size. Subsequent window resizing using WebDriver/Marionette will update this state.
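For anyone who wants to try this outside of wptrunner, here is a minimal Python sketch of building such an environment. The helper name `headless_env` and the binary path in the usage comment are placeholders of mine, not anything from Firefox or WPT; only the three variable names come from the description above.

```python
import os


def headless_env(width=None, height=None, base_env=None):
    """Return a copy of the environment with Firefox's headless variables set.

    `headless_env` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_HEADLESS"] = "1"
    # The size variables are optional; leave them unset to keep the default.
    if width is not None:
        env["MOZ_HEADLESS_WIDTH"] = str(width)
    if height is not None:
        env["MOZ_HEADLESS_HEIGHT"] = str(height)
    return env


# Usage (the binary path is a placeholder):
#   import subprocess
#   subprocess.Popen(["/path/to/firefox"], env=headless_env(1280, 1024))
```

Returning a copy rather than mutating `os.environ` keeps the setting scoped to the one browser process being launched.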
There is arguably a stronger case for using headless in the WPT CI than there is for wpt.fyi result collection, where you might value speed over correctness. I would be strongly opposed to using headless in the Firefox CI, where emulating the user’s environment by interacting with the WM, loading libgtk, &c. is paramount to ensure Firefox is not only theoretically usable.
I think the open question is what difference in actual test results we can expect from running in headless mode? If there are no differences, isn’t the Xvfb approach only theoretically better in this context?
I think that we should make `--headless` an option to wptrunner that abstracts the details of enabling headless mode across different browsers. It could make sense to enable that for the stability checks, but unless we have some reasonable assurance that headless mode matches non-headless in all cases (e.g. the relevant browser running both configurations in CI with identical results) I don't think we should enable it for wpt.fyi-ingested runs.
For local runs I can see the advantage of implying `--headless` by default as long as the output clearly informs people what's happening (we would of course have `--no-headless` to turn it off).
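To make the proposal concrete, here is a rough Python sketch of what such an abstraction could look like. This is not wptrunner's code: `make_parser` and `browser_launch_options` are hypothetical names, and the sketch assumes Firefox is switched via `MOZ_HEADLESS=1` (as mentioned in this thread) and Chrome via its `--headless` command-line switch.

```python
import argparse


def make_parser():
    """Build a toy CLI with a paired --headless / --no-headless option."""
    parser = argparse.ArgumentParser(prog="wpt-run-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--headless", dest="headless",
                       action="store_true", default=False)
    group.add_argument("--no-headless", dest="headless",
                       action="store_false", default=False)
    parser.add_argument("product", choices=["firefox", "chrome"])
    return parser


def browser_launch_options(product, headless):
    """Translate the generic flag into (extra_args, extra_env) per browser."""
    if not headless:
        return [], {}
    if product == "firefox":
        # Assumption: use the environment variable rather than a binary arg.
        return [], {"MOZ_HEADLESS": "1"}
    if product == "chrome":
        return ["--headless"], {}
    raise ValueError("no headless mapping for %r" % product)


if __name__ == "__main__":
    args = make_parser().parse_args()
    extra_args, extra_env = browser_launch_options(args.product, args.headless)
    # Tell the user what is happening, as suggested above.
    print("headless=%s args=%s env=%s" % (args.headless, extra_args, extra_env))
```

The default here is non-headless; flipping both `default=` values to `True` would give the "headless unless `--no-headless`" behaviour suggested for local runs.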
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1434382 about a wptrunner `--headless` flag some time ago, but maybe a GitHub issue would’ve been better.
@andreastt the searchfox.org link you shared doesn't seem to list test statuses. Should I be able to see a list of tests that do differ between headless and Xvfb there?
I've filed https://github.com/web-platform-tests/wpt/issues/13005 for the CLI feature request.
@foolip For the functional Marionette tests, all tests are expected to pass unless marked to skip, so we only have one test for the remote protocol that we can’t get working.
Looking at WPT results I’m sure is a more relevant datapoint, but I wanted to point out that we don’t have many known-headless-issues in our WebDriver implementation.
> I think there is an argument for exercising the headless codepaths, namely that this is what web developers setting up cross-browser testing are likely to do if they know how. It's certainly a risk that there are differences between headless and headful, but perhaps we could do a weekly or monthly run of headful to compare against, rather than always spending more cycles?
@foolip I haven't attempted to use Chrome's headless feature. There hasn't been a need to change the approach that was initially implemented, and I've had my doubts about authenticity. I'm comfortable if this makes our results less representative of a web developer's testing experience if it means the results are more representative of a user's browsing experience.
To the extent that these things don't align, we're sure to surprise some people no matter which "side" we choose. Targeting the user is what the web developer is doing, anyway. If their process is flawed, then that's clearly a problem, but it may be something we can help fix. I just think we should consider the "headless" mode the special case.
Hoping to get some clarity about the actual effect on test results, I enabled Chrome's headless mode on Bocoup's fork and triggered a TaskCluster build. The experiment was fairly catastrophic: over 2,000 testharness.js tests timed out, and 2 of the 26 TaskCluster tasks never completed. It's safe to say that I'm missing something. If anyone reading has any tips on how the feature should be enabled for WPT, then I'd be happy to try again!
Oh, wow, that's not quite ready to land then :)
I'm afraid I don't have any tips, if we wanted to pursue this we'd have to dig into the failure categories one by one.
Just out of curiosity, I would be interested to know how Firefox fares in that experiment. Setting `MOZ_HEADLESS=1` in the environment should be sufficient.
I'll give it a shot, @andreastt. Is setting `MOZ_HEADLESS=1` different than using the `-headless` flag?
They are equivalent.
@andreastt I wanted to verify this locally before experimenting with all of WPT on TaskCluster. I hit a wall right out of the gate, though: things stall out immediately after starting the browser. Here's the last thing that the WPT CLI outputs following `./wpt run --binary-arg=-headless firefox dom/events/CustomEvent.html`:
7:33.05 INFO Application command: /home/mike/projects/bocoup/google-wpt/web-platform-tests/_venv/browsers/firefox/firefox -headless -profile /tmp/tmpWt0563.mozrunner
7:33.07 INFO Starting runner
7:33.10 pid:11492 Full command: /home/mike/projects/bocoup/google-wpt/web-platform-tests/_venv/browsers/firefox/firefox -headless -profile /tmp/tmpWt0563.mozrunner
pid:11492 *** You are running in headless mode.
After a few minutes, it exits with:
8:42.82 INFO Browser exited with return code -15
8:52.83 WARNING Forcibly terminating runner process
8:52.86 WARNING u'stop': ()
8:52.86 ERROR Max restarts exceeded
8:52.86 INFO Got 0 unexpected results
8:52.86 SUITE_END
web-platform-test
~~~~~~~~~~~~~~~~~
Ran 0 checks ()
Expected results: 0
OK
8:52.95 INFO Closing logging queue
8:52.95 INFO queue closed
8:52.95 ERROR No tests ran
Are you set up to try this on your end? If not, can I get you more debugging information?
@jugglinmike do you have a link to the TaskCluster / Buildbot log?
Nope, that was a result of running WPT locally. I can generate logs for you, though. Full command:
./wpt run --binary-arg=-headless --log-mach=mach.txt --log-mach-level=debug firefox dom/events/CustomEvent.html
Output:
https://gist.github.com/jugglinmike/151e08f22d7b4427e0bd670294d37d49
Can you include `--webdriver-arg=-vv` in that?
Sure, though the command
./wpt run --binary-arg=-headless --webdriver-arg=-vv --log-mach=log-mach-2.txt --log-mach-level=debug firefox dom/events/CustomEvent.html
produced a log file that has no more information than the previous file:
https://gist.github.com/jugglinmike/7cc77c951cb50bedeb374d60fb6f936a
Should I be using `--log-tbpl`? Or some additional `--webdriver-arg`?
`--webdriver-arg` would only work for those wpt tests which use geckodriver, which means the ones under `tests/webdriver` only. It has no effect when running any other test, since those use Marionette directly.
In such a case you should tweak the test runner and set the preference `marionette.log.level` to `trace`.
Thanks, @whimboo! I think we're getting somewhere. Full command:
./wpt run --binary-arg=-headless --log-mach=mach.txt --log-mach-level=debug --setpref marionette.log.level=trace firefox dom/events/CustomEvent.html
Output:
https://gist.github.com/jugglinmike/97f57b6aa7c3e5d3d92ffd5bc8cd612a
In particular:
0:41.85 INFO Starting runner
0:41.85 DEBUG Test runner started
0:41.88 pid:10575 Full command: /home/mike/projects/bocoup/google-wpt/web-platform-tests/_venv/browsers/firefox/firefox -headless -profile /tmp/tmpGLs4P9.mozrunner
pid:10575 *** You are running in headless mode.
0:41.92 DEBUG Got command: u'log'
0:41.93 DEBUG Executor setup
0:41.93 DEBUG Got command: u'log'
0:41.93 DEBUG Connecting to Marionette on port 2828
0:41.93 DEBUG Got command: u'log'
0:41.93 DEBUG Waiting for Marionette connection
0:42.70 pid:10575 1537466880599 Marionette DEBUG Received observer notification profile-after-change
0:42.84 pid:10575 1537466880733 Marionette DEBUG Received observer notification command-line-startup
0:43.05 pid:10575 1537466880940 Marionette DEBUG Received observer notification toplevel-window-ready
0:44.01 pid:10575 1537466881909 Marionette DEBUG Received observer notification sessionstore-windows-restored
0:44.01 pid:10575 1537466881910 Marionette DEBUG Init aborted (running=false, enabled=false, finalUIStartup=true)
1:51.55 DEBUG init_failed called from timer
1:51.55 DEBUG Got command: u'init_failed'
1:51.60 INFO Browser exited with return code -15
1:51.60 DEBUG PROCESS LEAKS None
1:51.60 DEBUG stopped
1:51.60 DEBUG ensure_runner_stopped
1:51.60 DEBUG waiting for runner process to end
Hm, please note that the initialization of Marionette got aborted because the component is not enabled! I also don't see that the `-marionette` argument is passed to the Firefox binary, nor that the `marionette.enabled` preference is set to `true` anywhere.
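When scanning longer logs for this condition, something like the following Python sketch can pull the flags out of the "Init aborted" line. The regular expression is written against the one log line quoted in this thread and is an assumption about the format in general; `parse_init_aborted` is a made-up helper name.

```python
import re

# Matches e.g. "... Marionette DEBUG Init aborted (running=false, enabled=false, ...)"
INIT_ABORTED = re.compile(
    r"Marionette\s+DEBUG\s+Init aborted \((?P<flags>[^)]*)\)"
)


def parse_init_aborted(line):
    """Return the flags of an 'Init aborted' line as a dict, or None."""
    match = INIT_ABORTED.search(line)
    if match is None:
        return None
    flags = {}
    for pair in match.group("flags").split(","):
        key, _, value = pair.strip().partition("=")
        flags[key] = (value == "true")
    return flags


line = ("0:44.01 pid:10575 1537466881910 Marionette DEBUG "
        "Init aborted (running=false, enabled=false, finalUIStartup=true)")
flags = parse_init_aborted(line)
if flags is not None and not flags["enabled"]:
    print("Marionette was not enabled for this run")
```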
Depending on what version of Firefox you’re using, you need to set the preference in the correct case. As of recently, Nightly only supports `Trace` (capital T).
The log does say that it’s connecting to Marionette, and Marionette does appear to be enabled. The `-marionette` flag is not the only way to enable Marionette: you can also set the `marionette.enabled` preference to true, which is what I suspect wptrunner is doing.
If you could try `--setpref marionette.log.level=Trace` I would be much obliged.
@andreastt, nope. The case is irrelevant to Marionette so far. I explicitly asked for that change to be reverted in your patch on https://bugzilla.mozilla.org/show_bug.cgi?id=1482829 before it landed.
So while the internal Marionette component is enabled, the internal `enabled` state is false. Setting that shouldn't be the job of wptrunner, but of the Marionette client in `geckoinstance.py`. I wonder if there is a bug.
> If you could try `--setpref marionette.log.level=Trace` I would be much obliged.
I think @whimboo may be on to something, since making that change had no discernable effect on the log.
> I also don't see that the `-marionette` argument is passed to the Firefox binary, nor that the `marionette.enabled` preference is set to `true` anywhere.
The following commands also hang:
./wpt run --binary-arg=-headless --setpref marionette.enabled=true --no-pause firefox dom/events/CustomEvent.html
./wpt run --binary-arg=-headless --setpref marionette.enabled=True --no-pause firefox dom/events/CustomEvent.html
However, explicitly specifying `-marionette` works:
./wpt run --binary-arg=-headless --binary-arg=-marionette --no-pause firefox dom/events/CustomEvent.html
Which is strange. For kicks, I tried specifying a bogus argument instead of `-headless`:
./wpt run --binary-arg=-bogus --no-pause firefox dom/events/CustomEvent.html
That also hangs, giving a pretty good indication of the problem here. The WPT CLI currently replaces the `-marionette` argument with any user-specified `--binary-arg`, but it ought to use `-marionette` in all cases. I've submitted a fix for that here: https://github.com/web-platform-tests/wpt/pull/13154.
In the meantime, I've triggered a build on TaskCluster by explicitly specifying both arguments: https://tools.taskcluster.net/groups/F8b0wLxIRY2VkFN0Ykn9Dw
Yes, regarding the log pref casing I was misremembering events. The casing shouldn’t matter. Also the debug log shows `enabled=false`, which does indeed mean Marionette is not getting enabled.
> That also hangs, giving a pretty good indication of the problem here. The WPT CLI currently replaces the `-marionette` argument with any user-specified `--binary-arg`, but it ought to use `-marionette` in all cases. I've submitted a fix for that here: web-platform-tests/wpt#13154.
Thanks! But shouldn’t it be `-marionette` plus whatever is given in `--binary-arg`?
Yup, that wasn't to say `-marionette` should be used exclusively. The patch includes a test which verifies the user-specified argument is also used.
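To illustrate the intended behaviour (this is only a sketch, not the code from the pull request; `build_firefox_args` is a hypothetical name), the argument list would always start with `-marionette` and then carry whatever the user supplied:

```python
def build_firefox_args(user_binary_args=None):
    """Always pass -marionette, then append any user-specified binary args."""
    args = ["-marionette"]
    for arg in user_binary_args or []:
        # Avoid passing -marionette twice if the user also specified it.
        if arg == "-marionette":
            continue
        args.append(arg)
    return args


print(build_firefox_args(["-headless"]))
```

The bug described above corresponds to returning only the user-supplied list whenever it is non-empty, which drops `-marionette` and leaves the harness waiting for a connection that never comes.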
For Firefox Nightly running against WPT commit e6fd8f35c3604d3b94d0a975a3e1d5fa8ef0cabc, we observed 184 discrepancies between "xvfb" and "headless". At the bottom of this comment, you'll find a summary which includes a "score" for each test.
- reftest pass: 1
- reftest failure: 0
- testharness "OK": (number of passing subtests) / (total number of subtests)
- testharness error: -1
- testharness timeout: -1
Bear in mind that some of these discrepancies are likely due to unrelated flakiness (more discussion on that here).
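For anyone wanting to reproduce the comparison, the scoring scheme above can be sketched in Python roughly as follows. The function names, the status strings (`PASS`, `FAIL`, `OK`, `ERROR`, `TIMEOUT`) and the handling of a test with zero subtests are my assumptions for illustration; this is not the script that produced the summary.

```python
def score(test_type, status, passed=0, total=0):
    """Score one test result using the scheme listed above."""
    if test_type == "reftest":
        if status == "PASS":
            return 1.0
        if status == "FAIL":
            return 0.0
    elif test_type == "testharness":
        if status in ("ERROR", "TIMEOUT"):
            return -1.0
        if status == "OK":
            # Assumption: a test reporting no subtests scores 0.
            return passed / total if total else 0.0
    raise ValueError("unscored result: %s %s" % (test_type, status))


def discrepancies(xvfb_scores, headless_scores):
    """Return {test: (xvfb, headless)} for tests whose scores differ."""
    shared = set(xvfb_scores) & set(headless_scores)
    return {
        test: (xvfb_scores[test], headless_scores[test])
        for test in sorted(shared)
        if xvfb_scores[test] != headless_scores[test]
    }
```

Comparing scores rather than raw statuses means a testharness test that goes from 3/4 to 4/4 passing subtests is flagged even though its harness status stays "OK".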
Discrepancies between Firefox run with xvfb and `-headless`
Please note that we don't run headless in our own CI yet. So we aren't aware of differences ourselves. But lately we were at least talking about enabling headless for wdspec as the first step. So this may happen soon.