stretch Investigate supporting web-platform-tests

We should investigate if we can support https://github.com/web-platform-tests/wpt.

ping @jugglinmike who wanted to help out with this.

May 03 '19 09:05 emilsjolander

Thanks, @emilsjolander! I spent some time researching this tonight--here's what I came up with.

For background: we can ignore the vast majority of tests in WPT. For now, we're just interested in the flexbox tests. Fortunately, filtering those out is straightforward because they're all organized in the css/css-flexbox directory.

These tests use a similar approach to verifying correct behavior; WPT calls them "reftests". You can read about them here, but roughly speaking, they're run like this:

the document is parsed for one or more references to other files
the document is opened in the browser under test, and a screenshot is taken
the reference documents are opened in the browser under test, and a screenshot is taken
the screenshots are compared for similarity or dissimilarity (depending on how the references were declared)

All of this is automated (typically using WebDriver).
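To make the first and last steps concrete, here's a minimal sketch in Python. It assumes references are declared with `<link rel="match">` / `<link rel="mismatch">` elements, which is the WPT reftest convention. The names `find_references` and `reftest_passes` are made up for illustration, and the byte-for-byte screenshot comparison is a simplification (a real harness may tolerate small differences).

```python
from html.parser import HTMLParser


class ReftestLinkParser(HTMLParser):
    """Collects (relation, href) pairs from <link rel="match|mismatch"> tags."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        href = attributes.get("href")
        if rel in ("match", "mismatch") and href:
            self.references.append((rel, href))


def find_references(document_source):
    """Step 1: parse the test document for references to other files."""
    parser = ReftestLinkParser()
    parser.feed(document_source)
    return parser.references


def reftest_passes(relation, test_pixels, reference_pixels):
    """Step 4: 'match' expects identical screenshots, 'mismatch' expects different ones.

    Assumption: screenshots are compared as raw bytes, with no tolerance.
    """
    same = test_pixels == reference_pixels
    return same if relation == "match" else not same
```

Taking the screenshots themselves (steps 2 and 3) is left out here, since that part depends on whichever browser automation tool is used.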

There are two important aspects of this for a non-browser consumer like Stretch:

styling is not defined directly on the nodes. It's applied through CSS rules
the tests are designed with a specific definition of "equivalency"--that is: visual equivalence. There's no guarantee about the relationship of the layout between the subject document and the reference document(s). To use them for Stretch, we won't be able to compute the layout of the documents and compare.

An example might make the second point more clear. In align-content-001.htm, flexbox is used to position 4 flex items within a flex container to obscure red pixels. The corresponding reference document simply defines a single non-flex element with a solid green background.

To automatically consume these tests, Stretch could perform the same steps listed above, but change out the second step:

Use a browser to read the intended styling, compute the resulting layout according to Stretch, and then push the layout back into the browser. Finally, take the screenshot.

In slightly more detail:

Load the document under test in a browser
Parse its DOM for all flex content, capturing styles with getComputedStyle
Mirror that DOM in Stretch and then compute the layout
Apply the computed layout to the DOM in the browser
Take the screenshot
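As a rough illustration of the "apply the computed layout" step, the sketch below turns a tree of layout results into inline style strings that a script could assign to the matching elements. Everything here is hypothetical: `LayoutNode` is not a Stretch type, element ids are assumed to be available for lookup, and positions are assumed to be relative to the parent node.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    """Hypothetical mirror of one DOM element plus its computed layout."""
    element_id: str
    x: float  # assumed to be relative to the parent node
    y: float
    width: float
    height: float
    children: list = field(default_factory=list)


def to_inline_styles(node, styles=None):
    """Walk the layout tree and build one inline style string per element id."""
    if styles is None:
        styles = {}
    # Absolute positioning pins each box to the computed coordinates so the
    # browser's own flexbox implementation no longer influences the result.
    styles[node.element_id] = (
        f"position: absolute; left: {node.x}px; top: {node.y}px; "
        f"width: {node.width}px; height: {node.height}px;"
    )
    for child in node.children:
        to_inline_styles(child, styles)
    return styles
```

Details like borders, padding, and the containing block of the root element would need more care than this sketch gives them.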

We wouldn't need to perform this step for the reference documents because they explicitly don't use flexbox. We'd have a dependency on a browser, which seems a little strange, but really we'd just be using it to apply CSS to nodes and to render documents (neither of which are Stretch's responsibilities).

Does that sound feasible to you, @emilsjolander? It's a little involved, but I think those tests are really valuable! They're also very well maintained (the Chromium project just today contributed a new test). If the solution generalizes as well as we hope, then adapting it to use the css-grid tests should be straightforward.

May 04 '19 04:05 jugglinmike

@emilsjolander One thing which might discourage that solution is Stretch's lack of alternate display values. I know that grid is on the roadmap, but do you have any plans to support inline and block?

May 18 '19 03:05 jugglinmike

@jugglinmike Correct. No plans on supporting inline or block. We don't want to re-create all of css layout but instead focus on the layout models optimal for building apps which we think are Flexbox and Grid.

May 18 '19 16:05 emilsjolander

@jugglinmike I think this would be a great contribution. Especially as you say to prepare us for grid support. I also think the plan for implementing this sounds reasonable. I'm a bit confused why we need to rely on screenshots though. Can't we just port the reftests over to the current testing setup?

May 18 '19 16:05 emilsjolander

@jugglinmike I think this would be a great contribution. Especially as you say to prepare us for grid support. I also think the plan for implementing this sounds reasonable.

Good to hear!

I'm a bit confused why we need to rely on screenshots though. Can't we just port the reftests over to the current testing setup?

Since the reftests are interpreted using the screen rendering, some number of the reference documents have been authored to take shortcuts. Instead of describing an equivalent layout, they describe a page with an equivalent visual appearance.

Here's an abbreviated version of the test I mentioned above:

<style type="text/css">
#flexbox
{
    background: linear-gradient(to bottom, green 0, green 25px, red 25px, red 75px, green 75px, green 100px);
    align-content: center;
    display: flex;
    flex-flow: wrap;
    height: 100px;
    width: 300px;
}
div div
{
    background-color: green;
    height: 26px;
    width: 150px;
}
</style>
<div id="flexbox">
  <div></div>
  <div></div>
  <div></div>
  <div></div>
</div>

...and here's a rough version of the corresponding reference document:

<style>
div {
  background: green;
  height: 100px;
  width: 300px;
}
</style>
<div></div>

As far as I know, we can't verify that Stretch satisfies this test without actually rendering the reference document alongside a version of the test document (transformed according to Stretch's layout measurements).

Does that sound right to you?

May 18 '19 21:05 jugglinmike

As far as I know, we can't verify that Stretch satisfies this test without actually rendering the reference document alongside a version of the test document (transformed according to Stretch's layout measurements).

Because Chrome (and other browsers) already validate that they render the test document correctly according to the reference document, all we should need to do is validate that Stretch renders the test document in the same way as Chrome (or another browser). Right?

May 19 '19 10:05 emilsjolander

The website wpt.fyi visualizes test results in recent browser releases, and its page on the flexbox tests shows that no browser passes 100% of them. Even if one did, there's no telling when new tests might be introduced, so I wouldn't recommend taking correctness for granted generally.

May 20 '19 02:05 jugglinmike

@jugglinmike makes sense. Then I think your plan for testing sounds good. Is this something you plan on working on? How can I help?

May 23 '19 23:05 emilsjolander

I think I can help out with this, though I might be a little slow.

The first thing I wanted to do was verify that Stretch's limitation on display modes wouldn't interfere with testing.

Specifically, I was concerned about tests which included flex containers as distant children of other flex containers, e.g.

<div style="display: flex;">
  <div>
    <div>
      <div style="display: flex;">
        <div></div>
      </div>
    </div>
  </div>
</div>

Stretch won't be able to recreate layouts like these, so the tests would have to be designated as "expected failures." If many tests had to be skipped, then that would limit the value of integrating with WPT and also represent a new maintenance burden (since we'd have to perform this classification each time we pulled in new tests).
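For reference, a check for this pattern can be sketched along these lines. This is a simplified illustration rather than a description of how the count was actually made: it only looks at inline `style` attributes (a real scan would need the computed `display` value of each element), it assumes well-formed markup with explicit closing tags, and `has_distant_nested_flex` is a made-up name.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class NestedFlexDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open element: is it a flex container?
        self.found = False

    def handle_starttag(self, tag, attrs):
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        is_flex = "display:flex" in style or "display:inline-flex" in style
        # Flag a flex container whose parent is NOT a flex container but which
        # still has a flex container somewhere further up the tree.
        if is_flex and self.stack and not self.stack[-1] and any(self.stack):
            self.found = True
        if tag not in VOID_ELEMENTS:
            self.stack.append(is_flex)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.stack:
            self.stack.pop()


def has_distant_nested_flex(document_source):
    detector = NestedFlexDetector()
    detector.feed(document_source)
    return detector.found
```

A flex container nested directly inside another flex container is not flagged, since that case doesn't involve any non-flex display mode in between.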

Fortunately, as of WPT commit 14a5a02, only two tests exhibit that pattern:

So this limitation shouldn't be a concern.

Jun 01 '19 06:06 jugglinmike