Make wpt own the HTML parser test data and remove dependency on html5lib-python, html5lib-tests

Open zcorpan opened this issue 4 years ago • 14 comments

This week I've done the exercise of updating HTML parser tests again, though this time I was a bit more successful in figuring out how to get those changes through to wpt (see #2887). But boy is it painful and also mostly undocumented!

  • Make the test data change to html5lib-tests (in the custom test data format) https://github.com/html5lib/html5lib-tests/pull/133
  • Update html5lib-python's submodule of html5lib-tests AND update .pytest.expect (manually?) so that html5lib itself doesn't fail the changed tests without having them marked as expected failures. https://github.com/html5lib/html5lib-python/pull/531
  • Update the commit hash for html5lib-python in wpt's html/tools/build.sh and generate tests in wpt by running html/tools/build.sh. https://github.com/web-platform-tests/wpt/pull/27799

Juggling 3 repos for one change like this doesn't seem ideal for contributors. From wpt's perspective, what I would like instead is:

  • Make the test data change in wpt and run a script to generate tests. No dependency on html5lib.

Then html5lib-python can get the tree-builder test data from wpt instead of from html5lib-tests.

Thoughts? @gsnedders @jgraham @annevk @stephenmcgruer

zcorpan avatar Mar 02 '21 23:03 zcorpan

this is effectively a dupe of https://github.com/html5lib/html5lib-tests/issues/127 fwiw

gsnedders avatar Mar 03 '21 17:03 gsnedders

@gsnedders oh, right, I had forgotten about that! It seems like there isn't objection. Are you still planning to work on this?

zcorpan avatar Mar 03 '21 18:03 zcorpan

> @gsnedders oh, right, I had forgotten about that! It seems like there isn't objection. Are you still planning to work on this?

It is a long way down my list.

gsnedders avatar May 06 '21 22:05 gsnedders

A tweak we can make is to depend on html5lib-tests instead of html5lib-python from wpt, which would remove the second step. (I think this was @jgraham's idea, but I don't see it mentioned on GitHub.)

zcorpan avatar Feb 07 '22 14:02 zcorpan

One obvious (easy) tweak, given it's using git submodules, is to explicitly store a commit hash somewhere in WPT and then, during the update, run: cd html5lib-python/html5lib/tests/testdata && git fetch origin && git checkout $REV
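A minimal sketch of that idea, assuming the hash lives in a small text file in WPT. The file's location and the helper names (validate_rev, checkout_commands, update_testdata) are hypothetical; only the directory and the two git commands come from the command above. Requiring a full 40-character hash is a choice made for this sketch, not an existing WPT rule.

```python
import re
import subprocess
from pathlib import Path
from typing import List

# Directory from the command above; where the hash file lives is an assumption.
TESTDATA_DIR = Path("html5lib-python/html5lib/tests/testdata")


def validate_rev(raw: str) -> str:
    """Return the stripped hash, or raise if it is not a full 40-char SHA-1."""
    rev = raw.strip()
    if re.fullmatch(r"[0-9a-f]{40}", rev) is None:
        raise ValueError(f"not a full commit hash: {rev!r}")
    return rev


def checkout_commands(rev: str) -> List[List[str]]:
    """Git commands equivalent to `git fetch origin && git checkout $REV`."""
    return [["git", "fetch", "origin"], ["git", "checkout", rev]]


def update_testdata(rev_file: Path, testdata_dir: Path = TESTDATA_DIR) -> None:
    """Check out the pinned revision inside the test data directory."""
    rev = validate_rev(rev_file.read_text())
    for cmd in checkout_commands(rev):
        # check=True stops at the first failing command, like `&&` in the shell.
        subprocess.run(cmd, cwd=testdata_dir, check=True)
```

Rejecting anything that is not a full hash means a branch name such as main cannot slip in and make the generated tests depend on whatever happens to be upstream that day.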

gsnedders avatar Feb 14 '22 16:02 gsnedders

My main concern is that I want to preserve the file format as the preferred form for making modifications to the tests, since there are non-WPT consumers of those formats.

I'm not a fan of WPT having a build step that transforms the tree builder test format. FWIW, Gecko's mochitest harness stores the original .dat format in the repo and parses it when the tests are run.

hsivonen avatar May 06 '22 08:05 hsivonen

Having the source files in the same format in wpt and parsing them with JS when running sounds ideal actually. Can that parser be migrated to wpt?
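To make the idea concrete, here is a rough sketch of reading the .dat format at run time. It is in Python rather than JS purely for illustration, recognises only the section headings listed in the code, and is not the Gecko mochitest parser. parse_dat and the sample test data are made up for this example.

```python
from typing import Dict, List

# Section headings recognised by this sketch (a simplified list).
SECTION_HEADINGS = {
    "#data", "#errors", "#new-errors", "#document-fragment",
    "#script-off", "#script-on", "#document",
}

# Made-up sample in the tree-construction layout: two tests, one blank line apart.
SAMPLE = (
    "#data\n"
    "<p>One<p>Two\n"
    "#errors\n"
    "(1,3): expected-doctype-but-got-start-tag\n"
    "#document\n"
    "| <html>\n"
    "|   <head>\n"
    "|   <body>\n"
    "|     <p>\n"
    '|       "One"\n'
    "|     <p>\n"
    '|       "Two"\n'
    "\n"
    "#data\n"
    "<b>x\n"
    "#errors\n"
    "#document\n"
    "| <html>\n"
    "|   <head>\n"
    "|   <body>\n"
    "|     <b>\n"
    '|       "x"\n'
)


def _finish(sections: Dict[str, List[str]]) -> Dict[str, str]:
    """Join each section's lines; drop separator blank lines except in data."""
    out: Dict[str, str] = {}
    for name, lines in sections.items():
        body = "\n".join(lines)
        out[name] = body if name == "data" else body.rstrip("\n")
    return out


def parse_dat(text: str) -> List[Dict[str, str]]:
    """Split .dat text into tests, each mapping a section name to its body."""
    tests: List[Dict[str, str]] = []
    sections: Dict[str, List[str]] = {}
    name = None
    for line in text.split("\n"):
        if line in SECTION_HEADINGS:
            # A new "#data" heading starts the next test.
            if line == "#data" and sections:
                tests.append(_finish(sections))
                sections = {}
            name = line[1:]
            sections[name] = []
        elif name is not None:
            sections[name].append(line)
    if sections:
        tests.append(_finish(sections))
    return tests
```

Calling parse_dat(SAMPLE) gives one dictionary per test, keyed by section name without the leading "#". A harness could then feed the data entry to the parser under test and compare the serialised tree against the document entry.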

zcorpan avatar May 06 '22 12:05 zcorpan

Having worked on a parser bug in WebKit I now think this would be even more valuable than I previously thought. It looks like Chromium and WebKit both have two sets of parser tests in the tree:

  • Some html5lib-tests fork of unspecified vintage
  • web-platform-tests's import of html5lib-tests

And the former has tests the latter might not contain. I contributed further to this problem in https://github.com/WebKit/WebKit/pull/12019, but am willing to be part of the cleanup crew if we make web-platform-tests the true home of HTML parser tests.

I suspect @mfreed7 might be interested in this from the Chromium side. Copying here to gather interest.

annevk avatar Mar 28 '23 15:03 annevk

I'm definitely supportive of the effort to clean this up, and make WPT the source of truth for parser tests.

mfreed7 avatar Mar 31 '23 23:03 mfreed7

Steps taken thus far:

  • Upstreamed WebKit-specific tests: https://github.com/html5lib/html5lib-tests/commit/4f45c0211cf1d1f1af319470f77851f60f29914c
  • Working on a new import in https://github.com/web-platform-tests/wpt/pull/39305

I wonder if @zcorpan is still interested in taking this even further as I think it would definitely be preferable if we didn't have to go via html5lib-tests.

https://github.com/html5lib/html5lib-tests does have a number of actionable issues and stale PRs worth triaging. Help appreciated.

annevk avatar Apr 01 '23 06:04 annevk

Yes. See https://github.com/html5lib/html5lib-tests/issues/127#issuecomment-1490501826 and later comments.

zcorpan avatar Apr 03 '23 07:04 zcorpan

@zcorpan any progress on this?

annevk avatar Jun 16 '23 14:06 annevk

Not yet but it's on my list.

zcorpan avatar Jun 19 '23 19:06 zcorpan

Friendly bump-up.

html5lib-python seems pretty much dead. The last commit was in Feb 2024. Even the removal of six and other PRs have been open for over a year now. It's time we look for alternatives. I wonder if we can move to html.parser like pip did (https://github.com/pypa/pip/pull/10291).
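For a sense of what html.parser offers, here is a small sketch that collects link targets using only the standard library. LinkCollector and collect_links are names invented for this example, and it is not pip's actual code. Note that html.parser is event-based and does not build a tree, so whether it fits depends on what the tooling needs from the parser.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.links: List[str] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        # HTMLParser passes tag and attribute names already lowercased.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def collect_links(html: str) -> List[str]:
    """Feed the markup to the parser and return the hrefs in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```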

Anutrix avatar May 26 '25 13:05 Anutrix