warc2zim

Investigate wabac.js 2.17.0 changes

Open · benoit74 opened this issue 2 years ago

There is a new wabac.js 2.17.0 release (changelog here)

There is no change in the file we need to keep in sync (src/rewrite/dsruleset.js), but there are interesting new test cases around HTML/JS rewriting (body onload, background).

To be investigated further

benoit74 · Feb 23 '24

There is now a 2.17.1 release as well, which may also be interesting.

benoit74 · Feb 26 '24

@benoit74 Is this ticket still valid? It seems important to me that we stick to the latest version of wabac.js. Maybe there is another issue to handle this problem?

kelson42 · May 25 '24

Yes, it is; it is just complex because the code has to be analyzed manually. As mentioned, no impact is expected on the rule sets; it is more about the interesting test cases, to check whether we already cover them or not.

benoit74 · May 25 '24

So there were only two interesting cases in the 2.17.1 test set:

  • onload attribute of a <body> in HTML => already handled
  • background attribute of a <td> in HTML => I don't know why one would want to support this; at least it does not look like a standard attribute at all, not even a deprecated one, see e.g. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td
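Checking whether such test cases are covered starts with listing the attributes in a test page that a rewriter would have to touch. Below is a minimal sketch using only Python's standard `html.parser`; it is not warc2zim's rewriter, and the classification (`on*` event handlers hold JS, `background` holds a URL) is an assumption made for illustration.

```python
from html.parser import HTMLParser


class RewriteCandidateFinder(HTMLParser):
    """Collect (tag, attribute, value, kind) for attributes needing rewriting.

    Hypothetical classification, for illustration only:
    - "on*" attributes (e.g. <body onload>) contain JavaScript -> "js"
    - "background" (e.g. <td background>) contains a URL -> "url"
    """

    def __init__(self) -> None:
        super().__init__()
        self.candidates: list[tuple[str, str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for name, value in attrs:
            if value is None:
                continue  # valueless attribute: nothing to rewrite
            if name.startswith("on"):
                self.candidates.append((tag, name, value, "js"))
            elif name == "background":
                self.candidates.append((tag, name, value, "url"))


def find_rewrite_candidates(html: str) -> list[tuple[str, str, str, str]]:
    finder = RewriteCandidateFinder()
    finder.feed(html)
    finder.close()
    return finder.candidates


sample = '<body onload="init()"><table><tr><td background="bg.png">x</td></tr></table></body>'
print(find_rewrite_candidates(sample))
```

For the sample above this reports the `onload` handler as JS and the `background` value as a URL, which are exactly the two cases to compare against the existing warc2zim tests.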

Other than that, the process to follow up on the fuzzy rules and DS rules that we reimplemented from wabac.js is in place, with a reminder in the release issue and the details in the respective files that have to be updated.


https://github.com/openzim/warc2zim/blob/main/rules/rules.yaml
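Roughly speaking, fuzzy rules map several variants of a URL to one canonical form, so that a URL requested at replay time can match the one that was recorded. A minimal sketch, assuming each rule is a regex `pattern` plus an `re.sub`-style `replace` template and that the first matching rule wins; the `FuzzyRule` class and the cache-busting rule are hypothetical, not copied from rules.yaml.

```python
import re
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyRule:
    # Hypothetical rule shape: a regex and an re.sub replacement template.
    pattern: str
    replace: str


# Hypothetical example rule: drop a trailing cache-busting "_=<digits>"
# query parameter so every variant maps to the same canonical URL.
RULES: tuple[FuzzyRule, ...] = (
    FuzzyRule(pattern=r"^(.*?)[?&]_=\d+$", replace=r"\1"),
)


def apply_fuzzy_rules(url: str, rules: Sequence[FuzzyRule] = RULES) -> str:
    """Return the URL rewritten by the first matching rule, else unchanged."""
    for rule in rules:
        rewritten, count = re.subn(rule.pattern, rule.replace, url)
        if count:
            return rewritten
    return url


print(apply_fuzzy_rules("example.com/api/data?x=1&_=1718000000"))
```

Keeping rules as plain data (pattern plus template) is what makes it practical to diff them against wabac.js on each release and to run the same rules at both ZIM creation and replay.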

https://github.com/openzim/warc2zim/blob/main/src/warc2zim/content_rewriting/ds.py
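The DS (domain-specific) rules follow roughly the same idea as src/rewrite/dsruleset.js in wabac.js: pick a rule by matching fragments of the URL, then apply regex substitutions to the response content. A minimal sketch under that assumption; the `DomainSpecificRule` shape, the domain and the substitution are all hypothetical and not taken from ds.py.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainSpecificRule:
    # Hypothetical shape: apply `substitutions` to content fetched from any
    # URL containing one of the `contains` fragments.
    contains: tuple[str, ...]
    substitutions: tuple[tuple[str, str], ...]


RULES: tuple[DomainSpecificRule, ...] = (
    DomainSpecificRule(
        contains=("player.example.com/",),  # hypothetical domain
        substitutions=((r'"autoplay":\s*true', '"autoplay": false'),),
    ),
)


def rewrite_domain_specific(url: str, content: str, rules=RULES) -> str:
    for rule in rules:
        if any(fragment in url for fragment in rule.contains):
            for pattern, replacement in rule.substitutions:
                content = re.sub(pattern, replacement, content)
            return content  # assumption: first matching rule wins
    return content


print(rewrite_domain_specific("https://player.example.com/config.json", '{"autoplay": true}'))
```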

Note: DS rewriting will probably be dropped anyway with https://github.com/openzim/warc2zim/issues/328

benoit74 · Jun 27 '24