CI tests on real world documents
I think the project at this stage, and looking forward, would profit from adding a larger set of real world SVG documents to CI, in addition to the existing, large set of unit tests (which it is very good that we have).
@oberstet Are we looking for a runtime performance baseline, memory usage baseline, scour'ed output size baseline, a "scour does not crash" baseline or ...?
The choice of documents (and possibly the test runner) depends on what we aim to solve with testing on real world documents.
I was mainly thinking about no regressions: the same (non-optimized) input produces exactly (to the byte) the same (optimized) output. and if it does not, because we changed scour in a way that necessarily changes the output, then the PR making that change needs to commit new versions of the expected output documents as well.
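A minimal sketch of that byte-for-byte check, in case it helps the discussion. The directory layout, the name `find_regressions` and the `optimize` callable are all hypothetical; a real runner would pass in a wrapper around whatever entry point Scour exposes.

```python
from pathlib import Path
from typing import Callable, List


def find_regressions(input_dir: Path, expected_dir: Path,
                     optimize: Callable[[bytes], bytes]) -> List[str]:
    """Return the names of SVG files whose optimized output differs
    from the committed reference output (hypothetical helper)."""
    changed = []
    for src in sorted(input_dir.glob("*.svg")):
        expected = expected_dir / src.name
        actual = optimize(src.read_bytes())
        # A missing reference counts as a regression, so new test documents
        # have to be committed together with their expected output.
        if not expected.is_file() or expected.read_bytes() != actual:
            changed.append(src.name)
    return changed
```

A PR that intentionally changes the output would then regenerate the files in the reference directory and commit them alongside the code change.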
and yeah, the 2nd thing: output size. because that's the whole point of Scour, right? ;) so actually, that could make sense:
the ultimate project CI benchmark for a PR is:
- output is functionally correct (it is valid SVG and parses and renders correctly in browsers etc)
- output size is smaller than input and previous version of Scour output
if we could automate both of the above, then we probably wouldn't need to have the optimized versions of the test set documents checked into git as target reference. because if the output changes but still satisfies the above conditions, that is totally fine then.
obviously we can check condition 2. automatically. but condition 1.?
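Condition 2 could be checked with something as small as the sketch below. The function name and parameters are hypothetical, and it assumes "smaller than the previous version" really means "not larger than", since most PRs will leave the output unchanged.

```python
from typing import Optional


def size_ok(input_size: int, output_size: int,
            previous_output_size: Optional[int] = None) -> bool:
    """Condition 2: the output is smaller than the input and, if the size
    produced by the previous Scour version is known, not larger than that."""
    if output_size >= input_size:
        return False
    # Assumption: equal size is acceptable, only growth is a regression.
    if previous_output_size is not None and output_size > previous_output_size:
        return False
    return True
```

The sizes would come from something like `os.path.getsize()` on the input file, the new output, and the output of the last released version.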
how does one check that given some SVG test document that contains a blue hexagon isn't optimized by Scour to a smaller SVG that renders a red triangle?
do we expect Scour to retain pixel exact identity between the rendering of the original SVG and the optimized SVG?
if so it might be possible to automate ..
Tobias Oberstein:
> and yeah, the 2nd thing: output size. because that's the whole point of Scour, right? ;) so actually, that could make sense:
> the ultimate project CI benchmark for a PR is:
> - output is functionally correct (it is valid SVG and parses and renders correctly in browsers etc)
ImageMagick has a compare tool that allegedly can help us with 1.
Something like:
$ compare -metric AE -fuzz 1% orig.png optimized.png diff.png
Obviously, it will not test all browsers/renderers nor animations, but it should catch visible artefacts in simple images (over the "fuzz"-threshold - see imagemagick-cmdline)
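For a CI script, the command above could be wrapped roughly like this. It is only a sketch: the function names are hypothetical, it assumes the two PNGs have already been rendered by some renderer, and it relies on compare printing the AE count on stderr and exiting with 0 or 1 for a successful comparison (2 for an error), which is how I understand the tool to behave.

```python
import subprocess


def parse_ae(stderr_text: str) -> int:
    """Extract the absolute-error pixel count that compare prints on stderr.

    Some ImageMagick versions append a normalized value in parentheses or
    use scientific notation, so only the first token is used and it is
    parsed as a float before converting to int.
    """
    return int(float(stderr_text.split()[0]))


def differing_pixels(orig_png: str, optimized_png: str,
                     diff_png: str = "diff.png", fuzz: str = "1%") -> int:
    """Run ImageMagick compare and return the number of differing pixels."""
    result = subprocess.run(
        ["compare", "-metric", "AE", "-fuzz", fuzz,
         orig_png, optimized_png, diff_png],
        capture_output=True, text=True,
    )
    # Assumption: 0 = similar, 1 = dissimilar, anything else = error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return parse_ae(result.stderr)
```

A test would then assert that `differing_pixels("orig.png", "optimized.png")` stays at or below whatever pixel budget we agree on.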
> - output size is smaller than input and previous version of Scour output
Ok. The scour tool has a bunch of options that control optimized size at the price of visual artefacts. Which options should we use for this test? Perhaps something like the following as the default?
--enable-id-stripping
--enable-comment-stripping
--shorten-ids
--indent=none
--no-line-breaks
--strip-xml-prolog
--remove-descriptive-elements
--set-precision=8 [--set-c-precision=8]
--disable-embed-rasters
--renderer-workaround
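To make the proposal concrete, here is a small sketch that turns the option list into a command line a test runner could execute. The helper name `scour_command` and its parameters are hypothetical; it assumes scour is installed as a command and takes its input and output files via `-i` and `-o`.

```python
from typing import List, Optional

# The options proposed above; a starting point for discussion, not a decision.
DEFAULT_OPTIONS = [
    "--enable-id-stripping",
    "--enable-comment-stripping",
    "--shorten-ids",
    "--indent=none",
    "--no-line-breaks",
    "--strip-xml-prolog",
    "--remove-descriptive-elements",
    "--disable-embed-rasters",
    "--renderer-workaround",
]


def scour_command(input_svg: str, output_svg: str, precision: int = 8,
                  c_precision: Optional[int] = None) -> List[str]:
    """Build the argument list for one scour run (hypothetical helper)."""
    cmd = ["scour", "-i", input_svg, "-o", output_svg, *DEFAULT_OPTIONS,
           f"--set-precision={precision}"]
    # --set-c-precision was listed as optional, so only add it on request.
    if c_precision is not None:
        cmd.append(f"--set-c-precision={c_precision}")
    return cmd
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)`.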
[...]
> do we expect Scour to retain pixel exact identity between the rendering of the original SVG and the optimized SVG?
Depends on the options used (e.g. --set-precision is exactly the kind of option that changes pixel identity). However, with compare we can define an "acceptable level of error" via the -fuzz parameter and then choose the options so conservatively that we stay inside that level of error.
Thanks, ~Niels
regarding pixel correctness: ah, right, that is a user knob.
ideally then, the CI would test at different precision vs size tradeoff levels
and the CI would also test using imagemagick, as well as at least one real world browser rendering engine
maybe render using the browser engine, and compare using imagemagick?
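A rough sketch of how that could be laid out. Everything here is an assumption for illustration: the level names, the precision and fuzz numbers, the `chromium` binary name and the window size are placeholders, and the render step assumes a Chromium-based browser that supports headless screenshots via `--headless` and `--screenshot`.

```python
from pathlib import Path
from typing import List, NamedTuple


class Level(NamedTuple):
    name: str
    precision: int  # value for scour --set-precision
    fuzz: str       # tolerance for ImageMagick compare -fuzz


# Hypothetical tradeoff levels; the numbers are placeholders to be tuned.
LEVELS = [
    Level("conservative", 8, "1%"),
    Level("medium", 5, "2%"),
    Level("aggressive", 3, "5%"),
]


def render_command(svg_path: str, png_path: str, browser: str = "chromium",
                   width: int = 800, height: int = 600) -> List[str]:
    """Build a headless-browser command that screenshots an SVG to a PNG."""
    return [
        browser,
        "--headless",
        f"--screenshot={png_path}",
        f"--window-size={width},{height}",
        # Pass the document as a file:// URL so relative paths also work.
        Path(svg_path).resolve().as_uri(),
    ]
```

For each level, the runner would scour the document with that precision, render both the original and the optimized SVG with `render_command`, and then compare the two PNGs with ImageMagick using that level's fuzz value.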
all of this is to bring our CI testing as close as possible to what the user of Scour is really interested in
fwiw, I think in the case of Scour, benchmarking the resource consumption for the tool (CPU, RAM) is a secondary priority, because it will only matter if you wanna scour 1000s of files .. and for this scenario, we should just - wait until it happens;) and then maybe just implement parallelization, and done. scour isn't nginx.