
CI tests on real world documents

Open oberstet opened this issue 6 years ago • 5 comments

I think the project, at this stage and looking forward, would profit from adding a larger set of real-world SVG documents to CI - in addition to the existing, large set of unit tests (which it is very good that we have).

oberstet avatar Apr 20 '18 10:04 oberstet

@oberstet Are we looking for a runtime performance baseline, memory usage baseline, scour'ed output size baseline, a "scour does not crash" baseline or ...?

The choice of documents (and possibly the test runner) depends on what we aim to solve with testing on real world documents.

nthykier avatar Apr 21 '18 07:04 nthykier

I was mainly thinking about: no regressions, i.e. the same (non-optimized) input produces exactly (to the byte) the same (optimized) output. And if not - because we expanded scour in some way with a necessary change in output - then the respective changing PR needs to commit new versions of the expected output documents as well.
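The byte-identity check described above could be sketched roughly like this (the directory layout `input/` vs. `expected/` and the function names are my own assumptions, not anything Scour ships):

```python
import pathlib
from typing import Callable

def find_regressions(corpus: pathlib.Path,
                     scour: Callable[[bytes], bytes]) -> list:
    """Run `scour` over every input document in the corpus and return the
    names of those whose output no longer matches the committed reference
    output, byte for byte. `scour` is injected as a callable so the harness
    itself stays independent of how Scour is invoked (API vs. CLI)."""
    failures = []
    for infile in sorted((corpus / "input").glob("*.svg")):
        expected = (corpus / "expected" / infile.name).read_bytes()
        if scour(infile.read_bytes()) != expected:
            failures.append(infile.name)
    return failures
```

A PR that intentionally changes the output would then re-generate the files under `expected/` and commit them alongside the code change.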

oberstet avatar Apr 21 '18 09:04 oberstet

and yeah, the 2nd thing: output size. because that's the whole point of Scour, right? ;) so actually, that could make sense:

the ultimate project CI benchmark for a PR is:

  1. output is functionally correct (it is valid SVG and parses and renders correctly in browsers etc)
  2. output size is smaller than input and previous version of Scour output

if we could automate both of the above, then we probably wouldn't need to have the optimized versions of the test-set documents checked into git as reference targets: if the output changes but still satisfies both conditions, that is totally fine.

obviously we can check condition 2 automatically. but condition 1?
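Condition 2 is just a pair of size comparisons; a hypothetical helper (name and signature are mine) could look like:

```python
from typing import Optional

def size_regression(input_size: int, output_size: int,
                    previous_output_size: Optional[int] = None) -> bool:
    """Return True if the size condition fails: the optimized output must
    be smaller than the input, and no larger than the output of the
    previous Scour version (when a previous baseline is available)."""
    if output_size >= input_size:
        return True
    if previous_output_size is not None and output_size > previous_output_size:
        return True
    return False
```

CI would record the output sizes of the current release as the baseline and run this check against every PR.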

how does one check that given some SVG test document that contains a blue hexagon isn't optimized by Scour to a smaller SVG that renders a red triangle?

do we expect Scour to retain pixel exact identity between the rendering of the original SVG and the optimized SVG?

if so, it might be possible to automate it ..

oberstet avatar Apr 21 '18 09:04 oberstet

Tobias Oberstein:

> and yeah, the 2nd thing: output size. because that's the whole point of Scour, right? ;) so actually, that could make sense:
>
> the ultimate project CI benchmark for a PR is:
>
> 1. output is functionally correct (it is valid SVG and parses and renders correctly in browsers etc)

ImageMagick has a `compare` tool that allegedly can help us with 1. Something like:

$ compare -metric AE -fuzz 1% orig.png optimized.png diff.png

Obviously, it will not test all browsers/renderers nor animations, but it should catch visible artefacts in simple images (over the "fuzz" threshold - see imagemagick-cmdline)
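In CI this would wrap roughly as follows (a sketch; the helper names are mine). With `-metric AE`, ImageMagick's `compare` prints the absolute error - the number of differing pixels - on stderr, exits 0/1 for matching/differing images, and 2 on error:

```python
def compare_command(orig_png: str, optimized_png: str, diff_png: str,
                    fuzz: str = "1%") -> list:
    """Build the argv for ImageMagick's `compare` with the AE metric,
    mirroring: compare -metric AE -fuzz 1% orig.png optimized.png diff.png"""
    return ["compare", "-metric", "AE", "-fuzz", fuzz,
            orig_png, optimized_png, diff_png]

def parse_ae(stderr_text: str) -> float:
    """Extract the absolute-error pixel count from `compare`'s stderr.
    Some metrics print a normalized value in parentheses after the count,
    so we only take the leading number."""
    return float(stderr_text.split()[0])
```

A CI step would run the command via `subprocess`, feed stderr to `parse_ae`, and fail the build if the count exceeds an agreed pixel budget.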

> 2. output size is smaller than input and previous version of Scour output

Ok. The scour tool has a bunch of options that reduce output size at the price of possible visual artefacts. Which options should we use for this test? Perhaps something like the following as the default?

  --enable-id-stripping
  --enable-comment-stripping
  --shorten-ids
  --indent=none
  --no-line-breaks
  --strip-xml-prolog
  --remove-descriptive-elements
  --set-precision=8 [--set-c-precision=8]
  --disable-embed-rasters
  --renderer-workaround
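Assembling one CI invocation from that proposed flag set could look like this (the flags are the ones listed above; `--set-c-precision` is left out here since it was marked optional, and the helper itself is hypothetical):

```python
# Proposed default option set for the size/correctness CI run.
DEFAULT_TEST_OPTIONS = [
    "--enable-id-stripping",
    "--enable-comment-stripping",
    "--shorten-ids",
    "--indent=none",
    "--no-line-breaks",
    "--strip-xml-prolog",
    "--remove-descriptive-elements",
    "--set-precision=8",
    "--disable-embed-rasters",
    "--renderer-workaround",
]

def scour_command(infile: str, outfile: str,
                  options=DEFAULT_TEST_OPTIONS) -> list:
    """Build the argv for one CI invocation of scour on a corpus document."""
    return ["scour", "-i", infile, "-o", outfile] + list(options)
```

Keeping the option set in one place means the size baseline and the rendering comparison are guaranteed to test the same configuration.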

> [...] do we expect Scour to retain pixel exact identity between the rendering of the original SVG and the optimized SVG?

Depends on the options used (e.g. --set-precision deliberately changes pixel values). However, with compare we can define an "acceptable level of error" via the -fuzz parameter and then choose the options conservatively enough that we stay inside that level of error.

Thanks, ~Niels

nthykier avatar Apr 21 '18 11:04 nthykier

regarding pixel correctness: ah, right, that is a user knob.

ideally then, the CI would test at different precision vs size tradeoff levels

and the CI would also test using imagemagick, as well as at least one real world browser rendering engine

maybe render using the browser engine, and compare using imagemagick?
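The render step of that pipeline might be sketched like this (the browser binary name and flag set are assumptions - Chromium supports `--headless`, `--screenshot` and `--window-size`, but details vary by install and version):

```python
def render_command(svg_path: str, png_path: str,
                   size: str = "512,512",
                   browser: str = "chromium") -> list:
    """Build the argv for rendering an SVG to PNG with a headless browser,
    so the original and the scoured document can then be diffed with
    ImageMagick's `compare`."""
    return [browser, "--headless", "--screenshot=" + png_path,
            "--window-size=" + size, svg_path]
```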

all of this is to bring our CI testing as close as possible to what the user of Scour is really interested in

fwiw, I think in the case of Scour, benchmarking the tool's resource consumption (CPU, RAM) is a secondary priority, because it will only matter if you want to scour 1000s of files .. and for that scenario, we should just wait until it happens ;) and then maybe just implement parallelization, and be done. scour isn't nginx.

oberstet avatar Apr 21 '18 12:04 oberstet