scour
Scouring an already scoured SVG can sometimes produce a smaller SVG
I ran the following Scour 0.35 command with image.svg (2829 bytes) as the input file:
scour -i image.svg -o image-scoured.svg --enable-viewboxing --enable-id-stripping --enable-comment-stripping --shorten-ids --indent=none
This produced image-scoured.svg (1287 bytes).
Interested to see the result of scouring an already scoured SVG, I ran the same command again, using the output from the previous command as the input.
scour -i image-scoured.svg -o image-scoured-scoured.svg --enable-viewboxing --enable-id-stripping --enable-comment-stripping --shorten-ids --indent=none
Surprisingly, scouring the already scoured file with the same settings produces an even smaller file: 1280 bytes. Further scouring does not reduce the file size any more.
One would think that scouring an SVG multiple times should not produce a smaller output than scouring it once.
Testing 84 different SVG files as input, 8 of them were affected by this bug. Below are the differences between the output files. If necessary I can upload the input files.
3 of them had this same difference:
<g fill="none" font-family="sans-serif" font-size="12" stroke-dasharray="" stroke-miterlimit="10">
<g font-family="sans-serif" font-size="12" stroke-dasharray="" stroke-miterlimit="10">
Diff: -fill="none"
(-12 bytes)
<g transform="matrix(1 0 0 1 -.00047728 0)">
<g transform="translate(-.00047728)">
Diff: matrix(1 0 0 1 -.00047728 0)
-> translate(-.00047728)
(-7 bytes)
<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.8804c2.6842 0 4.8804-2.1962 4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z"/>
<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.8804 4.8804-2.1962 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z"/>
Diff: 4.8804c2.6842 0 4.8804-2.1962 4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0
-> 4.8804 4.8804-2.1962 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0
(-10 bytes)
<path d="m28 7c-2.209 0-4 1.79-4 4v25c-1.657 0-3 1.344-3 3s1.343 3 3 3c1.305 0 2.403-0.838 2.816-2h1.184 20v-33h-20z" fill="#42A5F5"/>
<path d="m28 7c-2.209 0-4 1.79-4 4v25c-1.657 0-3 1.344-3 3s1.343 3 3 3c1.305 0 2.403-0.838 2.816-2h21.184v-33h-20z" fill="#42A5F5"/>
Diff: 2h1.184 20v-33h-20z
-> 2h21.184v-33h-20z
(-2 bytes)
<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.88 04c2.6842 0 4.8804-2.1962 4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z" fill="#455a64"/>
<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.8804 4.8804-2.1962 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z" fill="#455a64"/>
Diffs: 4.88 04c2.6842 0
-> 4.8804
4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0
-> 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0
(-10 bytes)
<path d="m20 7c-1.105 0-2 0.895-2 2v3h-13v29h13 12 13v-32c0-1.105-0.895-2-2-2h-8c-1.105 0-2 0.895-2 2v3h-1v-3c0-1.105-0.895-2-2-2h-8z" fill="#8bc7f8"/>
<path d="m20 7c-1.105 0-2 0.895-2 2v3h-13v29h38v-32c0-1.105-0.895-2-2-2h-8c-1.105 0-2 0.895-2 2v3h-1v-3c0-1.105-0.895-2-2-2h-8z" fill="#8bc7f8"/>
Diff: 2v3h-13v29h13 12 13v-32c0-1.105-0.895-2-2-2h-8c-1.105
-> 2v3h-13v29h38v-32c0-1.105-0.895-2-2-2h-8c-1.105
(-6 bytes)
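The two `h` cases above (`h1.184 20` -> `h21.184` and `h13 12 13` -> `h38`) come down to adding up consecutive relative segments that point in the same direction. Here is a minimal sketch of that arithmetic, not Scour's actual code: `merge_same_direction` is a hypothetical helper, and it assumes the path has no mid-point markers or other features that depend on the intermediate nodes.

```python
from decimal import Decimal


def merge_same_direction(lengths):
    """Merge consecutive relative h (or v) lengths that share a direction.

    `lengths` is a list of numeric strings, e.g. the arguments of one
    `h` command. Hypothetical helper for illustration only.
    """
    merged = []
    for text in lengths:
        value = Decimal(text)
        # Same sign (or a zero-length segment): the pen keeps moving the
        # same way, so the two segments can be replaced by their sum.
        if merged and merged[-1] * value >= 0:
            merged[-1] += value
        else:
            merged.append(value)
    return merged


print(merge_same_direction(["1.184", "20"]))     # one segment of 21.184
print(merge_same_direction(["13", "12", "13"]))  # one segment of 38
```

Segments that reverse direction are kept apart, so `["5", "-3", "-2"]` becomes two segments, 5 and -5.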
This is obviously not optimal.
I can think of two options:
1. Use a kind of "brute-force" approach and add an option to run Scour multiple times (until the file size does not change anymore).
2. Figure out if we can improve the optimization algorithms to always result in the smallest size possible.
While 1. is not a very clean solution, I'm afraid 2. might require a considerable amount of effort, and since we're talking about only a few bytes that can be saved, the time might be spent more effectively elsewhere.
Certainly this needs further investigation to check if there's a straightforward fix for one or more of the individual glitches.
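For reference, the "brute-force" option could look roughly like the sketch below. It is an external wrapper, not an existing Scour option. The names `scour_once` and `optimize_until_stable` are made up for illustration, and `scour_once` assumes the `scour` command is on PATH and reuses the flags from the report above.

```python
import subprocess
from pathlib import Path

# Flags copied from the command in the original report.
SCOUR_FLAGS = [
    "--enable-viewboxing", "--enable-id-stripping",
    "--enable-comment-stripping", "--shorten-ids", "--indent=none",
]


def scour_once(svg_bytes, workdir):
    """Run the scour CLI once (assumes `scour` is installed and on PATH)."""
    src = Path(workdir) / "in.svg"
    dst = Path(workdir) / "out.svg"
    src.write_bytes(svg_bytes)
    subprocess.run(["scour", "-i", str(src), "-o", str(dst), *SCOUR_FLAGS],
                   check=True)
    return dst.read_bytes()


def optimize_until_stable(data, optimize, max_passes=10):
    """Re-run `optimize` until a pass no longer shrinks the data.

    Returns the smallest result seen and the number of passes executed.
    """
    passes = 0
    while passes < max_passes:
        result = optimize(data)
        passes += 1
        if len(result) >= len(data):
            # This pass did not help; keep the previous, smaller version.
            return data, passes
        data = result
    return data, passes
```

Usage would be something like `optimize_until_stable(svg, lambda d: scour_once(d, tmpdir))`. The `max_passes` cap is there only as a safety net in case the output size keeps changing.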
Hi @HatScripts
The test case in your gist only triggers the transform problem. Do you have some examples that trigger issues in the d attribute of the path tag?
For the matrix -> transform case, it appears to happen due to a precision reduction. AFAICT, the original SVG has:
<g
id="g3787"
transform="matrix(1.0000227,0,0,1,-4.7728357e-4,0)">
This is reduced to:
<g id="g3787" transform="matrix(1 0 0 1 -.00047728 0)">
(Note that the first parameter of matrix is reduced to a plain 1.) Once the matrix has pure 0 and 1 integer parameters, the rewrite from optimizeTransform can apply (which happens in the second run).
If the first precision rewrite is acceptable, then we should be able to just apply the same level of "fuzziness" in optimizeTransform before testing for the known patterns.
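To make the idea concrete, here is a rough sketch of "round first, then match". It is not the real optimizeTransform: `simplify_matrix` is a hypothetical stand-alone function, and the default of five significant digits is an assumption chosen to match the `-.00047728` seen in the output above.

```python
from decimal import Context, Decimal


def simplify_matrix(params, digits=5):
    """Round matrix() parameters first, then test for a pure translation.

    `params` holds the six matrix arguments as strings. Hypothetical
    helper for illustration; `digits` is an assumed precision setting.
    """
    ctx = Context(prec=digits)
    # Apply the same precision reduction that the serializer would apply.
    a, b, c, d, e, f = (ctx.create_decimal(p) for p in params)
    if (a, b, c, d) == (1, 0, 0, 1):
        # Identity linear part: only the translation remains.
        if f == 0:
            return ("translate", [e])
        return ("translate", [e, f])
    return ("matrix", [a, b, c, d, e, f])


kind, args = simplify_matrix(
    ["1.0000227", "0", "0", "1", "-4.7728357e-4", "0"])
print(kind, args)
```

With the values from the original SVG, the first parameter rounds to 1, so the translate pattern already matches in a single pass.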