
Allow 20% less precision option to drop some final 1, 2, 8, 9

Open pborunda opened this issue 7 years ago • 13 comments

The least significant digit (the one at the precision limit) can safely be dropped if it is 1, 2, 8, or 9 — rounding down for 1 and 2, up for 8 and 9. This is equivalent to 20% less precision, so --set-precision 4 would behave like --set-precision 3.8.

So with --set-precision 3, 3.593 is converted to 3.59, but it's safe to round to 3.6; likewise 3.993 is converted to 3.99, but it's safe to round to 4.

This is not the same as --set-precision 2, where 3.593 and 3.553 would both be converted to 3.6, and 3.993 and 3.953 would both be converted to 4.
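The proposal could be sketched in Python roughly like this (`soft_round` is a hypothetical name and the `decimal`-based approach is my illustration, not scour's actual code):

```python
from decimal import Context, Decimal, ROUND_HALF_UP

def soft_round(value, digits):
    # Hypothetical sketch of the proposal: round to `digits` significant
    # digits first; if the last digit is then 1, 2, 8 or 9, round once
    # more to one digit less ("20% less precision").
    d = Context(prec=digits).create_decimal(Decimal(str(value)))
    if d.as_tuple().digits[-1] in (1, 2, 8, 9):
        d = Context(prec=digits - 1, rounding=ROUND_HALF_UP).create_decimal(d)
    return d
```

With this sketch, soft_round(3.593, 3) gives 3.6 while soft_round(3.553, 3) stays 3.55.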

This actually helps shave off a few more bytes/KB for large SVGs when the difference between --precision 4 and --precision 3 is visually not acceptable.

Right now I do this manually as an additional step in a bash script outside of scour, but I feel others might find it helpful to integrate into their projects.

I can submit a PR to demonstrate

pborunda avatar Feb 24 '17 16:02 pborunda

Rounding numbers like this is just arbitrary... It obviously saves some bytes in certain circumstances (as everything that reduces precision somehow will), but it blindly assumes that 3.8..4.2 = 4 while 3.3..3.7 and 4.3..4.7 should not be touched. Why? What makes those ranges more important? Or better yet: what makes the range around 4 less important?

A more general approach to what you suggest would be the possibility to further quantize the number space beyond what the precision option allows (i.e. a fractional precision where numbers are only allowed to take discrete values, e.g. ... 3.6 3.8 4 4.2 4.4 ...). As you can see, it saves a byte if the value is close to 4 (e.g. 3.9 or 4.1) but would only reduce precision for other numbers (e.g. 3.7) while not saving any bytes.
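The 0.2 grid described above can be illustrated with a small sketch (`quantize` is a hypothetical helper; the tie-breaking rule is my assumption, not stated in the thread):

```python
from decimal import Decimal, ROUND_HALF_UP

def quantize(value, step=Decimal("0.2")):
    # Snap `value` to the nearest multiple of `step`, giving the grid
    # "... 3.6 3.8 4 4.2 4.4 ..." for step 0.2.
    # Ties round half up here -- an assumption for the sketch.
    steps = (Decimal(str(value)) / step).to_integral_value(rounding=ROUND_HALF_UP)
    return steps * step
```

So 3.9 and 4.1 land on 4 and 4.2 (saving or keeping a byte), while 3.7 becomes 3.8: precision reduced, nothing saved.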

So now we can say saving that byte is so important for us that we arbitrarily reduce precision close to values where it can save bytes, but the same question as above applies: Why is precision close to "whole" numbers less important?

In the end what you're seeing is the downside of the decimal system: we can only reduce precision in factors of ten. With binary we could just drop another bit and reduce precision in more granular steps, but we're stuck with decimal numbers, and in my opinion it does not make sense to install some strange hacks to work around that fact only to shave off some bytes in very specific scenarios while it might destroy parts of an image in others (e.g. suppose three lines drawn from x1=3.9/4.0/4.1, y1=0 to x2=3.9/4.0/4.1, y2=1 with a precision of 2 and your suggested behavior).

Ede123 avatar Feb 24 '17 18:02 Ede123

Here is the content of my script in case anyone finds it helpful; it also batch-processes the current folder.

```sh
#Batch optimize all svgs in directory with scour, then apply crude regex optimization to improve some more
find -name \*.svg ! -name \*.min.svg | while read SVGFILE; do
    SVGOFILE=$(echo ${SVGFILE} | sed "s/svg$/min.svg/")
    scour -i $SVGFILE -o $SVGOFILE -q --set-precision 3 --set-c-precision 1 \
        --strip-xml-prolog --strip-xml-space --remove-metadata --indent=none \
        --enable-comment-stripping --enable-id-stripping --enable-viewboxing
    echo $SVGOFILE
done

#Limit decimal precision to 3 max
sed -i -e 's/\(\.[0-9][0-9][0-9]\)[0-9]*\([^0-9]\)/\1\2/g' *.min.svg

#Round down 2 decimal places
sed -i -e 's/\([a-z -][0-9]*\.[0-9][0-9]\)[0-4]\([^0-9]\)/\1\2/g' *.min.svg

#Round up to 2 decimal places
sed -i -e 's/\([a-z -][0-9]*\.[0-9]\)0[5-9]\([^0-9]\)/\11\2/g' -e 's/\([a-z -][0-9]*\.[0-9]\)1[5-9]\([^0-9]\)/\12\2/g' -e 's/\([a-z -][0-9]*\.[0-9]\)2[5-9]\([^0-9]\)/\13\2/g' *.min.svg
sed -i -e 's/\([a-z -][0-9]*\.[0-9]\)3[5-9]\([^0-9]\)/\14\2/g' -e 's/\([a-z -][0-9]*\.[0-9]\)4[5-9]\([^0-9]\)/\15\2/g' -e 's/\([a-z -][0-9]*\.[0-9]\)5[5-9]\([^0-9]\)/\16\2/g' *.min.svg
sed -i -e 's/\([a-z -][0-9]*\.[0-9]\)6[5-9]\([^0-9]\)/\17\2/g' -e 's/\([a-z -][0-9]*\.[0-9]\)7[5-9]\([^0-9]\)/\18\2/g' -e 's/\([a-z -][0-9]*\.[0-9]\)8[5-9]\([^0-9]\)/\19\2/g' *.min.svg

#Round up to 1 decimal place
sed -i -e 's/\([a-z -][0-9]*\.\)0[89]\([^0-9]\)/\11\2/g' -e 's/\([a-z -][0-9]*\.\)1[89]\([^0-9]\)/\12\2/g' -e 's/\([a-z -][0-9]*\.\)2[89]\([^0-9]\)/\13\2/g' *.min.svg
sed -i -e 's/\([a-z -][0-9]*\.\)3[89]\([^0-9]\)/\14\2/g' -e 's/\([a-z -][0-9]*\.\)4[89]\([^0-9]\)/\15\2/g' -e 's/\([a-z -][0-9]*\.\)5[89]\([^0-9]\)/\16\2/g' *.min.svg
sed -i -e 's/\([a-z -][0-9]*\.\)6[89]\([^0-9]\)/\17\2/g' -e 's/\([a-z -][0-9]*\.\)7[89]\([^0-9]\)/\18\2/g' -e 's/\([a-z -][0-9]*\.\)8[89]\([^0-9]\)/\19\2/g' *.min.svg

#Round up near .999 fractional part to next whole number
sed -i -e 's/\([a-z -]\)0\.9[5-9]\([^0-9]\)/\11\2/g' -e 's/\([a-z -]\)1\.9[5-9]\([^0-9]\)/\12\2/g' -e 's/\([a-z -]\)2\.9[5-9]\([^0-9]\)/\13\2/g' *.min.svg
sed -i -e 's/\([a-z -]\)3\.9[5-9]\([^0-9]\)/\14\2/g' -e 's/\([a-z -]\)4\.9[5-9]\([^0-9]\)/\15\2/g' -e 's/\([a-z -]\)5\.9[5-9]\([^0-9]\)/\16\2/g' *.min.svg
sed -i -e 's/\([a-z -]\)6\.9[5-9]\([^0-9]\)/\17\2/g' -e 's/\([a-z -]\)7\.9[5-9]\([^0-9]\)/\18\2/g' -e 's/\([a-z -]\)8\.9[5-9]\([^0-9]\)/\19\2/g' *.min.svg
sed -i -e 's/\([a-z -]\)9\.9[5-9]\([^0-9]\)/\110\2/g' *.min.svg

#Round down near-zero .0xx fractional part to whole number
sed -i -e 's/\([a-z -][0-9]\)\.0[1234]\([^0-9]\)/\1\2/g' -e 's/\([a-z -][0-9]*\.[1-9]\)[12]\([^0-9]\)/\1\2/g' *.min.svg

#Round 3-7 to 5 for better gzip compression
sed -i -e 's/\(\.[0-9]\)[4-6]\([^0-9]\)/\15\2/g' -e 's/\(\.[0-9]\)[37]\([^0-9]\)/\15\2/g' *.min.svg

#Remove leading zeros
sed -i -e 's/\([a-z -]\)0\(\.[0-9]\)/\1\2/g' *.min.svg

#Get rid of negative zero
sed -i -e 's/\([a-z]\)[-]0\([a-z -]\)/\10\2/g' *.min.svg
sed -i -e 's/[-]0\([a-z -]\)/ 0\1/g' *.min.svg
```

pborunda avatar Feb 24 '17 18:02 pborunda

Good point, yes 1, 2, 8, 9 seem arbitrary, but I forgot to mention I round 3-7 to 5. I'm not too sure of the theory behind it, but the results look good on screen.

pborunda avatar Feb 24 '17 18:02 pborunda

but I forgot to mention I round 3-7 to 5

This changes things a bit, as it's basically a case where you only allow one discrete value between two decimal steps, i.e. ... 3 3.5 4 4.5 ... in the example outlined above. In this example it doubles precision compared to a precision of 1, while savings are okay, as it saves a byte in every second number compared to precision 2.

This very special case might actually be worth a thought, as it's not arbitrarily reducing precision (you're basically rounding with a precision of 0.5 significant digits). I don't know if it's very helpful in the real world though, and it would require complicating our code, as this case can't be handled by the decimal module.

I'll think about it, but I doubt it's worth the effort.

Ede123 avatar Feb 24 '17 18:02 Ede123

Thinking about it, it might actually be pretty easy: multiply the value by 2 -> reduce precision -> divide the value by 2. Done. Then we only need to handle the precision option to accept e.g. "2.5".
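In Python (scour's implementation language) the multiply-round-divide trick might look roughly like this (`half_step_precision` is a made-up name, not an actual scour function; decimal's default half-even rounding is assumed):

```python
from decimal import Context, Decimal

def half_step_precision(value, digits):
    # Sketch of the trick above: double the value, round to `digits`
    # significant digits, then halve again -- effectively rounding with
    # "digits + 0.5" significant digits.
    ctx = Context(prec=digits)
    return ctx.create_decimal(Decimal(str(value)) * 2) / 2
```

For example, half_step_precision(1.9, 1) gives 2 and half_step_precision(1.6, 1) gives 1.5.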

I'd still want to test this against some real-world samples to see if it actually is useful or just an unnecessary complication...

Ede123 avatar Feb 24 '17 19:02 Ede123

Ok, thanks for considering! I think it would be less code than the messy script I describe above. Regular expressions don't like arithmetic. :)

pborunda avatar Feb 24 '17 19:02 pborunda

For reference: on ship.svg and squirrel.svg from #136, the difference between running with and without the above script is 321 bytes for squirrel.min.svg and 569 bytes for ship.min.svg.

I know the script is not the same as the described implementation, but just to give an idea.

pborunda avatar Feb 24 '17 20:02 pborunda

My guess is that generally the saving would be more or less half the number of bytes between two precision settings, but I'm not sure. So in the example below, the 5865-byte output (-p 2) is roughly halfway between 5001 bytes (-p 1) and 6693 bytes (-p 3).

```
$ scour --strip-xml-prolog --strip-xml-space --remove-metadata --indent=none --enable-comment-stripping --enable-id-stripping --enable-viewboxing --set-c-precision 1 --set-precision 1 squirrel.svg > tmp.svg
Scour processed file "squirrel.svg" in 28 ms: 5001/14422 bytes new/orig -> 34.7%

$ scour --strip-xml-prolog --strip-xml-space --remove-metadata --indent=none --enable-comment-stripping --enable-id-stripping --enable-viewboxing --set-c-precision 1 --set-precision 2 squirrel.svg > tmp.svg
Scour processed file "squirrel.svg" in 30 ms: 5865/14422 bytes new/orig -> 40.7%

$ scour --strip-xml-prolog --strip-xml-space --remove-metadata --indent=none --enable-comment-stripping --enable-id-stripping --enable-viewboxing --set-c-precision 1 --set-precision 3 squirrel.svg > tmp.svg
Scour processed file "squirrel.svg" in 30 ms: 6693/14422 bytes new/orig -> 46.4%
```

pborunda avatar Feb 24 '17 21:02 pborunda

You probably already thought of this, but could it be generalized further?

--set-precision 2.5: divide by (0.5) -> reduce precision (2) -> multiply by (0.5)
--set-precision 2.654: divide by (0.652) -> reduce precision (2) -> multiply by (0.652)

Maybe my thinking is wrong.

Someone could make a cool slider UI tool wrapped around scour, with this type of continuous precision.

pborunda avatar Feb 24 '17 22:02 pborunda

No, that won't work for the exact reasons described above.

Using numbers like you suggest would in fact significantly blow up file size and achieve nothing.

Let me give you an example (assume a base precision of 1 significant digit):

  1. initial value: 1.9 -> 1.9*2 = 3.8 -> reduce_precision(3.8) = 4 -> 4/2 = 2 -> done
  2. initial value: 1.6 -> 1.6*2 = 3.2 -> reduce_precision(3.2) = 3 -> 3/2 = 1.5 -> done
  3. initial value: 1.9 -> 1.9*1.529 = 2.9051 -> reduce_precision(2.9051) = 3 -> 3/1.529 = 1.962... -> done, but what the hell?
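The three cases above can be reproduced with a small sketch of the generalized scale-round-unscale (`scale_round_unscale` is a hypothetical name; decimal's default half-even rounding is assumed):

```python
from decimal import Context, Decimal

def scale_round_unscale(value, factor, digits):
    # Scale, round to `digits` significant digits, unscale. Only factors
    # like 2 (or 5) map back onto short decimals; an arbitrary factor
    # such as 1.529 produces a long fraction instead.
    scaled = Decimal(str(value)) * Decimal(str(factor))
    rounded = Context(prec=digits).create_decimal(scaled)
    return rounded / Decimal(str(factor))
```

The factor-2 cases come back as short decimals (2 and 1.5), while factor 1.529 returns 1.96206..., which needs far more digits than the value started with.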

It really only works for factors of 2 and 5 (i.e. factors that map to even digits), and in the case of 5 this would be the example described in the linked comment, which would only decrease precision while not gaining much.

Ede123 avatar Feb 25 '17 00:02 Ede123

I would be glad if I could somehow specify a custom rounding grid, like round to halves, round to quarters, round to fifths, etc.

I know this can be accomplished by scaling up the svg in question, then doing decimal rounding, then scaling back down, but this would be much, much easier.

mrmeszaros avatar Aug 12 '20 20:08 mrmeszaros

@pborunda Rounding might be improved, but rounding down if the final digit is 1 or 2 and rounding up if it is 8 or 9 does not make much sense to me. However, it should perhaps depend on the first digit: sometimes you want 09.876543210987654321 and 12.345678901234567890 to get the same number of absolute digits, so there should be a difference between 1 and 9, and not between 9 and 10. The number of digits should therefore, imho, be considered on a log scale.

If it is about minimizing a file, one digit too many mostly does not make a big (relative) change in the file size.

Introducing a grid as @mrmeszaros suggests is a useful feature, but imho not in the scope of svg-cleaners.

JoKalliauer avatar Sep 06 '20 10:09 JoKalliauer

@JoKalliauer Although it is a generalization of the proposed half-grid rounding, a general sub-division grid might be too much for this.

I think @pborunda meant a plain rounding algorithm with a double-fine grid (what I call "half-grid"): rounding numbers to end with .0 or .5, based on which one is closer. When considering only tenths, that means .1 and .2 get rounded down, while .8 and .9 get rounded up. (@pborunda please confirm)

Anyway, a decimal-place-based rounding (digit count after the decimal point) would be really useful alongside the currently available precision-based one.

mrmeszaros avatar Sep 06 '20 11:09 mrmeszaros