Potential precision loss formatting decimals, integers, and strings

Open ScrimpyCat opened this issue 7 years ago • 6 comments

These issues all stem from the different formatters converting the value to a float before formatting it. I'm not sure whether this is something you want the library to handle, however. If you did want to alleviate the problem, converting everything to Decimal and changing the functions to use that instead would be the easiest solution (though it would mean Decimal would need to be a required dependency). If you don't want Decimal as a required dependency, then the other way to solve it would probably be to introduce a new conversion type that respects precision (a string or some format based on integers) and then operate on that.
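A minimal sketch of the string-based route, assuming the input is a plain decimal string (optional leading minus, optional fractional part). `StringDelimit` and `delimit/2` are hypothetical names used only for illustration and are not part of Number:

```elixir
defmodule StringDelimit do
  # Hypothetical helper (not part of Number): inserts a delimiter every
  # three digits of the integer part, working on the string directly so
  # the value is never converted to a float.
  def delimit(number, delimiter \\ ",") when is_binary(number) do
    # Split off an optional leading minus sign.
    {sign, unsigned} =
      case number do
        "-" <> rest -> {"-", rest}
        rest -> {"", rest}
      end

    # Separate the integer part from an optional fractional part.
    {int, frac} =
      case String.split(unsigned, ".", parts: 2) do
        [int, frac] -> {int, "." <> frac}
        [int] -> {int, ""}
      end

    # Group the integer digits in threes, counting from the right.
    grouped =
      int
      |> String.graphemes()
      |> Enum.reverse()
      |> Enum.chunk_every(3)
      |> Enum.map(fn chunk -> chunk |> Enum.reverse() |> Enum.join() end)
      |> Enum.reverse()
      |> Enum.join(delimiter)

    sign <> grouped <> frac
  end
end

StringDelimit.delimit("1234567")   # => "1,234,567"
StringDelimit.delimit("-1234.50")  # => "-1,234.50"
```

Because the digits are only regrouped and never parsed, every digit of the input survives no matter how long the string is. The sketch does no validation, so a real version would need to reject strings that aren't numbers.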

ScrimpyCat avatar Sep 01 '16 15:09 ScrimpyCat

Can you be a little more specific about when this precision loss occurs with some examples?

danielberkompas avatar Sep 01 '16 18:09 danielberkompas

Because floats are represented internally as binary64 values, it will occur any time a value is not exactly representable (e.g. large whole numbers, certain decimal fractions, etc.), and it can also happen when performing any floating point operation on the value.
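A small sketch of the effect, independent of Number: a binary64 float has a 53-bit significand, so whole numbers above 2^53 are not all representable. `PrecisionDemo` is a hypothetical module that just round-trips an integer through a float, roughly as a float-based formatter would:

```elixir
defmodule PrecisionDemo do
  # Hypothetical helper: convert an integer to a binary64 float and back.
  def roundtrip(int) when is_integer(int), do: trunc(int * 1.0)

  # True only when the integer comes back unchanged.
  def survives?(int), do: roundtrip(int) == int
end

PrecisionDemo.survives?(9_007_199_254_740_992)  # 2^53     => true
PrecisionDemo.survives?(9_007_199_254_740_993)  # 2^53 + 1 => false
```

The same check fails for the 56-digit integer used in the examples in this thread, which is why the float-based code paths print different digits from the ones that were passed in.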

So any function that converts decimals, integers, or strings to floating point values is susceptible, as is any function that involves floating point operations/arithmetic. The functions that can operate on the type directly without needing to convert it are OK though (like Number.Delimit.number_to_delimited/2 with integers, that is fine).

Some of the affected functions are: Number.SI.number_to_si/2 (due to the floating point operations), and any function that uses Number.Conversion.to_float/1 to handle integers, strings, or decimals.

iex(1)> Number.SI.number_to_si 12345678955555555555555555555555555555555555555555555555
"12345678955555555615070773837824.00Y" #loss of precision (due to floating point operations when calculating the scaled_number)
iex(2)> Number.SI.number_to_si "12345678955555555555555555555555555555555555555555555555"
"12345678955555555615070773837824.00Y" #loss of precision (due to Number.Conversion.to_float/1)
iex(3)> Number.SI.number_to_si Decimal.new(12345678955555555555555555555555555555555555555555555555)
"12345678955555555615070773837824.00Y" #loss of precision (due to Number.Conversion.to_float/1)
iex(4)> Number.Currency.number_to_currency 12345678955555555555555555555555555555555555555555555555
"$12,345,678,955,555,555,555,555,555,555,555,555,555,555,555,555,555,555,555" #works correctly (as Number.Delimit.number_to_delimited/2 can handle integers)
iex(5)> Number.Currency.number_to_currency "12345678955555555555555555555555555555555555555555555555"
"$12,345,678,955,555,556,364,396,092,142,901,676,609,597,172,903,919,484,928.00" #loss of precision (due to Number.Conversion.to_float/1)
iex(6)> Number.Currency.number_to_currency Decimal.new(12345678955555555555555555555555555555555555555555555555)
"$12,345,678,955,555,556,364,396,092,142,901,676,609,597,172,903,919,484,928.00" #loss of precision (due to Number.Conversion.to_float/1)
iex(7)> Number.Delimit.number_to_delimited 12345678955555555555555555555555555555555555555555555555
"12,345,678,955,555,555,555,555,555,555,555,555,555,555,555,555,555,555,555" #works correctly
iex(8)> Number.Delimit.number_to_delimited "12345678955555555555555555555555555555555555555555555555"
"12,345,678,955,555,556,364,396,092,142,901,676,609,597,172,903,919,484,928.00" #loss of precision (due to Number.Conversion.to_float/1)
iex(9)> Number.Delimit.number_to_delimited Decimal.new(12345678955555555555555555555555555555555555555555555555)
"12,345,678,955,555,556,364,396,092,142,901,676,609,597,172,903,919,484,928.00" #loss of precision (due to Number.Conversion.to_float/1)

Floats themselves can also suffer the same problem when used with any function that performs operations on them, such as Number.SI.number_to_si/2.

iex(1)> Number.SI.number_to_si 4503599627370495.50, precision: 16
"4.5035996273704954P" #loss of precision (due to the arithmetic like when calculating scaled_number)
iex(2)> Number.Delimit.number_to_delimited 4503599627370495.50
"4,503,599,627,370,495.50" #works correctly
iex(3)> Number.Currency.number_to_currency 4503599627370495.50
"$4,503,599,627,370,495.50" #works correctly

ScrimpyCat avatar Sep 01 '16 19:09 ScrimpyCat

Great information, thanks. So, you would recommend that Number convert everything to Decimal rather than floats?

danielberkompas avatar Sep 01 '16 20:09 danielberkompas

Yep, converting to Decimal and replacing the floating point operations with operations that work on Decimals instead would be the easiest way to fix this issue.

ScrimpyCat avatar Sep 01 '16 21:09 ScrimpyCat

I have opened a PR (#19) which will improve the precision of number_to_delimited and number_to_currency by using Decimal.

Unfortunately, it doesn't look like I can fix number_to_si, since it relies on the Erlang math:log and math:pow functions which only take floats and have no equivalent functions in Decimal.

danielberkompas avatar Sep 08 '16 23:09 danielberkompas

Still awesome you've got the other variants fixed though. For my actual usage it was only currency I needed to work properly, so this is perfect. 😃

But yeah, unfortunately Decimal doesn't provide those operations 😢. And I'm not aware of Decimal planning on implementing those types of operations anytime soon either. However, if you wanted to, it is possible to implement that behaviour yourself (although this gets a bit hairy). An arbitrary natural logarithm is a little more difficult to do, but if SI is only expected to handle the base 1000 and base 1024 options (essentially base 10 and binary), it could be simplified to (assuming my math checks out) only needing log10 and log2. And to make it even simpler, all we really care about is the whole number part, rather than partials.

So an implementation of log10 would only need to work out how many digits are in the coefficient. This can be done by reducing the decimal value down until only one digit is left in the coefficient, at which point the exponent gives us the whole number log10. A gotcha though is the decimal places (for dealing with the milli, micro, nano, etc. prefixes): the value needs to be shifted to account for them.
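As a rough illustration of the digit-counting idea without depending on Decimal, the sketch below models a value as a `{coef, exp}` tuple meaning `coef * 10^exp`. That tuple, the `Log10Sketch` module, and its function names are assumptions made for this example. It uses `Integer.floor_div/2` so that negative logs round towards the smaller prefix:

```elixir
defmodule Log10Sketch do
  # Hypothetical model: a positive value is {coef, exp}, meaning coef * 10^exp.
  # The whole-number log10 is the number of digits in coef, minus one,
  # shifted by the exponent.
  def floor_log10({coef, exp})
      when is_integer(coef) and coef > 0 and is_integer(exp) do
    length(Integer.digits(coef)) - 1 + exp
  end

  # Power-of-1000 step for the SI prefix, clamped to yocto..yotta (-8..8).
  # floor_div keeps e.g. 10^-4 in the micro range rather than milli.
  def si_exponent(value) do
    value
    |> floor_log10()
    |> Integer.floor_div(3)
    |> max(-8)
    |> min(8)
  end
end

Log10Sketch.floor_log10({1000, 0})  # 1000   => 3
Log10Sketch.floor_log10({1, -4})    # 0.0001 => -4
Log10Sketch.si_exponent({1, -4})    # => -2 (micro)
```

Since only the digit count and the exponent are inspected, the result does not depend on how large the coefficient is.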

An implementation of log2 is a bit more complicated. If the implementation of log10 was a true log10 (handled partials) then you could just do: Decimal.div(log10(n), log10(2)). But implementing them to handle the partials is trickier. Anyway, a log2 that neither handles partials nor relies on log10 can be implemented by handling two separate cases.

The first case is whole numbers (0 or +/- 1 and greater): you can work out the log2 by simply calculating the index of the most significant bit of the whole number coefficient. Adjusting the decimal coefficient until its exponent is 0 will leave you with the whole number coefficient. Then just check the index of the most significant bit of that coefficient. In my example below this is done with count(mask(coef)) - 1, however it could be simplified to counting the shifts until the value is 0 (that fairly esoteric code I just ripped out of a project of mine because I was being lazy haha).
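The simpler shift-counting variant mentioned above could look roughly like this. `Log2Sketch` and `msb_index/1` are hypothetical names, and the sketch assumes a positive whole number as input:

```elixir
defmodule Log2Sketch do
  import Bitwise

  # Index of the most significant set bit, i.e. floor(log2(n)) for n > 0.
  # Counts how many right shifts it takes to reduce n to 1.
  def msb_index(n) when is_integer(n) and n > 0, do: msb_index(n, 0)

  defp msb_index(1, acc), do: acc
  defp msb_index(n, acc), do: msb_index(n >>> 1, acc + 1)
end

Log2Sketch.msb_index(1)     # => 0
Log2Sketch.msb_index(1023)  # => 9
Log2Sketch.msb_index(1024)  # => 10
```

This takes one step per bit, which is slower than the mask-and-popcount version for very large integers but much easier to follow.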

The second case is fractional numbers (any number between 0 and 1), which is a bit more complicated to work out. I think the simplest way of calculating it is by working out the binary exponent for the given fraction (as a regular floating point would), though there may be a simpler way? In the example it is basically the fraction_to_exponent part; this was another thing I just ripped from another project of mine so it's a bit messy.
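One possible way to picture the fractional case, as a sketch only: treat the fraction as an integer numerator over a power-of-ten denominator and count how many doublings it takes to reach 1. This is a different formulation from fraction_to_exponent, and `FractionLog2` and its functions are hypothetical names:

```elixir
defmodule FractionLog2 do
  # floor(log2(num / den)) for a fraction strictly between 0 and 1,
  # using integers only. If k doublings are needed before the fraction
  # reaches 1, then 2^-k <= num/den < 2^-(k - 1), so the result is -k.
  def floor_log2(num, den)
      when is_integer(num) and is_integer(den) and num > 0 and num < den do
    doublings(num, den, 0)
  end

  defp doublings(num, den, k) when num >= den, do: -k
  defp doublings(num, den, k), do: doublings(num * 2, den, k + 1)
end

FractionLog2.floor_log2(1, 2)                       # 0.5          => -1
FractionLog2.floor_log2(3, 10)                      # 0.3          => -2
FractionLog2.floor_log2(9_765_625, 10_000_000_000)  # 0.0009765625 => -10
```

For a decimal with coefficient `coef` and a negative exponent `exp`, the numerator would be `coef` and the denominator `10^abs(exp)`.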

After that the rest is pretty much the same, the only thing left that's missing is a power function. That's simple enough to implement in decimal given we have multiplies and divides.

use Bitwise

def number_to_si(number = %Decimal{}, options) do
  options = Dict.merge(config, options)
  exp = compute_exponent(number, options[:base])
  prefix = exponent_to_prefix(exp)
  scaled_number = Decimal.div(number, pow(options[:base], exp))
  display_number = Decimal.to_string(Decimal.round(scaled_number, options[:precision]), :normal)
  final_number = if options[:trim], do: trim(display_number), else: display_number
  final_number <> options[:separator] <> prefix <> options[:unit]
end
def number_to_si(number, options) do
  if Number.Conversion.impl_for(number) do
    number
  #   |> Number.Conversion.to_float
    |> Number.Conversion.to_decimal
    |> number_to_si(options)
  else
    raise ArgumentError, """
    number must be a float or integer, or implement `Number.Conversion` protocol,
    was #{inspect number}
    """
  end
end

#decimal power
defp pow(_n, 0), do: Decimal.new(1)
defp pow(n, exp) when exp > 0, do: power(Decimal.new(n), exp)
defp pow(n, exp), do: Decimal.div(Decimal.new(1), power(Decimal.new(n), -exp))

defp power(n, 1), do: n
defp power(n, exp), do: Decimal.mult(n, power(n, exp - 1))

#coefficient adjust
defp adjust(n = %{ coef: coef }) when coef < 10, do: n
defp adjust(%{ sign: sign, coef: coef, exp: exp }), do: adjust(Decimal.new(sign, div(coef, 10), exp + 1))

defp log10(x), do: adjust(x).exp

defp adjust_neg(n) when n < 0, do: n - 2
defp adjust_neg(n), do: n

#count the set bits 
defp count(x) when x >= 0, do: count(x, 0)

defp count(0, total), do: total
defp count(x, total) do
  c = x &&& 0xffffffff
  c = c - ((c >>> 1) &&& 0x55555555)
  c = (c &&& 0x33333333) + ((c >>> 2) &&& 0x33333333)
  c = (c + (c >>> 4)) &&& 0x0f0f0f0f
  c = ((c * 0x01010101) &&& 0xffffffff) >>> 24

  count(x >>> 32, total + c)
end

defmacro is_power_of_2(x), do: quote do: (unquote(x) &&& ~~~-unquote(x)) == 0

#mask for bits
defp mask(x), do: mask(x, 1)

defp mask(x, _) when is_power_of_2(x + 1), do: x
defp mask(x, size), do: mask(x ||| (x >>> size), size <<< 1)

defp log2(%{ coef: 0 }), do: 0
defp log2(n) do
  n = Decimal.abs(n)
  if Decimal.cmp(n, Decimal.new(1)) == :lt do
    log2_dec(n) #fractional number log2
  else
    log2_int(n) #whole number log2
  end
end

defp log2_dec(%{ coef: coef, exp: exp }) do
  p = pow10(abs(exp))
  { e, v } = fraction_to_exponent(coef, p)
  if v > p, do: e + 1, else: e
end

defp log2_int(%{ coef: coef, exp: 0 }), do: count(mask(coef)) - 1 #index of MSB
defp log2_int(x = %{ coef: coef, exp: exp }) when exp < 0, do: log2_int(%{ x | coef: div(coef, 10), exp: exp + 1 })
defp log2_int(x = %{ coef: coef, exp: exp }), do: log2_int(%{ x | coef: coef * 10, exp: exp - 1 })

#integer 10 power
defp pow10(n), do: pow10(1, n)

defp pow10(x, 0), do: x
defp pow10(x, n), do: pow10(x * 10, n - 1)

defp fraction_to_exponent(v, precision, e \\ nil, index \\ 0)
defp fraction_to_exponent(v, _, e, _) when e != nil, do: { ~~~e, v }
defp fraction_to_exponent(0, _, _e, _), do: { nil, 0 }
defp fraction_to_exponent(v, precision, e, index) do
  v = rem(v, precision) * 2
  fraction_to_exponent(v, precision, if(v >= precision, do: index), index + 1)
end

defp compute_exponent(number, _) when number == 0, do: 0
defp compute_exponent(number, 1000), do: (div(adjust_neg(log10(number)), 3) |> max(-8) |> min(8))
defp compute_exponent(number, 1024), do: (div(log2(number), 10) |> max(-8) |> min(8))

Examples (there are some small formatting differences between Float.to_string and Decimal.to_string, but those wouldn't be difficult to fix up):

iex(1)> Number.SI.number_to_si 0
"0"
iex(2)> Number.SI.number_to_si 1 
"1"
iex(3)> Number.SI.number_to_si 10
"10"
iex(4)> Number.SI.number_to_si 100
"100"
iex(5)> Number.SI.number_to_si 1000
"1k"
iex(6)> Number.SI.number_to_si 10000
"10k"
iex(7)> Number.SI.number_to_si 100000
"100k"
iex(8)> Number.SI.number_to_si 1000000
"1M"
iex(9)> Number.SI.number_to_si 4503599627370495.50, precision: 16
"4.5035996273704955P"
iex(10)> Number.SI.number_to_si 4503599627370495.50, precision: 1 
"4.5P"
iex(11)> Number.SI.number_to_si "0.0009765625", base: 1024  #1024^-1
"1m"
iex(12)> Number.SI.number_to_si "9.3132257e-10", base: 1024 #1024^-3
"1.00n"
iex(13)> Number.SI.number_to_si 1024, base: 1024
"1k"
iex(14)> Number.SI.number_to_si 1024*1024, base: 1024
"1M"
iex(15)> Number.SI.number_to_si 12345678955555555555555555555555555555555555555555555555, precision: 56
"12345678955555555555555555560000Y" #wrong but fixable, since the precision of Decimal's context isn't large enough
iex(16)> c = Decimal.get_context
%Decimal.Context{flags: [:inexact, :rounded], precision: 28, rounding: :half_up,
 traps: [:invalid_operation, :division_by_zero]}
iex(17)> Decimal.set_context %{ c | precision: 1000, rounding: :down }
:ok
iex(18)> Number.SI.number_to_si 12345678955555555555555555555555555555555555555555555555, precision: 56
"12345678955555555555555555555555.555555555555555555555555Y"
iex(19)> Number.SI.number_to_si 0.1
"100m"
iex(20)> Number.SI.number_to_si 0.01
"10m"
iex(21)> Number.SI.number_to_si 0.001
"1m"
iex(22)> Number.SI.number_to_si 0.0001
"100µ"

However, with all that said, I think you're probably better off just waiting until the Decimal library does provide the necessary operations to fix the SI formatter (as they'll be tested/correct, whereas my example is very much a hacked-together solution).

ScrimpyCat avatar Sep 10 '16 23:09 ScrimpyCat