metallic icon indicating copy to clipboard operation
metallic copied to clipboard

parsefloat_ / strtod can't pass special tests

Open MaxGraey opened this issue 6 years ago • 8 comments

I added extra tests in test/native/stdlib/strtod.c which unfortunately fail:

assert(parsefloat_(".2470328229206232720882843964341106861825299013071623822127928412503377536351044e-323", (void*)0) == 0x1p-1074);
// FAIL, actual: 0

assert(parsefloat_(".7410984687618698162648531893023320585475897039214871466383785237510132609053131e-323", (void*)0) == 0x1p-1074);
// FAIL, actual: 0

assert(parsefloat_(
"17976931348623158079372897140530341507993413271003782693617377898044 \
49682927647509466490179775872070963302864166928879109465555478519404 \
02630657488671505820681908902000708383676273854845817711531764475730 \
27006985557136695962284291481986083493647529271907416844436551070434 \
2711559699508093042880177904174497792", (void*)0) == INFINITY);
// FAIL, actual: 1.7976931348623157e+308

MaxGraey avatar Aug 19 '19 06:08 MaxGraey

Correctly rounded decimal to binary conversion was not planned, but I am taking this as a feature request. I will work on this after finishing its inverse in vfprintf because

  • Of this issue
  • I am already working on correctly rounded binary to decimal conversion for vfprintf
  • musl implements correctly rounded binary <-> decimal conversion

Current implementation is still conforming though, as binary -(native libc)-> decimal (DBL_DECIMAL_DIG digits) -(metallic)-> binary is identity conversion.

jdh8 avatar Aug 22 '19 18:08 jdh8

Hmm, but it seems it round right in most of cases. Just need fix bounded (lower and upper) cases.

MaxGraey avatar Aug 22 '19 20:08 MaxGraey

For IEEE-754 double,

  • DBL_DECIMAL_DIG = 17
  • DBL_DIG = 15

This means the following are identity conversions:

  • double -> 17 decimal digits -> double
  • 15 decimal digits -> double -> 15 decimal digits

Current implementation only ensures that the first 15 digits are done right because I wanted to reduce stack usage, but I changed my mind. :P

jdh8 avatar Aug 23 '19 12:08 jdh8

Is it required modify existing algorithm or completely reimplement?)

MaxGraey avatar Aug 23 '19 13:08 MaxGraey

It is almost a rewrite to make it correctly rounded. To avoid double rounding error and to reduce code size, a common strategy is: decimal string -(round to odd)-> intermediate binary -(round to nearest)-> target binary

The intermediate binary format has more significant digits (e.g. 128) than long double (112) in order to reuse the first conversion for all targets (float, double, long double).

Rounding to odd is to round all non-representable numbers to the nearest odd representation, so even representations must be exact. Why this algorithm works is explained in the following study:

Sylvie Boldo, Guillaume Melquiond. “When double rounding is odd”. 17th IMACS World Congress, Jul 2005, Paris, France. pp.11. ⟨inria-00070603v2⟩ (backup)

jdh8 avatar Aug 23 '19 19:08 jdh8

Sounds reasonable. Currently instead "decimal string -(round to odd)" you just use "set any non-zero bit for out of consuming range" here. But it seems also need 128-bit fixed point arithmetic. I just decided to reuse some parts of your library for parsing floats on our side and really interesting have similar simple (and compact) approach=)

MaxGraey avatar Aug 23 '19 20:08 MaxGraey

Setting lowest bit when seeing a nonzero digit out of range is effectively rounding to odd, so it produces correctly rounded results if no scaling by a power of 10 is involved.

jdh8 avatar Aug 24 '19 09:08 jdh8

The plan is to extend parsedec_ like

struct BigFloat
{
    unsigned __int128 significand;
    int exp;
};

struct BigFloat __parsedec(
    const char s[restrict static 1],
    char* end[restrict static 1]);

and convert BigFloat to target type in parsedec_. Rounding unsigned __int128 to nearest floating-points is already done by __floatunti[sdt]f.

I'd give __parsedec external linkage to reuse across target types, so it has a reserved name.

jdh8 avatar Aug 25 '19 11:08 jdh8