fast_float

feature request: to_chars() alternative?

Open biojppm opened this issue 4 years ago • 11 comments

Thanks for your work -- it ticks all the boxes! C++11, non-terminated strings, and zero allocations - just what I was looking for in my library to address a really nasty issue caused by trying to stick to standard facilities while avoiding the performance-killing allocation cookie monsters from the STL.

But I am also looking for a matching to_chars() version/alternative that writes into a given buffer+size. I do not care about the roundtrip guarantee dictated by the standard.

Are you considering adding such a thing? Or are you aware of any implementation providing this function with similar quality and design choices?

I looked at ryu which does not tick all the boxes and has a large lookup table, but could work. I've also found fp which seems better but is C++17 and maybe a bit too fresh.

biojppm avatar Nov 07 '20 02:11 biojppm

I believe that the state of the art is the Schubfach algorithm, but I did not find a C++ implementation that I liked. In simdjson, we adopted Grisu2, which is not as good, but for which I could find good-looking C++ implementations.

So I think that taking Schubfach, building a good implementation, testing it, and tuning it would be great. Note that there might be a good Schubfach implementation out there in C++; I just did not find one.

(For obvious reasons, when you are building software, you don't just want to use something that has the best algorithm. You have other constraints... like... can I trust the code not to blow up? Can I read through the code and understand it?)

> I do not care about the roundtrip guarantee dictated by the standard.

Actually, this should come for free. The from_chars implemented in fast_float is exact (with round-to-even and all that) so if you have exact to_chars, then you get the round-trip for free. In fact, you get better.

lemire avatar Nov 08 '20 17:11 lemire

There's a Schubfach implementation here:

https://github.com/jk-jeon/fp/tree/master/subproject/3rdparty/schubfach

But am I correct in thinking Schubfach will only give us the equivalent to printf("%g")? I'd also like to have %e and %f together with a specified precision.

biojppm avatar Nov 10 '20 01:11 biojppm

Schubfach is the high-level algorithm and not a formatter per se, so you are correct that it does not do everything (nor is it meant to).

I have not looked closely at the link you gave, but at a glance it does look like a good APL 2 library.

Let us look at the std::to_chars specification... So if we are just talking about std::to_chars, then you always want the shortest representation, though you need to support both f and e.

> the value is converted to a string as if by std::printf in the default ("C") locale. The conversion specifier is f or e (resolving in favor of f in case of a tie), chosen according to the requirement for a shortest representation: the string representation consists of the smallest number of characters such that there is at least one digit before the radix point (if present) and parsing the representation using the corresponding std::from_chars function recovers value exactly. If there are several such representations, one with the smallest difference to value is chosen, resolving any remaining ties using rounding according to std::round_to_nearest.

lemire avatar Nov 10 '20 02:11 lemire

On a related note, I've gathered the first benchmark results here.

Overall, fast_float is really fast on Windows: 4x faster than std::from_chars(), and faster than everything else.

On Linux, it's among the fastest, but there are some outliers and I have some suspicion over the results (e.g., for clang10/Release/double, std::atof() is ~870MB/s, compared with fast_float at ~360MB/s). To be clear, this is WSL, so let's not jump to conclusions.

I do have some concerns over binary size. If you look at the data on the Linux sizes, fast_float is above 1.3MB, while scanf is 12KB; even iostream has a smaller size, at ~1.2MB. To make things more comparable, I tried to request the static standard library, but I had no time to check whether that was successful.

So, something to look at.

(And apologies if this is not the place to post such data.)

biojppm avatar Nov 12 '20 03:11 biojppm

> If you look at the data on the linux sizes, fast float is above 1.3MB

It is a header-only library, but let us look at the size of the compiled binaries (which include the header, compiled in release mode with -O3):

$ ls -alh example_test
-rwxr-xr-x  1 lemire dialout  35K Nov 12 01:23 example_test

Now an empty "int main() {}" binary will use 17KB. So fast_float itself cannot be much more than about ~15KB in that case. It may be a bit more, I am not being very precise, but it is not 1.2MB.

For comparison, if you grab Gay's dtoa.c (which is effectively the inspiration/source for strtod), you will find that it compiles down to a 55 KB binary.

Note that a simpler version of this algorithm is part of the Go standard library (as of a few weeks ago), and they did consider binary size as a factor.

lemire avatar Nov 12 '20 03:11 lemire

Regarding benchmarking, I do have a pretty decent one there:

https://github.com/lemire/simple_fastfloat_benchmark

It used to support Visual Studio, but over several rounds of reengineering, I broke compatibility with Visual Studio. This could be fixed with some work?

lemire avatar Nov 12 '20 03:11 lemire

> (And apologies if this is not the place to post such data.)

It is totally fair to assess binary size, but it would be better to do it in a separate issue.

lemire avatar Nov 12 '20 03:11 lemire

> Now an empty "int main() {}" binary will use 17KB. So fast_float itself cannot be much more than about ~15KB in that case. It may be a bit more, I am not being very precise, but it is not 1.2MB.

Strange - that's exactly what I did. In my benchmark, main() is a loop that uses fgets() to read from stdin and then invokes a macro, which either expands to the call to fast_float::from_chars() or is empty for the baseline. The Release size of the baseline with the empty loop comes to about 8.5KB on Linux and 11KB on Windows.

But it is really relevant here that I compiled this with the static standard library, so that may be causing the increased size. I will investigate this further and - if justified - pick this up in a different issue.

biojppm avatar Nov 15 '20 19:11 biojppm

@biojppm There is about 84 KB of code in there, most of it made of comments. The code volume is about the same as dtoa.c. I am not denying that you are seeing a potential issue, but one would still have to explain how ~85KB of code (mostly comments) turns into 1.2MB of binary.

lemire avatar Nov 15 '20 19:11 lemire

I am also interested in a to_chars implementation that is super fast. Ideally faster than the Dragonbox algorithm.

sirinath avatar Dec 18 '20 08:12 sirinath

The dragonbox.cc implementation from abolz/Drachennest has been recommended to me by the author of Dragonbox. It doesn't require C++17; it compiles for me in C++11 mode.

It seems to work, but I've had to modify it for header-only use.

ecorm avatar Mar 29 '21 03:03 ecorm