dragonbox icon indicating copy to clipboard operation
dragonbox copied to clipboard

Decimals and other binary bit sizes

Open sirinath opened this issue 3 years ago • 17 comments

How does this change with decimals and other binary bit sizes? Can they be added?

sirinath avatar Nov 04 '20 16:11 sirinath

Could you elaborate in more details? Do you mean supports for decimal float formats & formats like ieee-754 binary128 or quadfloat?

jk-jeon avatar Nov 04 '20 18:11 jk-jeon

Can all these be supported:

Name Common
binary16 Half precision
binary32 Single precision
binary64 Double precision
binary128 Quadruple precision
binary256 Octuple precision
bfloat16 Brain Floating Point
minifloats
TensorFloat-32
Extended precision formats
decimal32
decimal64
decimal128

sirinath avatar Nov 05 '20 03:11 sirinath

First, Dragonbox has nothing to do with decimal floats, because Dragonbox is a binary-to-decimal conversion algorithm. The format is already decimal, so there is nothing to convert.

For other binary float formats, I have to mention that some components in this library are there for possible extensions to other formats. More specifically, extensions for other formats can be provided by specializing ieee754_format_info and ieee754_traits class templates.

However, currently, I'm using a enum constant to distinguish the formats. But the problem with enum is that it is not open-ended. Users can't add a new enum identifier without modifying the library code, and probably a better approach might be to replace enum's with tag types.

The reason I made it like this is because in order to the use core features of Dragonbox with an exotic format, the cache table for the format should be prepared, which requires a fair amount of knowledge on the algorithm. It's not that much if table generation is the only goal: each entry of the cache table is just the first Q significand bits of powers of 10, where the exponent is running from the constant min_k to the constant max_k, where these constants are automatically computed from other format parameters. But users should determine the precision of the cache table and verify that the chosen precision is enough, which is not a trivial task.

However, I'm not sure which one is better between the closed-ended enum approach and the open-ended tag type approach. Do you have any idea on this?

Another issue is that I don't know all of the formats you listed. I'm pretty sure that the algorithm can be generalized for binary16, binary128, and binary256 formats, but it can be possible that some of the other formats are out of range. (In fact, a naive approach of generating the full cache table of binary128 or binary256 will be not usable as the size of the table will be enormous, thus it will be necessary to come up with a table compression scheme similar to what I've done for binary64 format. I'm pretty sure it should be possible, but it's a nontrivial task.)

tl;dr Dragonbox could have supported other formats, but it's currently supporting only binary32 and binary64, and you can consider contributing to Dragonbox if you wish.

jk-jeon avatar Nov 05 '20 05:11 jk-jeon

However, I'm not sure which one is better between the closed-ended enum approach and the open-ended tag type approach. Do you have any idea on this?

Making open-ended would be better.

the cache table for the format should be prepared

Maybe this can be generated than hardcoded.

The format is already decimal, so there is nothing to convert.

Even decimals have the string conversion part.

Another issue is that I don't know all of the formats you listed.

See:

  • https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
  • https://en.wikipedia.org/wiki/Minifloat
  • https://medium.com/@moocaholic/fp64-fp32-fp16-bfloat16-tf32-and-other-members-of-the-zoo-a1ca7897d407

sirinath avatar Nov 05 '20 06:11 sirinath

Maybe this can be generated than hardcoded.

Generating them in compile-time is theoretically possible, only if the users can afford stupidly long compilation time, and compilers' constexpr evaluation limit is absurdly long, and we have a constexpr bignum library. (And as I pointed out, precision checking is even more involved.) If you are talking about having a separate runtime code for generating the table, it's already in this repo, although it's currently not supposed to be consumed by users.

Even decimals have the string conversion part.

As written in README, string conversion is not the focus of this library. And it would be not difficult to write it manually since no conversion is necessary. You can even reuse the string generation part of this library if you want; you can just extract the significand and the exponent from the input and wrap it with fp_t and pass it to the string generation routine.

See:

  • https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
  • https://en.wikipedia.org/wiki/Minifloat
  • https://medium.com/@moocaholic/fp64-fp32-fp16-bfloat16-tf32-and-other-members-of-the-zoo-a1ca7897d407

Okay, I'll have a look.

jk-jeon avatar Nov 05 '20 06:11 jk-jeon

A few thoughts after looking at it.

  1. There should be not so many difficulties in supporting binary16 and bfloat16.
  2. For minifloat, since it is only 1 byte, I think fully table-based float-to-string conversion makes more sense than compute-on-the-fly approach. Thus I think it is out of range of this library.
  3. Supports for binary128 and Intel's 80-bit floating-point formats might be possible without a tremendous amount of effort. (I think Ryu library already have them.)
  4. Support for binary256 is theoretically possible, but I doubt if it is worth doing. It will require ridiculous amount of precomputed cache table. And afaik it is not a very popular format neither.

Thanks for bring this issue up. I think I should first decide between tag-type approach and enum approach.

jk-jeon avatar Nov 23 '20 17:11 jk-jeon

@expnkx Okay, I'll send you a message in discord

jk-jeon avatar Nov 23 '20 18:11 jk-jeon

I have similar requirement only for 80 bit floating point format (ie typical long double type in g++ for example, MSVC long double is same size as double). I am happy to eventually work on adding this but no idea how difficult it is for someone not familiar with DragonBox code. I'd be grateful if you could advise on this. Thanks in advance.

zejal avatar Sep 04 '21 17:09 zejal

@zejal First of all, thanks a lot for your attention.

Ideally, the step 0 is to understand the algorithm. This is probably the hardest step if you are not familiar with this kinds of stuffs, or possibly not so daunting step if you already have experience in similar algorithms. The whole algorithm is explained in detail in this paper.

Assuming that you have enough energy to dive into it, let me briefly go over the paper.

  • Section 1 : Not a big deal.
  • Section 2: If you are not familiar with IEEE-754, here is an explanation. If you are already familiar with IEEE-754, you may still need to at least skim over this section, because it introduces notations used throughout the paper. For me, notation was one of the main reason why papers in these stuffs are confusing and not so touching sometimes.
  • Section 3: This section is about some necessary parts of Schubfach, which is another algorithm which Dragonbox is based on. You could read this paper instead in order to understand Schubfach which is the one written by the author of Schubfach, but I personally think my exposition is easier to read as the original paper introduces way too many notations (and his proof writing style is not of my taste to be honest). You do not need to read proofs (if you believe my never-peer-reviewed paper); you just need to understand what each proposition really means. Reading proofs might or might not help in that regard, though. There are also some figures that help the illustration.
  • Section 4: This section is about Dragonbox. You might need to carefully read Section 4.1 except for the proof of Proposition 4.1. Figure 3 pretty much explains why Proposition 4.1 must be true, and the proof is just a formal verification. Other subsections are about specific details of the skeleton introduced in Section 4.1. I know that this section is very technical and not a breeze to read, but unfortunately I don't have an easier way to explain how it works. I think the best way to understand the details is to actually write an implementation (but not necessarily a full-blown library) by yourself, by following Section 4 subsection-by-subsection. Well, to implement Section 4.2, you need a table. But for float and double it is fortunately already there in this repo.
  • Section 5: Continued explanation of Dragonbox. If you've read and understood Section 4, this section must be a breeze.
  • Section 6: This section is an explanation of why Section 4.2 works. You could just skip this section I guess, because probably Q=192 will just work for 80-bit floats. But if you care about absolute trustworthiness, you might need to go over this section as well. (Well, Q=256 definitely should work, I guess, if you suspect Q=192.) I actually have a little bit better method of finding the upper bound than what's explained in this section, but probably that's not a big deal. You can also refer to Section 4.3 of this paper as well if you are interested in how to actually compute the upper bound. I personally think that the algorithm explained there is a way more significant contribution of myself to this field than Dragonbox, although nobody seems to be interested in that algorithm :)
  • Section 7: Just skip.
  • Appendix A, B: Just skip, if you do not care about exotic rounding modes.

After understanding the algorithm, the next step is the float traits. All of those "float traits" stuffs in this repo are intended for possible extension to other floating-point formats. Specifically, you will need to define a new float traits struct by taking a look at the default implementation. Currently, the default one is somewhat broken since it is relying on an undefined behavior. (I will probably fix it by today.)

And then, you need to generate the cache table. This is probably a bit annoying step. I guess the "full cache table" might be too large to be practical, so you may need to go for the "compressed cache table" route. Relevant files are this and this.

After that, I guess you might need to implement several format-specific optimized subroutines. At this point I'm not 100% sure if we need no significant change in order to make it work. Please let me know if you find anything needing a big overhaul.

Again, thanks a lot for your attention and please let me know if there is anything I can help.

jk-jeon avatar Sep 05 '21 00:09 jk-jeon

Currently, the default one is somewhat broken since it is relying on an undefined behavior. (I will probably fix it by today.)

Just did it: https://github.com/jk-jeon/dragonbox/commit/14c02bd1bcdf0b69820b6431b4e38751511bb807

jk-jeon avatar Sep 05 '21 02:09 jk-jeon

Thanks a lot for all this. I've started having a look to the paper but guidance is very helpful.

zejal avatar Sep 05 '21 13:09 zejal

I'm far from understanding all algorithm details but I see that implementation quite strongly assumes either 32 or 64 IEEE 754 formats. For example in several locations: if constexpr (std::is_same_v<format, ieee754_binary32>) { .... } else { static_assert(std::is_same_v<format, ieee754_binary64>); ... }. Would it make sense to replace these with static functions in each struct ieee754_binaryXX ? So adding a new ieee754_binary would be easier ? Also I see there is a logic to remove trailing zeros, which it's deemed costly, would be any way to avoid producing these trailing zeros or would such strategy be too costly or impossible ? Just wondering of course. Thanks.

zejal avatar Sep 08 '21 13:09 zejal

I see that implementation quite strongly assumes either 32 or 64 IEEE 754 formats. For example in several locations: if constexpr (std::is_same_v<format, ieee754_binary32>) { .... } else { static_assert(std::is_same_v<format, ieee754_binary64>); ... }. Would it make sense to replace these with static functions in each struct ieee754_binaryXX ? So adding a new ieee754_binary would be easier ?

I somewhat already tried to minimize the occurrences of such if constexpr, but yeah I think we should make such a change at some point. The reason I couldn't further reduce such occurrences is that I was not sure about where to draw the customization boundary.

Also I see there is a logic to remove trailing zeros, which it's deemed costly, would be any way to avoid producing these trailing zeros or would such strategy be too costly or impossible ? Just wondering of course.

I believe this is more or less a necessary quirk of how the algorithm works. Right now I don't see how to avoid it, but any suggestions are welcome of course.

jk-jeon avatar Sep 09 '21 01:09 jk-jeon

hello @all,
looks like this thread / issue is somewhat 'on halt'?
would like a version for 80-bit long doubles, and lack the skills to build such from above info.
would also like a 'newbie-friendly' info / howto if it's possible to call dragonbox functions from vanilla-C.

Supports for binary128 and Intel's 80-bit floating-point formats might be possible without a tremendous amount of effort. (I think Ryu library already have them.)

yes, it's a little complex, didn't try 128 or other figures but 80-bit long doubles work with the three steps as in 'method B.' in the snippet below. 'method A.' is the path for doubles or floats.

// Unless required by applicable law or agreed to in writing, this software 
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
// KIND, either express or implied. 
 
// requires 'libryu.a' accessible, 
// build: 'gcc -L . convert_long_double_to_string_a.c -o convert_long_double_to_string_a -lryu -lquadmath -lm' 
// run:   './convert_long_double_to_string_a' 
 
#include <stdio.h> 		// reg. e.g. printf, 
#include <math.h> 		// reg. e.g. nextafterl, 
#include "ryu.h" 		// reg. e.g. d2s, 
#include "generic_128.h" 	// reg. e.g. generic_binary_to_decimal, 
#include "ryu_generic_128.h" 	// reg. e.g. generic_binary_to_decimal, 
#include <quadmath.h> 		// reg. e.g. float128, 
 
typedef struct floating_decimal_128 t_fd128; 
 
int main (const int argc, const char * const argv[]) 
{ 
	char buf_ch32[ 32 ]; 
	t_fd128 test_fd128; 
 
	long double xL = nextafterl( -1E21L, -1E+100 ); 
	printf( "conversion of . . . .  long double xL:  %.25Le\n",  xL ); 
	printf( " \n"); 
 
	printf( "// method A. - 'd2s' - performs conversion with double precision, \n",  xL ); 
	sprintf( buf_ch32, "%s", d2s( xL ) ); 					// one step conversion with optimized function, 
 
	printf( "buf_ch32:  . . . . . . . . . . . . . .  %s\n", buf_ch32 ); 
	printf( " \n"); 
 
	printf( "// method B. - 'generic' - performs conversion with long double precision, \n",  xL ); 
	test_fd128 = long_double_to_fd128( xL ); 				// splitting long double into decimal significant and base10 exponent, 
	int i = generic_to_chars( test_fd128, buf_ch32 ); 			// building a string from above figure, 
	buf_ch32[ i ] = '\0'; 			 				// terminating the string, 

	printf( "buf_ch32:  . . . . . . . . . . . . . .  %s\n", buf_ch32 ); 
	printf( " \n"); 
 
return 0; 
} 

newbie-02 avatar May 20 '22 06:05 newbie-02

would also like a 'newbie-friendly' info / howto if it's possible to call dragonbox functions from vanilla-C.

You just need to follow the usual procedure of wrapping C++ lib's with C interface. I.e., (1) write a header file with whatever list of declaration of functions you need, (2) write a cpp file that defines those functions by calling functions from dragonbox library, and then (3) compile that cpp file into a library file, and (4) use it from C by including the header file and linking to the library file.

(I accidently clicked "Close with comment" button while writing this😅)

You can find lots of information on this from the Internet. Most of the articles on it seems to assume object (classes) API therefore defines an opaque pointer, but since dragonbox API's are purely procedural, you don't need that.

jk-jeon avatar May 20 '22 13:05 jk-jeon

hello :-) @jk-jeon,

thank you for your fast reply,

you didn't say something about 'would like a version for 80-bit long doubles, and lack the skills to build such from above info.'. in general i can try to dive into, but have seen here and at ryu that people with better skills didn't finish, thus think it's too big a task for me.

let me explain the value of a long double version: in some way longs are dead, and also intel doesn't push them anymore, but! they provide a good way to check the results of double calculations which I consider very helpful for my work on last bit accuracy.

wrapping C++ - 'usual way' ... did you decode the implicit meaning of my nick? I'd need to learn more about 'C', plenty about C++, about their co-play, about compilers and options, about your project, rounding modes, ... besides the general value of learning, it would be better for the project and for a good solution that can be used by others if something like this is done by experienced people.

IMHO the two points above ( 'longs' and 'C' access ) could help a lot of people once they were available, whereas for the 'C' access float to string, double to string and long double to string would be sufficient.

TIA for any help from you or other persons ...

newbie-02 avatar May 21 '22 05:05 newbie-02

but have seen here and at ryu that people with better skills didn't finish, thus think it's too big a task for me.

I would say it is not a dauntingly hard task, but it is certainly something that requires nontrivial amount of time and energy. Also it is not something super-advanced or something that requires a lot of creativity and innovation, but it certainly requires enough understanding of the underlying algorithms, which I assume not so many people seem to have already acquired, nor willing to acquire, at this point. Fast but 100% accurate float-to-string conversion is probably quite a niche thing that not so many people are caring about, and especially that for 80-bit float is presumably even more niche. Maybe no one did it yet because it is not so super-rewarding given the amount of effort it requires.

Also, regarding the C-interface thing, I think this issue thread is not really the right place to talk about it. As I said, you can google, e.g., "wrapping C++ with C interface" and find plenty of information on it. Just be aware that the library API is not object-oriented so you need not to have a so-called "opaque pointer" from the interface.

jk-jeon avatar May 24 '22 00:05 jk-jeon