Suggestions for additional floating-point types

Open aaronfranke opened this issue 5 years ago • 48 comments

I noticed that, like other languages, the only floating-point types built-in are f32 and f64. However, I often have limitations with just these. I propose the following: ~~fsize, freal~~, and f128

~~fsize would be like isize but for floats. Basically, use the version that's most efficient for your processor. On modern 64-bit processors with wide FPUs and/or 256-bit SIMD this would become f64.~~

~~Sometimes I want to be able to have a variable for real numbers, or I don't know what precision I want yet. In C++ I can do the following to have an abstract precision that I control via compiler flags:~~

~~#ifdef REAL_T_IS_DOUBLE~~ ~~typedef double real_t;~~ ~~#else~~ ~~typedef float real_t;~~ ~~#endif~~

~~I propose something similar in Rust, where you can just write freal or something and be able to change the precision later with compiler flags. The default would probably be f32.~~

Finally, it would be nice to have 128-bit floats (f128) in the language. These are not normally needed, but there are use cases for them, and it'd be nice to have it as a language built-in type. Some newer processors have 512-bit SIMD chipsets that can process these efficiently, though most don't.

If you only implement some of these proposals, that's fine too. Originally posted at https://github.com/rust-lang/rust/issues/57928

Jan 26 '19 22:01 aaronfranke

fsize would be like isize but for floats. Basically, use the version that's most efficient for your processor.

isize is not the integer type that's most efficient for your processors - it's the integer type that's the same size as a pointer. It's like ptrdiff_t, not int.
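To make the distinction concrete, here is a minimal check (a sketch that relies only on `std::mem::size_of`; the helper name is made up for illustration) showing that isize follows the pointer width of the target and says nothing about which float width is fastest:

```rust
use std::mem::size_of;

/// Returns true when `isize` has exactly the width of a thin raw pointer.
/// (Hypothetical helper name, used only for this illustration.)
fn isize_is_pointer_sized() -> bool {
    size_of::<isize>() == size_of::<*const u8>()
}

fn main() {
    // isize/usize track the pointer width, like C's ptrdiff_t/size_t.
    assert!(isize_is_pointer_sized());
    // The float widths are fixed and independent of the target's pointer width.
    println!(
        "isize: {} bytes, f32: {} bytes, f64: {} bytes",
        size_of::<isize>(),
        size_of::<f32>(),
        size_of::<f64>()
    );
}
```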

I propose something similar in Rust, where you can just write freal or something and be able to change the precision later with compiler flags. The default would probably be f32.

#[cfg(feature = "real_t_is_double")]
type real_t = f64;
#[cfg(not(feature = "real_t_is_double"))]
type real_t = f32;

Jan 26 '19 23:01 sfackler

A better suggestion would be f16 support, as it is common in graphics.

Jan 31 '19 16:01 moonheart08

@moonheart08

Is f16 used much in intermediate calculations? I know it is commonly used as a storage format, but the last time I checked this (I wrote a Pre-RFC on this on internals a while back, but I'm a bit fuzzy on the details), most calculations involving f16 on most platforms were done by casting to f32, performing the op, then casting back to f16. If that is the case then having native f16 support may not be that important.
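The "storage format, compute in f32" pattern described here can be sketched without any native f16 type. The following is only an illustration under the assumption that half-precision values are stored as raw IEEE 754 binary16 bits in a `u16`; the function names are made up, and the narrowing direction (f32 back to f16, which needs rounding) is left out:

```rust
/// Decode IEEE 754 binary16 bits into an f32.
/// Widening is exact: every binary16 value is representable as an f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x03ff) as u32;
    match (exp, frac) {
        // Signed zero.
        (0, 0) => f32::from_bits(sign),
        // Subnormal: the value is frac * 2^-24, which f32 holds exactly.
        (0, _) => {
            let magnitude = frac as f32 / 16_777_216.0; // divide by 2^24
            if sign != 0 { -magnitude } else { magnitude }
        }
        // Infinity or NaN: keep the payload bits.
        (0x1f, _) => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)),
        // Normal number: rebias the exponent from 15 to 127.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

/// Sum values that are stored as f16 bits, doing all arithmetic in f32.
fn sum_f16_as_f32(data: &[u16]) -> f32 {
    data.iter().map(|&h| f16_bits_to_f32(h)).sum()
}

fn main() {
    // 0x3C00, 0x4000 and 0x4200 are the binary16 encodings of 1.0, 2.0 and 3.0.
    let stored = [0x3C00u16, 0x4000, 0x4200];
    println!("sum = {}", sum_f16_as_f32(&stored));
}
```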

Adding the ability to use the F16C instructions may be useful to have in core::arch though, perhaps something like __m128h which has 8 "f16"s.

Mar 02 '19 10:03 shingtaklam1324

How about long double and 128-bit floats? ~I could be wrong, but I'm 99% sure that we currently unavoidably lose precision when using long doubles from C. On my computer (macOS), bindgen outputs f64, but sizeof(long double) in C outputs 16 bytes. (128 bits; for alignment I guess?).~

~(On a side note, is that even safe behavior? What about C functions that take long double *?)~

Mar 03 '19 01:03 Coder-256

@Coder-256 In C++, long double is 64-bit on Windows, 80-bit in MinGW, and 128-bit on Mac and Linux (probably indeed for alignment, as I don't think anyone implements it as quadruple precision).

Mar 20 '19 00:03 aaronfranke

@aaronfranke Could you please clarify what you mean? What I was trying to say is that Rust currently does not have any support for floats larger than 64 bits (8 bytes), for example, long double on certain platforms. I was also trying to point out that in addition to having limited precision within Rust code, this makes it difficult to interact with native code that uses large floats, such as using FFI with C code that uses floats larger than 64 bits.

There was also a separate issue with bindgen that caused float sizes to be incorrect for large floats, but that has been fixed (in rust-lang/rust-bindgen@ed6e1bbec439e8b260e6e701379fc70d295f35fe).

Mar 20 '19 00:03 Coder-256

I wasn't disagreeing with you, I was just adding information. Sorry if I wasn't clear. f128 would be great.

Mar 20 '19 00:03 aaronfranke

@aaronfranke I absolutely agree, both f128 and f80 would be very useful, especially for FFI (for example, Swift already has Float80 mainly for communicating with old C code, just an example to show how it could help)

Mar 20 '19 00:03 Coder-256

Old things never go away, so I want to push this forward. Rust is a systems language, not a scripting language, and it needs to stay compatible with old things.

Oct 13 '20 10:10 lygstate

I want to push for adding support for fp80 and fp128. Is any help needed?

Oct 13 '20 10:10 lygstate

Like https://github.com/rust-lang/rust/pull/38482 does

Oct 15 '20 02:10 lygstate

Basically, use the version that's most efficient for your processor. On modern 64-bit processors with wide FPUs and/or 256-bit SIMD this would become f64.

Even on modern x86 which has similar or equal speed between most f32 and f64 ops, f32 is still very much the fastest for your processor because it cuts cache misses in half.

Sometimes I want to be able to have a variable for real numbers, or I don't know what precision I want yet. In C++ I can do the following to have an abstract precision that I control via compiler flags:

#[cfg(real_is_f64)]
type real = f64;
#[cfg(not(real_is_f64))]
type real = f32;

then you can control via RUSTFLAGS="--cfg real_is_f64" (you can also use cargo features, but they're not a great fit for cases where enabling a feature can cause compile errors like this)
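For anyone who wants to try the alias approach, here is a small self-contained sketch. The alias name `Real` and the `mean` helper are made up for illustration; the `real_is_f64` cfg name is the one used above. Recent compilers may warn about the custom cfg name unless it is declared to the build system.

```rust
// Hypothetical alias name `Real`.
// Build with RUSTFLAGS="--cfg real_is_f64" to switch it to f64.
#[cfg(real_is_f64)]
pub type Real = f64;
#[cfg(not(real_is_f64))]
pub type Real = f32;

/// Arithmetic mean, written once against the alias (illustrative helper).
pub fn mean(values: &[Real]) -> Real {
    if values.is_empty() {
        return 0.0;
    }
    values.iter().sum::<Real>() / values.len() as Real
}

fn main() {
    let m = mean(&[1.0, 2.0, 3.0]);
    println!("mean = {m}, Real is {} bytes", std::mem::size_of::<Real>());
}
```

Code that mixes `Real` with a hard-coded `f32` or `f64` will stop compiling when the cfg flips, which is the reason given above for preferring a `--cfg` flag over an additive cargo feature.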

... Regarding suggestions of f80

What would f80 do on platforms that aren't x86? Nothing else has native 80-bit floats. It's not even part of IEEE754 (even though it's largely a natural extension of it... although it has a lot of quirks). This is something that would be viable in core::arch::{x86,x86_64} but isn't portable. We don't want to have to implement these as software floats on other platforms.

I'd be in favor of a std::os::raw::c_long_double type but it would have to be carefully designed. Note that PPC's long double is exceptionally cursed, as it's a pair of doubles that are summed together...

I'd be in favor of f16, and tentatively f128 since binary128 is part of IEEE754 2019, at least.

EDIT: I hadn't noticed that sfackler said the exact same thing as my first point >_>

Oct 15 '20 07:10 thomcc

The fact is that f80 is broadly used, and that will continue for the foreseeable future. We don't need a soft f80 implementation; just making f80 work on x86 platforms is enough. That said, a soft f80 may be the better option for cross-platform considerations.

Oct 15 '20 07:10 lygstate

several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.

Oct 15 '20 07:10 programmerjake

several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.

On platforms that have f128, implementing f80 would not cause a significant performance loss.

Oct 15 '20 08:10 lygstate

@thomcc These are all ideas, not everything in the OP is relevant anymore since it has been discussed. I think fsize and freal have been discussed and dismissed, fsize is a bad idea considering the information in this thread and freal is indeed easy to implement with a small amount of lines of code so it doesn't need to be in the language.

That said, f128 is still desired for sure and has some use cases and some hardware support, f80 would be neat though I wouldn't use it personally, f16 would be useful especially in the context of low-end graphics though I also wouldn't use this myself, and if your goal is to cover IEEE 754 there is also f256 or octuple precision, though it's rare to see.

Oct 15 '20 08:10 aaronfranke

Maybe we can add f16, f80, and f128 in a single shot?

Oct 15 '20 08:10 lygstate

f16 has uses in neural networks as well.

There are actually many problems with using f80, especially if we do not ship a soft float to cover it... it would not be a type defined by an abstraction, frankly, it would be a type defined by Intel's hardware quirks, and we would only be adding more on top of it. One of the nice things about Rust is that it is highly portable right now, so I do not think it makes sense to add such a non-portable type to the language and limit portability that much, though a language extension that makes it simpler to define and use such a non-portable type would make sense.

Oct 25 '20 19:10 workingjubilee

several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.

I can't say for sure about the other arches, but PowerPC's is not IEEE-754-like at all — it's double-double. It would not help for implementing a sane f128 nor would it help implement a f80.

On platforms that have f128, implementing f80 would not cause a significant performance loss.

I don't think this is really true (we can quibble over significant, I guess), but regardless rust doesn't exclusively target architectures in the sets {have native f80}, {have native f128}, so something that solves this for other architectures needs to be considered.

if your goal is to cover IEEE 754 there is also f256 or octuple precision, though it's rare to see.

I mean, it's not mentioned in IEEE-754 2019. It's not hard to imagine what it looks like, admittedly.

Anyway, I think once inline asm is stable someone who really wants f80 could implement it as a library on x86/x86_64. This wouldn't solve the issue of FFI (e.g. a c_long_double type), which I still think would be nice to solve, but which I think has a lot of different design considerations. It could just be a mostly-opaque type that includes little more than implementations of From<f64>/Into<f64> (e.g. no arithmetic).
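As a rough idea of what such a mostly-opaque type could look like, here is a sketch for one target only. Everything in it is an assumption made for illustration: the name `CLongDoubleX87` is hypothetical, the layout is meant to mirror the in-memory form of the x87 80-bit format as used for long double on x86-64 System V (10 significant bytes, padded to 16, little-endian), and only the exact widening direction is implemented. Converting back to f64 needs rounding and is omitted. A struct like this would only be usable behind pointers; passing it by value would not follow the C calling convention for long double.

```rust
/// Hypothetical opaque stand-in for an x87 extended value in memory.
/// Not a real std type; the layout is an assumption for x86-64 System V.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CLongDoubleX87 {
    mantissa: u64, // explicit integer bit (bit 63) plus 63 fraction bits
    sign_exp: u16, // sign (bit 15) plus 15-bit exponent, bias 16383
}

impl CLongDoubleX87 {
    /// Expose the raw fields for inspection: (sign and exponent, mantissa).
    pub fn raw_parts(self) -> (u16, u64) {
        (self.sign_exp, self.mantissa)
    }
}

impl From<f64> for CLongDoubleX87 {
    /// Widening is exact: every f64 value is representable in this format.
    fn from(x: f64) -> Self {
        let bits = x.to_bits();
        let sign = ((bits >> 63) as u16) << 15;
        let exp = ((bits >> 52) & 0x7ff) as u16;
        let frac = bits & ((1u64 << 52) - 1);
        let (e, m): (u16, u64) = match (exp, frac) {
            // Signed zero.
            (0, 0) => (0, 0),
            // f64 subnormal: shift the top set bit up to bit 63.
            // Exponent field = 16383 - 1074 + (63 - lz) = 15372 - lz.
            (0, _) => {
                let lz = frac.leading_zeros() as u16;
                (15372 - lz, frac << lz)
            }
            // Infinity or NaN: all-ones exponent, integer bit set.
            (0x7ff, _) => (0x7fff, (1u64 << 63) | (frac << 11)),
            // Normal: rebias from 1023 to 16383 (difference 15360).
            _ => (exp + 15360, (1u64 << 63) | (frac << 11)),
        };
        CLongDoubleX87 { mantissa: m, sign_exp: sign | e }
    }
}

fn main() {
    let one = CLongDoubleX87::from(1.0);
    println!("{:#06x} {:#018x}", one.raw_parts().0, one.raw_parts().1);
}
```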

Oct 25 '20 21:10 thomcc

@thomcc

several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.

I can't say for sure about the other arches, but PowerPC's is not IEEE-754-like at all — it's double-double. It would not help for implementing a sane f128 nor would it help implement a f80.

You're thinking of C's long double type; PowerPC does support IEEE-754 standard binary128 FP using new instructions added in Power ISA v3.0. Quoting GCC 6's change log:

PowerPC64 now supports IEEE 128-bit floating-point using the __float128 data type. In GCC 6, this is not enabled by default, but you can enable it with -mfloat128. The IEEE 128-bit floating-point support requires the use of the VSX instruction set. IEEE 128-bit floating-point values are passed and returned as a single vector value. The software emulator for IEEE 128-bit floating-point support is only built on PowerPC GNU/Linux systems where the default CPU is at least power7. On future ISA 3.0 systems (POWER 9 and later), you will be able to use the -mfloat128-hardware option to use the ISA 3.0 instructions that support IEEE 128-bit floating-point. An additional type (__ibm128) has been added to refer to the IBM extended double type that normally implements long double. This will allow for a future transition to implementing long double with IEEE 128-bit floating-point.

Oct 26 '20 00:10 programmerjake

Thanks, you're correct that I was thinking of the PPC long double (__ibm128) type. Unfortunately, I think the existence of 2 separate 128-bit "floating point" types on powerpc only complicates things, although it's nice that at least one of them is moderately sane.

Oct 26 '20 01:10 thomcc

Full(er) support for IEEE 754 would indeed be very welcome, especially for numerical work.

What would f80 do on platforms that aren't x86? Nothing else has native 80 bit floats. It's not even part of IEEE 754 (even though it's largely a natural extension of it... although it has a lot of quirks).

This is somewhat false, x86's 80-bit floats are extended precision binary64's as specified by IEEE 754.

However, it's true that these are not very strictly defined: an extended precision binary64 has to have a larger precision than binary64 and the exponent range of binary128. This means that both x86's 80-bit floats and binary128 are examples of valid extended precision binary64's.

I'd suggest providing the following types:
f16 (binary16), f32 (binary32), f64 (binary64), f64e (binary64 extended) and f128 (binary128).

On x86 platforms, and others that have a native extended precision binary64, a f64e would be an 80-bit float or similar, on all others it would be the same as a f128.

[Edit: further clarified in the relation between 80-bits floats and IEEE 754.]

Nov 06 '20 19:11 eprovst

So, on the other side of "portable" is "layout". We have a lot of ambiguous-layout types which are not primitive types. However, as far as I am aware all the primitive types have a pretty explicit layout, and many of the std composite data types like Vec<T> etc. have most of their layout dialed in as well. Here we'd have two possible layouts on a numeric type which should be as simple as possible, and f64e is probably the wrong abstraction here because there are a lot of cases where someone wants "type N that fulfills X or else type M that fulfills a superset of X", especially for math libs.

Nov 07 '20 02:11 workingjubilee

I'm not too sure what you mean by 'layout' in this case; it's true that extended precision floats do not have to conform to a certain bit format. If you refer to the memory layout of complex data types, I'm not sure if there are any guarantees here anyway, as I wouldn't be surprised if optimisation passes can and do change these kinds of layouts.

I didn't give much thought to the syntax of f64e, something like ExtendedPrecision<f64> might indeed be the better choice here, which also neatly extends to the other fxx's.
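To illustrate how an ExtendedPrecision<T> spelling might be expressed today, here is a purely hypothetical sketch using a trait with an associated type. The trait `HasExtended`, its method `widen`, and the choice of f64 as the extended companion of f32 are all assumptions made for the example; stable Rust has no wider built-in type to use as the extended form of f64, so only f32 is covered.

```rust
/// Hypothetical trait mapping a float type to its "extended" companion.
pub trait HasExtended {
    type Extended;
    /// Convert to the extended type without losing information.
    fn widen(self) -> Self::Extended;
}

impl HasExtended for f32 {
    // Assumption for illustration: f64 serves as the extended form of f32.
    type Extended = f64;
    fn widen(self) -> f64 {
        self as f64
    }
}

/// The spelling suggested in the thread, written as a type alias.
pub type ExtendedPrecision<T> = <T as HasExtended>::Extended;

/// Dot product that accumulates in the extended type and rounds once at the end.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    let acc: ExtendedPrecision<f32> = a
        .iter()
        .zip(b)
        .map(|(x, y)| x.widen() * y.widen())
        .sum();
    acc as f32
}

fn main() {
    println!("{}", dot(&[1.0, 2.0], &[3.0, 4.0]));
}
```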

Most do seem to agree on including all the common IEEE 754 types, which is, I think, the main goal of this issue. Something similar to Fortran's selected_real/integer_kind could also be looked at, but should probably be moved to another issue.

I'd have to check Rust's current support for other parts of IEEE 754 first. There are very few languages with good support for the hardware's capabilities in this area and those that do tend to be rather unsafe. Numerical analysis and other scientific computing do seem to be a great fit for Rust, so I think it's worth looking into this.

[Edit: typos and clarification]

Nov 07 '20 11:11 eprovst

I would expect f64e to be directly equivalent in bit representation, ABI, and layout to C/C++'s long double except in cases like MSVC on x86_64 where they pick long double == double even though f80 is still usable from a hardware level. There would be another type alias c_long_double for exact equivalence to long double on all platforms with an ABI-compatible C compiler and when the long double type is supported by Rust (so, probably excluding PowerPC's annoying double-double type for the MVP).

One interesting side-note: PowerPC v3.0 includes an instruction for converting float types to f80, though I think that's the only supported operation.

f128 would be directly equivalent to gcc/clang's __float128 type where supported.

Nov 08 '20 19:11 programmerjake

One interesting side-note: PowerPC v3.0 includes an instruction for converting float types to f80, though I think that's the only supported operation.

Turns out that the only supported f80 operation is xsrqpxp, which rounds a f128 to a f80 but leaves it in f128 format. That's useful for implementing f80 arithmetic operations: for all of add, sub, mul, div, and sqrt, if all inputs are known to be f80 values in f128 format, then you can produce the exact result f80 value in f128 format by:

run the add, sub, mul, div, or sqrt operation for f128 in round to odd mode
run the xsrqpxp instruction in the desired rounding mode for the f80 operation

This is similar to how f32 arithmetic can be implemented in JavaScript (which only has the f64 type for arithmetic) by rounding to f32 between every operation.
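The JavaScript analogy can be reproduced in Rust as a small sketch: do each operation in f64, then round the result to f32. The function names are made up for illustration. For f32 inside f64 a plain round-to-nearest of the wide result is enough, because f64 carries more than twice the precision of f32; the f128-to-f80 case described above does not have that much headroom, which is why it rounds to odd first.

```rust
/// Emulate f32 addition using only f64 arithmetic plus one final rounding.
fn add_f32_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) + (b as f64)) as f32
}

/// Emulate f32 multiplication the same way.
fn mul_f32_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) * (b as f64)) as f32
}

/// Emulate f32 division the same way.
fn div_f32_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) / (b as f64)) as f32
}

fn main() {
    let (a, b) = (0.1f32, 0.2f32);
    // Compare bit patterns so the check is exact, not approximate.
    assert_eq!(add_f32_via_f64(a, b).to_bits(), (a + b).to_bits());
    assert_eq!(mul_f32_via_f64(a, b).to_bits(), (a * b).to_bits());
    assert_eq!(div_f32_via_f64(a, b).to_bits(), (a / b).to_bits());
    println!("emulated f32 results match native f32 results");
}
```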

Nov 08 '20 19:11 programmerjake

[...] that's useful for implementing f80 arithmetic operations [...]

No need to, ExtendedPrecision<f64> would simply be f128 on targets that do not have a native extended double format.

In many languages computations with floating point numbers aren't guaranteed to be identical on different targets. On x86_64, for instance, doubles were/are often stored in 80-bit registers, it's only when they are written to memory that they are truncated to 64 bits. In strict mode the JVM thus has to write every floating point value back to memory between operations to guarantee identical results on different architectures.

[Edit: formulation was ambiguous.]

Nov 14 '20 14:11 eprovst

@elecprog x87 is no longer the normal case. They're stored as-is in SIMD registers; x87 has been out of use for over a decade. SIMD directly operates on 64-bit and 32-bit floats.

Nov 14 '20 14:11 moonheart08

@elecprog On x86_64 both f32 and f64 are defined by the ABI to be stored in SSE registers and not in the x87 stack. On 32-bit x86 they can be stored on the x87 stack.

Nov 16 '20 00:11 programmerjake

Computations are not guaranteed to be identical on different targets anyway.

This is a somewhat misleading statement, because not only does it depend on what you next say (and others have discussed its incorrectness), but in actuality the vast majority of targets and especially modern targets do give identical computations with most inputs, such that if you know what you are doing you can in fact even make exact comparisons across the vast majority of targets. Rust even allows you to easily do this because its semantics around floats are, in spite of some issues, currently fairly predictable compared to many other languages.
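As a small illustration of what an exact comparison looks like, the sketch below compares bit patterns rather than using an epsilon. It assumes a target where f64 addition is performed as IEEE 754 binary64 with round-to-nearest (the common case discussed above); the helper name is made up, and nothing here covers library functions such as sin or exp.

```rust
/// Add two f64 values and return the exact bit pattern of the result.
/// (Hypothetical helper name, used only for this illustration.)
fn sum_bits(a: f64, b: f64) -> u64 {
    (a + b).to_bits()
}

fn main() {
    // 0.1 + 0.2 is the classic example: the correctly rounded binary64 result
    // is one unit in the last place above the binary64 value of 0.3.
    let bits = sum_bits(0.1, 0.2);
    assert_eq!(bits, 0x3FD3_3333_3333_3334);
    assert_eq!(0.3f64.to_bits(), 0x3FD3_3333_3333_3333);
    println!("0.1 + 0.2 = {:#018x}", bits);
}
```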

Nov 19 '20 00:11 workingjubilee
