Suggestions for additional floating-point types
I noticed that, like other languages, the only floating-point types built-in are f32 and f64. However, I often have limitations with just these. I propose the following: ~~fsize, freal~~, and f128.
~~fsize would be like isize but for floats. Basically, use the version that's most efficient for your processor. On modern 64-bit processors with wide FPUs and/or 256-bit SIMD this would become f64.~~
~~Sometimes I want to be able to have a variable for real numbers, or I don't know what precision I want yet. In C++ I can do the following to have an abstract precision that I control via compiler flags:~~
~~#ifdef REAL_T_IS_DOUBLE~~
~~typedef double real_t;~~
~~#else~~
~~typedef float real_t;~~
~~#endif~~
~~I propose something similar in Rust, where you can just write freal or something and be able to change the precision later with compiler flags. The default would probably be f32.~~
Finally, it would be nice to have 128-bit floats (f128) in the language. These are not normally needed, but there are use cases for them, and it'd be nice to have it as a language built-in type. Some newer processors have 512-bit SIMD chipsets that can process these efficiently, though most don't.
If you only implement some of these proposals, that's fine too. Originally posted at https://github.com/rust-lang/rust/issues/57928
> fsize would be like isize but for floats. Basically, use the version that's most efficient for your processor.
isize is not the integer type that's most efficient for your processor; it's the integer type that's the same size as a pointer. It's like ptrdiff_t, not int.
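The distinction is easy to check. The following is a minimal sketch using only std::mem::size_of; it shows that isize tracks the pointer width of the target, while the float types have the same width everywhere.

```rust
use std::mem::size_of;

fn main() {
    // isize and usize are defined to be pointer-sized on every target.
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());

    // The floating-point types have fixed widths regardless of the target.
    assert_eq!(size_of::<f32>(), 4);
    assert_eq!(size_of::<f64>(), 8);

    println!("isize is {} bits on this target", size_of::<isize>() * 8);
}
```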
> I propose something similar in Rust, where you can just write freal or something and be able to change the precision later with compiler flags. The default would probably be f32.
#[cfg(feature = "real_t_is_double")]
type real_t = f64;
#[cfg(not(feature = "real_t_is_double"))]
type real_t = f32;
A better suggestion would be f16 support, as it is common in graphics.
@moonheart08
Is f16 used much in intermediate calculations? I know it is commonly used as a storage format, but the last time I checked this (I wrote a Pre-RFC on this on internals a while back, but I'm a bit fuzzy on the details), most calculations involving f16 on most platforms were done by casting to f32, performing the op, then casting back to f16. If that is the case, then having native f16 support may not be that important.
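To make the "storage format" pattern concrete, here is a rough sketch of the widening half of it: decoding an IEEE binary16 bit pattern into an f32 and then doing the arithmetic in f32. The function name f16_bits_to_f32 is made up for this illustration, and the narrowing step back to 16 bits (which needs rounding) is left out.

```rust
/// Decodes an IEEE 754 binary16 bit pattern into an f32.
/// Hypothetical helper for illustration; the conversion is exact because
/// every binary16 value is representable in binary32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;

    let bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: value is frac * 2^-24, which is a normal f32.
        (0, _) => {
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign == 1 { -v } else { v };
        }
        // Infinity.
        (0x1f, 0) => (sign << 31) | 0x7f80_0000,
        // NaN: keep the payload bits and force a quiet NaN.
        (0x1f, _) => (sign << 31) | 0x7fc0_0000 | (frac << 13),
        // Normal: rebias the exponent (15 -> 127) and widen the fraction.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

fn main() {
    // 0x3C00 encodes 1.0 and 0x4000 encodes 2.0 in binary16.
    let a = f16_bits_to_f32(0x3C00);
    let b = f16_bits_to_f32(0x4000);
    // The arithmetic itself happens in f32.
    println!("{}", a + b);
}
```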
Adding the ability to use the F16C instructions may be useful to have in core::arch though, perhaps something like __m128h which has 8 "f16"s.
How about long double and 128-bit floats? ~I could be wrong, but I'm 99% sure that we currently unavoidably lose precision when using long doubles from C. On my computer (macOS), bindgen outputs f64, but sizeof(long double) in C outputs 16 bytes. (128 bits; for alignment I guess?).~
~(On a side note, is that even safe behavior? What about C functions that take long double *?)~
@Coder-256 In C++, long double is 64-bit on Windows, 80-bit in MinGW, and 128-bit on Mac and Linux (probably indeed for alignment, as I don't think anyone implements it as quadruple precision).
@aaronfranke Could you please clarify what you mean? What I was trying to say is that Rust currently does not have any support for floats larger than 64 bits (8 bytes), for example, long double on certain platforms. I was also trying to point out that in addition to having limited precision within Rust code, this makes it difficult to interact with native code that uses large floats, such as using FFI with C code that uses floats larger than 64 bits.
There was also a separate issue with bindgen that caused float sizes to be incorrect for large floats, but that has been fixed (in rust-lang/rust-bindgen@ed6e1bbec439e8b260e6e701379fc70d295f35fe).
I wasn't disagreeing with you, I was just adding information. Sorry if I wasn't clear. f128 would be great.
@aaronfranke I absolutely agree, both f128 and f80 would be very useful, especially for FFI (for example, Swift already has Float80 mainly for communicating with old C code, just an example to show how it could help).
Old things never go away, so I want to push this. Rust is a systems language, not a scripting language, so it needs to stay compatible with old things.
I want to push for adding support for fp80 and fp128. Is any help needed?
Like https://github.com/rust-lang/rust/pull/38482 does.
> Basically, use the version that's most efficient for your processor. On modern 64-bit processors with wide FPUs and/or 256-bit SIMD this would become f64.
Even on modern x86, which has similar or equal speed between most f32 and f64 ops, f32 is still very much the fastest for your processor because it cuts cache misses in half.
> Sometimes I want to be able to have a variable for real numbers, or I don't know what precision I want yet. In C++ I can do the following to have an abstract precision that I control via compiler flags:
#[cfg(real_is_f64)]
type real = f64;
#[cfg(not(real_is_f64))]
type real = f32;
Then you can control it via RUSTFLAGS="--cfg real_is_f64" (you can also use cargo features, but they're not a great fit for cases like this, where enabling a feature can cause compile errors).
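As a slightly fuller sketch of that approach: the alias below is spelled Real (rather than real) only to follow Rust's type naming convention, and lerp is a made-up function standing in for user code. Built normally, Real is f32; built with RUSTFLAGS="--cfg real_is_f64", it becomes f64. Recent toolchains may warn about a cfg name they do not know unless it is declared to the build system.

```rust
// `real_is_f64` is the custom cfg flag from the comment above;
// `Real` and `lerp` are hypothetical names used for illustration.
#[cfg(real_is_f64)]
pub type Real = f64;
#[cfg(not(real_is_f64))]
pub type Real = f32;

/// Code written against `Real` does not change when the flag flips.
pub fn lerp(a: Real, b: Real, t: Real) -> Real {
    a + (b - a) * t
}

fn main() {
    let mid = lerp(0.0, 10.0, 0.5);
    println!("{} ({} bytes per value)", mid, std::mem::size_of::<Real>());
}
```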
... Regarding suggestions of f80
What would f80 do on platforms that aren't x86? Nothing else has native 80-bit floats. It's not even part of IEEE754 (even though it's largely a natural extension of it... although it has a lot of quirks). This is something that would be viable in core::arch::{x86,x86_64} but isn't portable. We don't want to have to implement these as software floats on other platforms.
I'd be in favor of a std::os::raw::c_long_double type but it would have to be carefully designed. Note that PPC's long double is exceptionally cursed, as it's a pair of doubles that are summed together...
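For readers unfamiliar with the "pair of doubles" representation, here is a minimal sketch of the general double-double idea, built on the classic error-free TwoSum step. It is not the PowerPC ABI; the DoubleDouble type and its method are hypothetical names for illustration.

```rust
/// Error-free addition (TwoSum): returns (s, e) where s is the rounded
/// f64 sum and e is the rounding error, so s + e equals a + b exactly
/// (barring overflow).
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let b_virtual = s - a;
    let a_virtual = s - b_virtual;
    let e = (a - a_virtual) + (b - b_virtual);
    (s, e)
}

/// Hypothetical double-double value: the represented number is hi + lo.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DoubleDouble {
    hi: f64,
    lo: f64,
}

impl DoubleDouble {
    fn from_sum(a: f64, b: f64) -> Self {
        let (hi, lo) = two_sum(a, b);
        DoubleDouble { hi, lo }
    }
}

fn main() {
    // 1 + 2^-60 does not fit in one f64, but the pair keeps both parts.
    let tiny = f64::EPSILON / 256.0; // 2^-60
    let x = DoubleDouble::from_sum(1.0, tiny);
    println!("hi = {:e}, lo = {:e}", x.hi, x.lo);
}
```

Because the exponents of the two halves are independent, such a type has no fixed precision or exponent range in the IEEE sense, which is part of why it is awkward to map onto a single primitive float type.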
I'd be in favor of f16, and tentatively f128 since binary128 is part of IEEE754 2019, at least.
EDIT: I hadn't noticed that sfalker said the exact same thing as my first point >_>
The fact is that f80 is broadly used, and that will continue for the foreseeable future. We don't need a soft f80 implementation; just making f80 work on x86 platforms is enough. That said, a soft f80 may be a better option for cross-platform support.
Several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.
For platforms that have f128, implementing f80 would not cause a significant performance drop.
@thomcc These are all ideas; not everything in the OP is relevant anymore since it has been discussed. I think fsize and freal have been discussed and dismissed: fsize is a bad idea considering the information in this thread, and freal is indeed easy to implement with a few lines of code, so it doesn't need to be in the language.
That said, f128 is still desired for sure and has some use cases and some hardware support, f80 would be neat though I wouldn't use it personally, f16 would be useful especially in the context of low-end graphics though I also wouldn't use this myself, and if your goal is to cover IEEE 754 there is also f256 or octuple precision, though it's rare to see.
Maybe we can add f16, f80, and f128 in a single shot?
f16 has uses in neural networks as well.
There are actually many problems with using f80, especially if we do not ship a soft float to cover it... it would not be a type defined by an abstraction, frankly, it would be a type defined by Intel's hardware quirks, and we would only be adding more on top of it. One of the nice things about Rust is that it is highly portable right now, so I do not think it makes sense to add such a non-portable type to the language and limit portability that much, though a language extension that makes it simpler to define and use such a non-portable type would make sense.
> several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.
I can't say for sure about the other arches, but PowerPC's is not IEEE-754-like at all; it's double-double. It would not help for implementing a sane f128, nor would it help implement an f80.
> For platforms that have f128, implementing f80 would not cause a significant performance drop.
I don't think this is really true (we can quibble over "significant", I guess), but regardless, Rust doesn't exclusively target architectures in the sets {have native f80}, {have native f128}, so something that solves this for other architectures needs to be considered.
> if your goal is to cover IEEE 754 there is also f256 or octuple precision, though it's rare to see.
I mean, it's not mentioned in IEEE-754 2019. It's not hard to imagine what it looks like, admittedly.
Anyway, I think once inline asm is stable someone who really wants f80 could implement it as a library on x86/x86_64. This wouldn't solve the issue of FFI (e.g. a c_long_double type), which I still think would be nice to solve, but I think it has a lot of different design considerations. It could just be a mostly-opaque type that includes little more than implementations of From<f64>/Into<f64> (e.g. no arithmetic).
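A rough sketch of what such a mostly-opaque type might look like, assuming the x86-64 System V layout for long double (an 80-bit x87 value stored in 16 bytes with 16-byte alignment). The name X87LongDouble and its methods are hypothetical and not part of std. Only the exact widening direction is shown; converting back to f64 requires rounding and is omitted here.

```rust
/// Hypothetical opaque stand-in for x86-64 System V `long double`.
/// The name and layout are assumptions for illustration only.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct X87LongDouble {
    bytes: [u8; 16],
}

impl X87LongDouble {
    /// Raw little-endian bytes (the top 6 bytes are padding).
    pub fn to_le_bytes(self) -> [u8; 16] {
        self.bytes
    }
}

impl From<f64> for X87LongDouble {
    /// Widening is exact: every f64 value is representable in the x87 format.
    fn from(x: f64) -> Self {
        let bits = x.to_bits();
        let sign = (bits >> 63) as u16;
        let exp = ((bits >> 52) & 0x7ff) as u16;
        let frac = bits & ((1u64 << 52) - 1);

        // x87 format: 1 sign bit, 15 exponent bits (bias 16383), and a
        // 64-bit significand with an explicit leading integer bit.
        let (exp80, sig80): (u16, u64) = if exp == 0x7ff {
            // Infinity or NaN: all-ones exponent, integer bit set.
            (0x7fff, (1u64 << 63) | (frac << 11))
        } else if exp != 0 {
            // Normal: rebias the exponent and make the hidden bit explicit.
            (exp + (16383 - 1023), (1u64 << 63) | (frac << 11))
        } else if frac != 0 {
            // f64 subnormal: becomes a normal number in the wider format.
            let shift = frac.leading_zeros();
            ((15372 - shift) as u16, frac << shift)
        } else {
            (0, 0) // signed zero
        };

        let mut bytes = [0u8; 16];
        bytes[..8].copy_from_slice(&sig80.to_le_bytes());
        bytes[8..10].copy_from_slice(&((sign << 15) | exp80).to_le_bytes());
        X87LongDouble { bytes }
    }
}

fn main() {
    let one = X87LongDouble::from(1.0);
    println!("{:02x?}", one.to_le_bytes());
}
```

Keeping the type opaque like this sidesteps the question of arithmetic semantics entirely; it only has to agree with the C ABI on size, alignment, and bit pattern.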
@thomcc
> > several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.
> I can't say for sure about the other arches, but PowerPC's is not IEEE-754-like at all; it's double-double. It would not help for implementing a sane f128 nor would it help implement a f80.
You're thinking of C's long double type; PowerPC does support IEEE-754 standard binary128 FP using new instructions added in Power ISA v3.0.
Quoting GCC 6's change log:
> PowerPC64 now supports IEEE 128-bit floating-point using the __float128 data type. In GCC 6, this is not enabled by default, but you can enable it with -mfloat128. The IEEE 128-bit floating-point support requires the use of the VSX instruction set. IEEE 128-bit floating-point values are passed and returned as a single vector value. The software emulator for IEEE 128-bit floating-point support is only built on PowerPC GNU/Linux systems where the default CPU is at least power7. On future ISA 3.0 systems (POWER 9 and later), you will be able to use the -mfloat128-hardware option to use the ISA 3.0 instructions that support IEEE 128-bit floating-point. An additional type (__ibm128) has been added to refer to the IBM extended double type that normally implements long double. This will allow for a future transition to implementing long double with IEEE 128-bit floating-point.
Thanks, you're correct that I was thinking of the PPC long double (__ibm128) type. Unfortunately, I think the existence of 2 separate 128-bit "floating point" types on powerpc only complicates things, although it's nice that at least one of them is moderately sane.
Full(er) support for IEEE 754 would indeed be very welcome, especially for numerical work.
> What would f80 do on platforms that aren't x86? Nothing else has native 80-bit floats. It's not even part of IEEE 754 (even though it's largely a natural extension of it... although it has a lot of quirks).
This is somewhat false: x86's 80-bit floats are extended precision binary64's as specified by IEEE 754.
However, it's true that these are not very strictly defined. An extended precision binary64 has to have a larger precision than binary64 and the exponent range of binary128. This means that both x86's 80-bit floats and binary128 are examples of valid extended precision binary64's.
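To spell that out with numbers, here is a small sketch. It assumes the minimums from the standard's extended-format table as I understand them (at least 64 bits of significand precision and a maximum exponent of at least 16383 for the extended format associated with binary64). The struct and function names are made up for illustration.

```rust
/// Parameters of a binary floating-point format: significand precision in
/// bits (counting the leading bit) and the maximum unbiased exponent.
#[derive(Clone, Copy, Debug)]
struct BinaryFormat {
    name: &'static str,
    precision: u32,
    emax: i32,
}

const BINARY64: BinaryFormat = BinaryFormat { name: "binary64", precision: 53, emax: 1023 };
const X87_EXTENDED: BinaryFormat = BinaryFormat { name: "x87 80-bit", precision: 64, emax: 16383 };
const BINARY128: BinaryFormat = BinaryFormat { name: "binary128", precision: 113, emax: 16383 };

/// Checks the assumed minimums for an extended format associated with
/// binary64: precision >= 64 and emax >= 16383.
fn is_extended_binary64(f: BinaryFormat) -> bool {
    f.precision >= 64 && f.emax >= 16383
}

fn main() {
    for f in [BINARY64, X87_EXTENDED, BINARY128] {
        println!("{}: extended binary64? {}", f.name, is_extended_binary64(f));
    }
}
```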
I'd suggest providing the following types: f16 (binary16), f32 (binary32), f64 (binary64), f64e (binary64 extended) and f128 (binary128).
On x86 platforms, and others that have a native extended precision binary64, an f64e would be an 80-bit float or similar; on all others it would be the same as an f128.
[Edit: further clarified in the relation between 80-bits floats and IEEE 754.]
So, on the other side of "portable" is "layout". We have a lot of ambiguous-layout types which are not primitive types. However, as far as I am aware all the primitive types have a pretty explicit layout, and many of the std composite data types like Vec<T> etc. have most of their layout dialed in as well. Here we'd have two possible layouts on a numeric type which should be as simple as possible, and f64e is probably the wrong abstraction here because there are a lot of cases where someone wants "type N that fulfills X or else type M that fulfills a superset of X", especially for math libs.
I'm not too sure what you mean by 'layout' in this case; it's true that extended precision floats do not have to conform to a certain bit format. If you refer to the memory layout of complex data types, I'm not sure if there are any guarantees here anyway, as I wouldn't be surprised if optimisation passes can and do change these kinds of layouts.
I didn't give much thought to the syntax of f64e; something like ExtendedPrecision<f64> might indeed be the better choice here, which also neatly extends to the other fxx's.
Most do seem to agree on including all the common IEEE 754 types, which is, I think, the main goal of this issue. Something similar to Fortran's selected_real/integer_kind could also be looked at, but should probably be moved to another issue.
I'd have to check Rust's current support for other parts of IEEE 754 first. There are very few languages with good support for the hardware's capabilities in this area and those that do tend to be rather unsafe. Numerical analysis and other scientific computing do seem to be a great fit for Rust, so I think it's worth looking into this.
[Edit: typos and clarification]
I would expect f64e to be directly equivalent in bit representation, ABI, and layout to C/C++'s long double except in cases like MSVC on x86_64 where they pick long double == double even though f80 is still usable from a hardware level. There would be another type alias c_long_double for exact equivalence to long double on all platforms with an ABI-compatible C compiler and when the long double type is supported by Rust (so, probably excluding PowerPC's annoying double-double type for the MVP).
One interesting side-note: PowerPC v3.0 includes an instruction for converting float types to f80, though I think that's the only supported operation.
f128 would be directly equivalent to gcc/clang's __float128 type where supported.
> One interesting side-note: PowerPC v3.0 includes an instruction for converting float types to f80, though I think that's the only supported operation.
Turns out that the only supported f80 operation is xsrqpxp, which rounds an f128 to an f80 but leaves it in f128 format. That's useful for implementing f80 arithmetic operations since, for all of add, sub, mul, div, and sqrt, if all inputs are known to be f80 values in f128 format, then you can produce the exact result f80 value in f128 format by:
- running the add, sub, mul, div, or sqrt operation for f128 in round to odd mode
- running the xsrqpxp instruction in the desired rounding mode for the f80 operation
This is similar to how f32 arithmetic can be implemented in JavaScript (which only has the f64 type for arithmetic) by rounding to f32 between every operation.
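The f32-via-f64 analogy can be sketched in Rust directly. The helper names below are made up for illustration. The reason the second rounding is harmless is that f64 carries 53 bits of precision, which is at least 2 × 24 + 2, so rounding the f64 result to f32 gives the same answer as a native f32 operation for add, sub, mul, div, and sqrt.

```rust
/// Each helper widens to f64, performs one operation there, and rounds
/// once back to f32 (the same idea as calling Math.fround after each op).
fn add_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) + (b as f64)) as f32
}

fn mul_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) * (b as f64)) as f32
}

fn div_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) / (b as f64)) as f32
}

fn main() {
    let samples = [0.1f32, 0.2, 1.0e-3, 3.0, 16_777_216.0, 7.0e5];
    for &a in &samples {
        for &b in &samples {
            // Bit-for-bit agreement with the native f32 operations.
            assert_eq!(add_via_f64(a, b).to_bits(), (a + b).to_bits());
            assert_eq!(mul_via_f64(a, b).to_bits(), (a * b).to_bits());
            assert_eq!(div_via_f64(a, b).to_bits(), (a / b).to_bits());
        }
    }
    println!("f32 results reproduced through f64 for all sample pairs");
}
```

The f80-in-f128 trick described above follows the same pattern, except that it uses round-to-odd for the wide operation so that the final rounding stays correct in every rounding mode.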
> [...] that's useful for implementing f80 arithmetic operations [...]
No need to, ExtendedPrecision<f64> would simply be f128 on targets that do not have a native extended double format.
In many languages computations with floating point numbers aren't guaranteed to be identical on different targets. On x86_64, for instance, doubles were/are often stored in 80-bit registers, it's only when they are written to memory that they are truncated to 64 bits. In strict mode the JVM thus has to write every floating point value back to memory between operations to guarantee identical results on different architectures.
[Edit: formulation was ambiguous.]
@elecprog x87 is no longer the normal case. They're stored as-is in SIMD registers; x87 has been out of use for over a decade. SIMD directly operates on 64-bit and 32-bit floats.
@elecprog On x86_64 both f32 and f64 are defined by the ABI to be stored in SSE registers and not in the x87 stack. On x86 32-bit they can be stored on the x87 stack.
> Computations are not guaranteed to be identical on different targets anyway.
This is a somewhat misleading statement, because not only does it depend on what you next say (and others have discussed its incorrectness), but in actuality the vast majority of targets and especially modern targets do give identical computations with most inputs, such that if you know what you are doing you can in fact even make exact comparisons across the vast majority of targets. Rust even allows you to easily do this because its semantics around floats are, in spite of some issues, currently fairly predictable compared to many other languages.