v icon indicating copy to clipboard operation
v copied to clipboard

Incorrect f64 -> u64 conversions

Open Varpie opened this issue 1 year ago • 5 comments

Describe the bug

For explicit conversions of an f64 to a u64, it seems like the u64 operator behaves incorrectly for values between 2^63 and 2^64-1 (the max value for a u64). Here are some examples of f64 values that should return valid u64, but fail:

Reproduction Steps

fn main() {
    assert u64(-1) == 18446744073709551615 // this one passes
    assert f64(u64(-1)) == u64(-1) // this one passes
    assert u64(f64(u64(-1))) == f64(u64(-1)) // this one fails. u64(f64(u64(-1))) gives 0
    
    assert u64(math.pow(2, 63)) == math.pow(2, 63) // this one works
    assert u64(math.pow(2, 63) + 1) != math.pow(2, 63) // this one fails, which can make sense due to low precision on the f64
    assert u64(math.pow(2, 64)) != math.pow(2, 63) // this one fails too, which doesn't make sense since 2^64 should overflow and give 0 instead of 2^63
}

Expected Behavior

The assertions above should all pass. Alternatively, when u64 is used with a type that is too large, it should give an error, so we can handle it properly.

Current Behavior

The assertions commented as failing give a FAIL, with *unknown value* for the u64 casts. For instance:

fn main.main: assert u64(math.pow(2, 63) + 1) != math.pow(2, 63)
   left value: u64(math.pow(2, 63) + 1) = *unknown value*
  right value: math.pow(2, 63) = 9.223372036854776e+18

But when evaluating just the side with u64, it gives a concrete value: u64(math.pow(2, 63) + 1) is 9223372036854775808 (and so is u64(math.pow(2, 63)))

Possible Solution

Potentially related to #14783

Maybe also related: assertions of large f64 with small differences fail (although this is likely just because the IEEE754 binary64 format is only precise up to 2^53 for integers):

assert math.pow(2, 64) != math.pow(2, 64) - 1000 // fails, gives both sides as 1.8446744073709552e+19

Additional Information/Context

No response

V version

V 0.4.4 065e5e2

Environment details (OS name and version, etc.)

V full version: V 0.4.4 a374d25.065e5e2 OS: linux, "Arch Linux" Processor: 8 cpus, 64bit, little endian, Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

vroot: OK, value: ~/[...]/v VMODULES: OK, value: ~/.vmodules VTMP: OK, value: /tmp/v_1000

Git version: git version 2.44.0 Git vroot status: weekly.2024.08-35-g065e5e22-dirty .git/config present: true

CC version: cc (GCC) 13.2.1 20230801 thirdparty/tcc status: thirdparty-linux-amd64 99683af0

[!NOTE] You can use the 👍 reaction to increase the issue's priority for developers.

Please note that only the 👍 reaction to the issue itself counts as a vote. Other reactions and those to comments will not be taken into account.

Varpie avatar Mar 03 '24 01:03 Varpie

The generated C code as follow:

	// assert1: assert u64(-1) == 18446744073709551615 // this one passes
	if (!(((u64)(-1)) == 18446744073709551615U)) {}
    
	// assert2: assert f64(u64(-1)) == u64(-1) // this one passes
	if (!(((f64)(((u64)(-1)))) == ((u64)(-1)))) {}
    
	// assert3: assert u64(f64(u64(-1))) == f64(u64(-1)) // this one fails. u64(f64(u64(-1))) gives 0
    if (!(((u64)(((f64)(((u64)(-1)))))) == ((f64)(((u64)(-1)))))) {}
	
    // assert4: assert u64(math.pow(2, 63)) == math.pow(2, 63) // this one works
	f64 _t4 = math__pow(2, 63);
	if (!(((u64)(math__pow(2, 63))) == _t4)) {}
    
	// assert5: assert u64(math.pow(2, 63) + 1) != math.pow(2, 63) // this one fails, which can make sense due to low precision on the f64
    f64 _t6 = math__pow(2, 63);
	if (!(((u64)((f64)(math__pow(2, 63) + 1))) != _t6)) {}
	
	// assert6: assert u64(math.pow(2, 64)) != math.pow(2, 63)
	f64 _t8 = math__pow(2, 63);
	if (!(((u64)(math__pow(2, 64))) != _t8)) {}

compile with different compilers:

v . 
v . -cc gcc
v . -cc clang

As for different C compilers, some assertions does not pass,

  1. tcc fail at assert3, assert5, assert6
  2. gcc fail at assert5,
  3. clang fail at assert3, assert5, assert6

According to this result, I think there are some undefined actions of C lang, so different compiler handle these different.

kbkpbot avatar Mar 03 '24 07:03 kbkpbot

assert f64(u64(-1)) == u64(-1) // this one passes

That is a problem. Implicit cast between int and float should not be allowed because u64 max overflows the 15 decimals that float 64-bit can represent accurately.

Edit: My point is this: This statement should not compile.

I am fine with:

assert u64(f64(u64(-1))) == u64(-1) // expected to fail

hholst80 avatar Mar 05 '24 10:03 hholst80

@hholst80 the cast overflows the 15 decimals that f64 can represent accurately, but the max value of f64 is much higher than the max value of u64, so it doesn't overflow the range of acceptable values. So I don't think it should fail by default, but maybe a compiler warning would be a better way to handle it, in addition to a safe cast method. The problem is that casting u32 to f64 is safe, and u32 to u64 is also safe, so it can be difficult to determine whether the cast is safe at compile time.

Varpie avatar Mar 05 '24 17:03 Varpie

I did a poll a few years back "what is the worst feature of C++" and the implicit cast came up as the absolute worst.

The only case I can come to terms with is a numeric literal. As long as it can be promoted without loss of precision as-is, I can accept it as useful shorthand notation. Eg. assert f64(1) == 1.0 should be ok. Rationale: 1 can be cast to a 1 as f64 and 1.0 literal can be interpreted as a f64 without loss of precision. The only difference between cast and the "interpretation" should be the flexibility of scope and target container for the source number.

Relying on "whatever C does" is closer to the Absolute worst than it's close to Optimal. C does not do much if anything worth copying for numerically safe computations. Given that C is the intermittent state there has to be compile time guards against insecure code that can break the user code in unexpected ways.

hholst80 avatar Mar 05 '24 22:03 hholst80

The second part of this discussion would be on the expected behavior of an explicit cast from f64 to u64. What should the value be if say the value of the f64 cannot be represented as an u64? C probably pulls it's usual "oh it's undefined and vendor specific". That is not a good platform for safe computations. Safe should be the default with an optional escape hatch like unsafe { ... } or similar.

Note that no floats can generally be represented as integers no matter the size due to signed zero, nan's, and inf's. A stdlib conversion function that returns a Result or unsafe { .. } feels like a step up from C.

Side note: Fortran gave some more careful though and offers four ways to cast a float to an int. They talk about just the regular normal floats.This is hairy stuff when left to intuition- users are going to be surprised in a bad way. https://stackoverflow.com/questions/40310299/convert-a-real-output-into-integer#40310677

hholst80 avatar Mar 05 '24 23:03 hholst80