go icon indicating copy to clipboard operation
go copied to clipboard

math: add Compare and Compare32

Open zhangyunhao116 opened this issue 2 years ago • 30 comments

The slices.Sort doesn't support sorting slices of floating-point numbers containing Not-a-number (NaN) values, It's not very intuitive.

Once the slices package is merged into stdlib, the sort.{TYPE}s(e.g. sort.Ints, sort.Float64s) APIs may be semi-deprecated. But the slices package doesn't provide a simple way to sort slices of floats containing NaNs, it recommends using slices.SortFunc(x, func(a, b float64) bool {return a < b || (math.IsNaN(a) && !math.IsNaN(b))}), this function is very complex compared to using sort.Float64s.

Related CL: https://go-review.googlesource.com/c/exp/+/446155

zhangyunhao116 avatar Oct 31 '22 04:10 zhangyunhao116

Change https://go.dev/cl/446155 mentions this issue: slices: Sort support sorting slices of floating-point numbers containing NaN values

gopherbot avatar Oct 31 '22 06:10 gopherbot

This is related to #33440. @smasher164 mentions the existence of a total-ordering predicate that could be used to sort floats with NaNs.

dsnet avatar Oct 31 '22 07:10 dsnet

CC @eliben @randall77

ianlancetaylor avatar Oct 31 '22 23:10 ianlancetaylor

Once the slices package is merged into stdlib, the sort.{TYPE}s(e.g. sort.Ints, sort.Float64s) APIs may be semi-deprecated.

I don't think they will, using sort.Float64s will be fine forever.

I think the only advantage of slices.Sort is that it will work on slices of named float64s (e.g. type F float64; a := []F{5,4,3}; slices.Sort(a), that won't work using sort.Float64s). And in that rare case one can use the described workaround.

randall77 avatar Oct 31 '22 23:10 randall77

Since Go1.18 is released, many users use the slices.Sort to replace sort.{TYPE}s, because the former is faster and more clear in almost all cases. From the point of view of the users, the former looks more like a replacement of the latter, they have almost the same function in fact except the slices.Sort doesn't support sorting slices containing NaNs.

The main concern is that users don't notice this special case, and the replacement may break their codes.

zhangyunhao116 avatar Nov 01 '22 06:11 zhangyunhao116

I think it's worth adding documentation to slices.Sort that sorting a float won't work. Maybe also there could be a comparator function in sort.

earthboundkid avatar Nov 02 '22 14:11 earthboundkid

I think it's worth adding documentation to slices.Sort that sorting a float won't work. Maybe also there could be a comparator function in sort.

The documentation for Sort is currently:

Sort sorts a slice of any ordered type in ascending order. Sort may fail to sort correctly when sorting slices of floating-point numbers containing Not-a-number (NaN) values. Use slices.SortFunc(x, func(a, b float64) bool {return a < b || (math.IsNaN(a) && !math.IsNaN(b))}) instead if the input may contain NaNs.

Is this not sufficient documentation, @carlmjohnson ?

eliben avatar Nov 02 '22 16:11 eliben

@zhangyunhao116 I'm not a big fan of this proposal at this time. The problem with sorting floats is clearly documented and a workaround is provided. It's easy to sort using SortFunc

eliben avatar Nov 02 '22 16:11 eliben

That seems good to me.

In #50340 I think there was talk about adding sort.Compare which would take a comparable and return -1, 0, 1. If something like that is ever added there should also be CompareFloat, but it's not necessary on its own.

earthboundkid avatar Nov 02 '22 16:11 earthboundkid

The problem with sorting floats is clearly documented and a workaround is provided. It's easy to sort using SortFunc

Agree that the special case is clearly documented and has a workaround, but it's nice to support all the special cases as sort.{TYPE}s does :)

I did a simple poll about slices.Sort at my work. The question is

The slices.Sort sorts a slice of any ordered type in ascending order. Intuitively, you think the function is

A. A replacement of sort.{TYPE}s APIs, has the same output, and support all special cases. B. slices.Sort and sort.{TYPE}s APIs exist side by side, and they may have different outputs in special cases.

51 people joined it, 82.4% (42) selected A, and 17.6% (9) selected B. Looks like many users of this API doesn't notice the special case.

zhangyunhao116 avatar Nov 04 '22 08:11 zhangyunhao116

The proposal still need more discussions, it's a rare case, and It's merely not as users intuitively expected. The slices.Sort work very well for all the general cases.

zhangyunhao116 avatar Nov 04 '22 09:11 zhangyunhao116

Special casing the generic type being float64 would solve sorting []float64, but it wouldn't solve sorting []struct {x float64}.

I don't see how we can special case this reasonably. Callers should not sort slices with NaNs in them.

rsc avatar Nov 09 '22 18:11 rsc

What if instead, we provided a function in the math package like:

package math

// FloatCompare compares x to y according to the
// total-ordering predicate in IEEE-754, section 5.10.
// The result will be 0 if a == b, -1 if a < b, and +1 if a > b.
func FloatCompare[F constraints.Float](a, b F) int

// FloatLess reports whether a is less than b according to the
// total-ordering predicate in IEEE-754, section 5.10.
func FloatLess[F constraints.Float](a, b F) bool {
    return FloatCompare(a, b) < 0
}

Thus, usage with the slices package can be something like:

var fs []MyFloat32
slices.SortFunc(fs, math.FloatLess[MyFloat32])

This would solve both the NaN issue from this issue and the -0 vs +0 issue from #33440.

More complicated composite cases like https://github.com/golang/go/issues/56491#issuecomment-1309218085 can be handled by passing a user-defined less function to slices.SortFunc that's implemented in terms of math.FloatLess.

dsnet avatar Nov 09 '22 19:11 dsnet

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc avatar Nov 09 '22 19:11 rsc

What if instead, we provided a function in the math package like:

Should FloatCompare go in math or sort? I can see the case for either.

ISTM, if we have that we should have Compare[Int ~int|~uint|etc](a, b N) int in the same package too.

earthboundkid avatar Nov 09 '22 20:11 earthboundkid

My argument for it being in "math" is:

  1. The other scalar kinds (int, uint) don't really have ambiguous definitions of ordering (i.e., the < operator does exactly what you want). It's only really float that is special.

  2. The "math" package seems to already have special handling for IEEE-754. I proposed the names with a Float prefix to match the existing FloatXXbits and FloatXXfrombits functions that are IEEE-754 specific. There are other IEEE-754 specific functions like IsNaN and NaN.

  3. I don't think a predicate for ordering is sufficient justification for it being "sort". For example, should we have moved

    • bytes.Compare to be sort.BytesCompare, or
    • netip.Addr.Compare to be sort.AddrCompare, or
    • time.Time.Compare to be sort.TimeCompare?

    Functionality for order should live with other functionality for the type (i.e., "math" for floats, "bytes" for []byte, "strings" for strings, "netip" for netip.Addr, "time" for time.Time, etc.).

dsnet avatar Nov 09 '22 20:11 dsnet

That makes sense to me. My only other thought is that math.Compare(a, b float64) int would also make sense since all the functions in math take float64.

earthboundkid avatar Nov 09 '22 21:11 earthboundkid

What if instead, we provided a function in the math package like:

This total-ordering predicate would be pretty simple to define as well, since it's just:

func sign(a, b int64) int {
	if a < b {
		return -1
	}
	if a > b {
		return 1
	}
	return 0
}

func FloatCompare[F constraints.Float](a, b F) int {
	x := int64(Float64bits(float64(a)))
	y := int64(Float64bits(float64(b)))
	x ^= int64(uint64(x>>63) >> 1)
	y ^= int64(uint64(y>>63) >> 1)
	return sign(x, y)
}

The only thing I'd point out is this ordering of NaNs is different from the way sort.Float64s does it. The total ordering predicate sorts -NaN < everything else < +NaN, while sort.Float64s does (+-)NaN < everything.

smasher164 avatar Nov 10 '22 03:11 smasher164

The special case in sort seems like a clear decline since it can't be done generally. Retitling to be about this math.FloatCompare that might be helpful for people writing their own.

rsc avatar Nov 16 '22 18:11 rsc

I feel slightly better about this because it is defined by IEEE-754 and we are not making it up. That said, it still seems very niche. The definition is in https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf on page 40.

I don't think we should make this function the first generic one in math. It should probably just be func Compare(x, y float64) int.

rsc avatar Nov 16 '22 18:11 rsc

My reason for proposing a generic version was because you can't roundtrip convert a float32->float64->float32 and preserve the exact bits for NaNs. A single bit is mangled in the float32->float64 conversion.

See this example: https://go.dev/play/p/sPn-yfpd-nK, where one of the bits gets mangled on my Ryzen 5900x. I don't know if this a Go-specific issue or a x86 issue.

This means that:

var x, y float32 = ...
Compare(float64(x), float64(y))

wouldn't be quite correct for NaNs. Arguably this case is esoteric, but Compare is already esoteric, so the people who use it probably care about correctness in the esoteric cases.

Options:

  • Accept this oddity and just provide the float64-only API.
  • Fix the float32->float64 conversion issue in the compiler (if possible?)
  • Provide both Compare32 and Compare64 or something.

I don't have a strong preference.

dsnet avatar Nov 16 '22 19:11 dsnet

My feeling is to just provide the float64 API under the name math.Compare, and wait until math/v2 to provide a generic version along with generic everything else.

earthboundkid avatar Nov 16 '22 20:11 earthboundkid

See this example: https://go.dev/play/p/sPn-yfpd-nK, where one of the bits gets mangled on my Ryzen 5900x. I don't know if this a Go-specific issue or a x86 issue.

This is squashing a signaling NaN to a quiet NaN. I believe all platforms do this on a float32->float64->float32 conversion. (Joe and I have been through this ugliness before, e.g., #27193 and #36400.)

I'd be ok with a Compare and Compare32. (Following the lead of Nextafter and Nextafter32.)

randall77 avatar Nov 16 '22 21:11 randall77

Writing this function generically without converting to float64 first would require the ability to either call Float32bits or Float64bits depending on the type of the argument, ~~which doesn't seem possible without converting to interface{} and type-asserting. I think either Compare32/Compare64 or Compare(x, y float64) is fine.~~

Edit: I forgot that you can use unsafe and get the float value's size.

func sign[I constraints.Signed](a, b I) int {
	if a < b {
		return -1
	}
	if a > b {
		return 1
	}
	return 0
}

func FloatCompare[F constraints.Float](a, b F) int {
	if unsafe.Sizeof(a) == 4 {
		x := int32(math.Float32bits(float32(a)))
		y := int32(math.Float32bits(float32(b)))
		x ^= int32(uint32(x>>31) >> 1)
		y ^= int32(uint32(y>>31) >> 1)
		return sign(x, y)
	} else {
		x := int64(math.Float64bits(float64(a)))
		y := int64(math.Float64bits(float64(b)))
		x ^= int64(uint64(x>>63) >> 1)
		y ^= int64(uint64(y>>63) >> 1)
		return sign(x, y)
	}
}

smasher164 avatar Nov 17 '22 05:11 smasher164

Compare and Compare32 sounds fine. It looks like maybe float32->float64 does preserve signaling and float64->float32 is what squashes it, in which case plain Compare would technically be OK. But maybe other chips are different.

rsc avatar Nov 30 '22 18:11 rsc

It sounds like the API is:

func Compare(x, y float64) int func Compare32(x, y float32) int

and both functions return -1 when x compares before y, 0 when they compare the same, and +1 when x compares after y. The comparison is from PDF page 40 of https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf which puts negative NaNs before negative numbers and positive NaNs after positive numbers.

It looks like that comparison is exactly a bitwise comparison like int64(math.Float64bits(x)) compared with int64(math.Float64bits(y))

Can someone confirm that?

rsc avatar Dec 07 '22 18:12 rsc

@rsc

int64(math.Float64bits(x)) compared with int64(math.Float64bits(y))

It's not exactly a bitwise comparison. When x and y are both negative,

cmp(int64(math.Float64bits(x)), int64(math.Float64bits(y)))

gives you an incorrect answer. For example, int64(math.Float64bits(-3)) < int64(math.Float64bits(-2)) is false.

What's needed is a check for the sign bits of x and y. If they're both negative, flip the comparison. So something like

x := int64(math.Float64bits(float64(a)))
y := int64(math.Float64bits(float64(b)))
if math.Signbit(a) && math.Signbit(b) {
	return sign(cmp(y, x))
}
return sign(cmp(x, y))

This can be done branchlessly as well of course, by toggling all the bits based on the sign bit.

smasher164 avatar Dec 09 '22 10:12 smasher164

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc avatar Dec 14 '22 19:12 rsc

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

rsc avatar Dec 21 '22 19:12 rsc

Change https://go.dev/cl/459435 mentions this issue: math: add Compare and Compare32

gopherbot avatar Dec 24 '22 14:12 gopherbot