go icon indicating copy to clipboard operation
go copied to clipboard

proposal: x/exp/slices: Allow different types for haystack/needle in BinarySearchFunc

Open Merovius opened this issue 2 years ago • 4 comments

I propose relaxing the constraints on BinarySearchFunc to allow using two different types:

func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)

As it adds a type parameter, this is not backwards compatible, but it's exp and the extra type argument is inferable. We could also delay this until, if ever, we move the slices package to the stdlib.

Rationale

A usecase this just came up for me is that I wrote a helper package for Advent of Code to deal with sparse interval sets. A set is represented as a slice of intervals [a,b), [c,d),…, [y,z) with a<b<…<y<z. Looking up an element in this set is done via a binary search for an interval containing it. Similarly, intersections and unions of these sets involve a bunch of binary searching. In all of these cases, the algorithm naturally looks up an integer, not an interval.

With the current signature, the union version would be written as

lo, lok := slices.BinarySearchFunc(s, Interval[T]{i.Min, i.Min}, func(i, j Interval[T]) int {
	if j.Min > i.Max {
		return -1
	}
	if j.Min+1 < i.Min {
		return 1
	}
	return 0
})
// vs.
lo, lok := slices.BinarySearchFunc(s, i.Min, func(i Interval[T], v T) int {
	if v > j.Max {
		return -1
	}
	if v+1 < j.Min {
		return 1
	}
	return 0
}

To me, the second version is far more clearer. It makes obvious which argument is the needle and which is the haystack element. It does not involve creating an artificial Interval, nor taking a .Min to extract the needle from said Interval again (and from the code, it's really not clear whether that's semantically meaningful).

The two type parameter version seems far more natural to me and anecdotally, I believe it would have always been the right choice for me when I looked at slices.BinarySearchFunc. And I remember not using it at least once or twice because of this (instead resorting to sort.Search). As another example, if I have a slice of Person, sorted by name, looking up the index by name would be

slices.BinarySearchFunc(s, Person{Name:name}, func(a, b Person) int { return cmp(a.Name, b.Name) })
// vs.
slices.BinarySearchFunc(s, name, func(p Person, n name) int { return cmp(a.Name, n) })

Again, this just seems clearer to me.

Merovius avatar Dec 15 '22 23:12 Merovius

I have also encountered this issue before. I agree it's annoying. Sometimes I have to create a temp element simply to make the comparison works, while all I really care is just a member value of that element.

But if backwards compatibility is not a problem, maybe changing the signature to

func BinarySearchFunc[E any](x []E, cmp func(E) int) (int, bool) 

would be enough and simpler? the target can be captured into the cmp function anyway.

David-Mao avatar Dec 16 '22 02:12 David-Mao

I suggested that in the original issue but it did not happen. Perhaps my arguments at the time where weak, though.

Merovius avatar Dec 16 '22 06:12 Merovius

I support this proposal. For the same reasons I gave in the original proposal, I'm not keen on the one-argument form, because it's not clear which way the comparison goes (and if anyone else is similar to me, it's can be a real brain bender to work out exactly which is correct), and in practice you'll always end up creating a closure.

As an example, say we have a slice of objects ordered by time. Using the more general API is nice here:

type entry struct {
	key string
	val int
}

func (e *entry) cmpKey(k string) int {
	return strings.Compare(e.key, k)
}

func findEntry(es []*entry, k string) *entry {
	i, ok := slices.BinarySearchFunc(es, k, (*entry).cmpKey)
	if ok {
		return es[i]
	}
	return nil
}

rogpeppe avatar Dec 20 '22 18:12 rogpeppe

My preference is for the closure, one argument form. In the cases where you do use a closure instead of a method, the two argument form ends up being confusing in its own way.

earthboundkid avatar Dec 20 '22 19:12 earthboundkid

I think the one argument version is the most consistent with sort.Search, but I think the two argument version gives the most flexibility by allowing for the use of non-closure arguments. I think it's clear what the arguments are and what the return value is. I am also in the boat where I'm left scratching my head about what to write inside a 1-ary cmp function, and I get sort.Search wrong often enough that I nowadays always look up an example.

kylelemons avatar Dec 24 '22 18:12 kylelemons

It sounds like people are generally in favor of

func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)

Is E, T the standard order in other languages for the comparison arguments in a call like this?

Are there any other concerns?

rsc avatar Jan 18 '23 18:01 rsc

Is E, T the standard order in other languages for the comparison arguments in a call like this?

These are the closest equivalents I can find in other languages:

Python: bisect Java: For arrays and Lists

They each take the slice-equivalent first and the target element second, like the proposed BinarySearchFunc, and presumably the cmp function should take its arguments in the same order.

aclements avatar Jan 25 '23 18:01 aclements

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc avatar Jan 25 '23 19:01 rsc

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

rsc avatar Feb 01 '23 19:02 rsc

Change https://go.dev/cl/464456 mentions this issue: slices: allow different types in BinarySearchFunc

gopherbot avatar Feb 01 '23 22:02 gopherbot

Change https://go.dev/cl/485995 mentions this issue: slices: update BinarySearchFunc docs for CL 464456

gopherbot avatar Apr 18 '23 20:04 gopherbot

Change https://go.dev/cl/498395 mentions this issue: doc/go1.21: merge x/exp/slices issue into slices package

gopherbot avatar May 25 '23 21:05 gopherbot