Allow max/min to take a proc or `it` argument
Abstract
I propose to allow the max and min procs to take an additional argument, which specifies the criterion by which the comparisons are made. This could either be a single-argument procedure, or an it expression (similar to mapIt and friends).
Motivation
It is fairly common to want to get the maximum and minimum of a sequence of T, but by a different criterion than the usual < defined for T (or where < isn't even defined for T). Examples:
- Get the string with the longest length
- Get the number with the largest absolute value
- Get the vector with the largest magnitude
- Get the file in a directory with the most recent modified time
The algorithm module in the standard library already provides an analogous functionality for the sort family of functions: you can provide a cmp argument, or use sortedByIt. I suppose a third option could be to allow max and min to take a cmp function, although personally I find the cmp style to be fiddly in this context.
Other languages (e.g. Python) also allow you to pass a function to max and min.
Description
The max implemention in proc style:
proc max[T, U](x: openArray[T], op: proc(t: T): U): T =
var currentMax = op(x[0])
result = x[0]
for i in 1..high(x):
let criterion = op(x[i])
if currentMax < criterion:
currentMax = criterion
result = x[i]
In it style:
# untested, adapted from sortedByIt implementation
func max[T](x: open_array[T], cmp: proc(x, y: T): int {.closure.}): T =
result = x[0]
for i in 1..high(x):
if cmp(x[i], result) > 0:
result = x[i]
template maxIt(x, op: untyped): untyped =
var result = max(x, proc(x, y: typeof(items(x), typeOfIter)): int =
var it {.inject.} = x
let a = op
it = y
let b = op
result = cmp(a, b))
result
proc or it?
I'm not sure whether the criterion should be specified as a proc, or in it style, or whether to have both coexisting like is done with map and mapIt. Personally I prefer the proc style, but either or both are fine and have precedent. Thoughts?
Examples
Procedure way:
import sugar, sequtils
let raven = @["Once", "upon", "a", "midnight", "dreary"]
echo raven.max(x => x.len) # "midnight"
echo raven.min(x => x.len) # "a"
it way:
echo raven.maxIt(it.len) # "midnight"
echo raven.minIt(it.len) # "a"
Also the two-argument forms (i.e. not on an openArray) would allow for a criterion argument:
max(-10, 4, x => x.abs) # -10
min(-10, 4, x => x.abs) # 4
Other Things
While I'm on the subject, I thought of some other existing procs that might benefit from taking a criterion argument, and some new procs that could be added to sequtils on the same theme.
The maxIndex and minIndex procs in sequtils are the obvious next step (it versions omitted for brevity):
echo raven.maxIndex(x => x.len) # 3
echo raven.minIndex(x => x.len) # 2
(On an unrelated note, could we also add create argmax and argmin as aliases for maxIndex and minIndex? I thought the stdlib didn't have them for ages, until I noticed maxIndex and minIndex by accident while reading the sequtils docs.)
The find function could have variants which take a predicate argument:
echo raven.find(x => x.len < 4) # 2
It also seems useful to have first and last procs in sequtils:
echo raven.first # "Once"
echo raven.first(x => x.len > 4) # "midnight"
echo raven.last # "dreary"
echo raven.last(x => x.len == 4) # "upon"
(The return type should probably be Option[T] because the predicate might never be satisfied.)
Backward Incompatibility
This should not cause any backward-incompatible changes, since you would still be able to use max and min in the same way as before just by not passing in the criterion argument. These would be defined as additional versions of max and min, with different argument signatures, not modifying the existing definitions.
As long as these additions are not done to system.nim, I have no real objections. However, here is an idea: Instead of adding more and more things to sequtils and algorithm, let's have algorithms in their own modules:
sort.nim, map.nim, fold.nim, minmax.nim, ...
(Related to Nim v2.)
you can already do
type AbsInt = distinct int
proc `<=`(a, b: AbsInt): bool = abs(int(a)) <= abs(int(b))
echo int(min(AbsInt(-10), AbsInt(-12)))
I like this very much. Recently while just playing around with something I did
data.sortedByIt(criterion it)[0] (when I needed just data.minByIt(criterion it)) because I was lazy.
maxIt shouldn't create a closure, it should inline everything, that's the main draw of mapIt/foldIt and friends.
Regarding Araq proposal I would classify like this:
- higher order functions (map, fold)
- aggregate functions (min, max, sum, argmin, argmax, prod, ...)
- ...
I want this.
I forgot I made this RFC because I haven't used Nim for a while. But I see that Nim 2 is out now so I'll take another look at it and try to make a PR for these functions soon(tm).
you can already do
type AbsInt = distinct int proc `<=`(a, b: AbsInt): bool = abs(int(a)) <= abs(int(b)) echo int(min(AbsInt(-10), AbsInt(-12)))
Yeah I guess, but having to define a new type and a < proc is a hassle. It's overkill when all I usually want is to make an ad-hoc / one-off criterion. Why do it in three lines when you could do it in one?