Add argmax,argmin equivalents to FSharp.Core
NumPy has argmax and argmin to return the index of the maximum and minimum values in a collection.
I propose we add something similar to F#, either with these names of indexOfMax and indexOfMin
The existing way of approaching this problem in F# is a little painful. You have to find the max and then find the specific element with that max.
module List =
let indexOfMax xs =
let mx = List.max xs
xs |> List.findIndex (fun v -> v = mx)
Not that bad but seems reasonable to have this in the core library, optimized appropriately and inlined to avoid the generic comparisons.
The functions occurs in basic machine learning samples like https://www.tensorflow.org/tutorials/keras/basic_classification
Pros and Cons
The advantages of making this adjustment to F# are simpler code in some scenarios
The disadvantages of making this adjustment to F# are added functions in FSharp.Core.
Open questions
- [ ] are the names what they need to be? Would
maxByIndexby better thanindexOfMax? Should it befindIndexOfMaxto fit withfindIndex. Other suggestions? - [ ] should there by any similar functions, e.g.
indexOfMaxByseems reasonable.
Extra information
Estimated cost (XS, S, M, L, XL, XXL): S
Affidavit (please submit!)
Please tick this by placing a cross in the box:
- [x] This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
- [x] I have searched both open and closed suggestions on this site and believe this is not a duplicate
- [x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.
Please tick all that apply:
- [x] This is not a breaking change to the F# language design
- [x] I or my company would be willing to help implement and/or test this
If I could give this a hundred upvotes I would. So many of the optimization papers I read have this as a fundamental concept.
I would love to have both indexOfMax/indexOfMin and indexOfMaxBy/indexOfMinBy. In Optimization and Data Analysis the need for these types of functions comes up frequently.
I would love to work with someone on implementing this. Since I have not contributed to the F# language in the past I would need some help and guidance.
@matthewcrews since @dsyme created the issue I believe he intends on implementing the functions, but I think it would be illustrative to give some examples of how you'd use these functions in the context of your work.
It would also be helpful if there are other things you find missing from F#, or at least feel like something that ought to be a part of FSharp.Core list/array/seq combinators when working in optimization and data analysis. Separate suggestions can be created for things that make sense. From there, RFCs and/or trial implementations could advance if approved.
@cartermp Will do. Here are some common use cases for what I have to do. This is focused on working with numbers so I'm not taking being able to operate on the generic case. I'll leave that to the professionals :)
These all come from me starting to work on a Linear Programming solver. There are actually four different variations on this concept which come up often. Below I show the different cases and what my initial thoughts are on what it would be like to work with them.
let vector = [|1..10|]
// Case 1: I need the index of the Max or Min value
// a should be 0
let a = Array.indexOfMin vector
// b should be 9
let b = Array.indexOfMax vector
// Case 2: I need to perform a function on the elements and get
// the index where the result is the Max or Min
let analysisFunction x =
-1.0 * x
// a should be 9
let a = Array.indexOfMinBy analysisFunction vector
// b should be 0
let b = Array.indexOfMaxBy analysisFunction vector
// Case 3: I need to only consider a subset of the elements
// and return the Min or Max index where the condition holds
let filterFunction x =
x >= 5.0 && x <= 6.0
// a should be 4
let a = Array.indexOfMinWhere filterFunction vector
// b should be 5
let b = Array.indexOfMaxWhere filterFunction vector
// Case 4: I need to consider only a subset of the elements
// and then compute a value for each element which passes
// the condition. Of those, I want the index of the element
// which gave the Min or Max value for the given function.
let analysisFunction x =
-1.0 * x
let filterFunction x =
x >= 5.0 && x <= 6.0
// a should be 5
let a = Array.indexOfMinByWhere filterFunction analysisFunction vector
// b should be 4
let b = Array.indexOfMaxByWhere filterFunction analysisFunction vector
I could also make the case that an i version of these functions would be helpful. These would be akin to the Array.mapi or Array.iteri functions. I completely understand if the sentiment is that it would crowd the module but I do have use cases where I need to know the index that I am currently iterating on.
When formulating problems or performing analysis over arrays I often have to index into another array. These are all things I'm going to have to write for myself if they are not in FSharp.Core. The examples here are silly but they show some of my common usage patterns.
// The i version of the Max and Min functions
let vector = [1..10]
let otherVector = [11..20]
let analysisFunction (index:int) x ->
otherVector.[index] + x
// Min and Max Byi functions
let a = indexOfMinByi analysisFunction vector
let b = indexOfMaxByi analysisFunction vector
// Min and Max By i Where functions
let a = indexOfMinByiWhere filterFunction analysisFunction vector
let b = indexOfMaxByiWhere filterFunction analysisFunction vector
I took a couple of minutes to write up what I think I would expect from the type signatures:
// Function signatures
// 'T [] -> int
Array.indexOfMin
// 'T [] -> int
Array.indexOfMax
// ('T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMinBy
// ('T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMaxBy
// ('T -> bool) -> 'T [] -> int
Array.indexOfMinWhere
// ('T -> bool) -> 'T [] -> int
Array.indexOfMaxWhere
// ('T1 -> 'T2) -> ('T1 -> bool) -> 'T1 [] -> int
Array.indexOfMinByWhere
// ('T1 -> 'T2) -> ('T1 -> bool) -> 'T1 [] -> int
Array.indexOfMaxByWhere
// The i version of the functions. Same as the ones above except the index
// of the current element is passed to the function as the first argument to the input functions
// (int -> 'T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMinByi
// (int -> 'T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMaxByi
// (int -> 'T -> bool) -> 'T [] -> int
Array.indexOfMinWherei
// (int -> 'T -> bool) -> 'T [] -> int
Array.indexOfMaxWherei
// (int -> 'T1 -> 'T2) -> (int -> 'T1 -> bool) -> 'T1 [] -> int
Array.indexOfMinByWherei
// (int -> 'T1 -> 'T2) -> (int -> 'T1 -> bool) -> 'T1 [] -> int
Array.indexOfMaxByWherei
This is what I ended up doing for working with Arrays: https://github.com/matthewcrews/Scratchpad/blob/master/ArrayIndexing.fsx
Matthew please feel free to make a trial implementation.
However I think we would not include the indexed or indexOf..Where. The former because there are other functions that would get an indexed version before these would, and the latter because it is a design principle of fsharp core not to have filtering variations, eg mapWhere.
Sounds good. The reason for the indexOf..Where functions was to cover cases that the R data.table library provides for that I have found useful in the past. They are advanced use cases but they have come up. I can just offer them in a separate library on Nuget.
Did we want this for anything besides Array and List? I could see having this for Map as well since it is a data structure with index lookup.
@matthewcrews Yes, that makes sense.
We should do Seq also for consistency.
Although longer, findIndexOfMax[By] seems a better fit given we already have other findIndex methods (findIndex and findIndexBack)
I don't know if that would be a good idea. A Seq could be potentially infinite in length and therefore these functions may never return. That was why I was not planning on implementing them.
There are many Seq functions like that, including Seq.exists for example. So we should include Seq
I'm happy to add it for Seq as well. Right now my problem is that I can't find any guidance on getting testing setup and running for solution. I added functions for List and Array and unit tests for them but it says it fails because they aren't registered anywhere. Is there any documentation about how to properly add and run tests for this solution? I'd love to help by adding these functions but there's not a lot of guidance that I can find.
@dsyme
it is a design principle of fsharp core not to have filtering variations, eg mapWhere.
Isn't that Choose?
I think choose is not exactly a filter or a Where. It's a kind of flatten for a Collection of options (or something that maps to option).
Having said that, I think the name should have been chooseBy for consistency.
Perhaps there are different interpretations, but the way I read it is that it performs a map on a sequence to some sequence of options, filtering out the Nones and lifting the Somes. It's a map and a filter at the same time.
It's incredibly useful and one of my favourite Seq / List / Array functions, but I did always wonder why it wasn't just called filterMap or something similar.
Per Don Syme, I will not be including the indexOf...Where functions. I will be adding them for myself and hope to publish it as a separate Nuget library.
Conceptually you can think of the indexOf...Where functions as a map then choose but that implementation would be terribly inefficient. My understanding was that we wanted a fast implementation of these functions which is why I use mutable variables to track the Min/Max as I iterate through the collections.
I need the indexOf...Where functions for implementing an LP Solver which does a lot of index searching over arrays.
I think choose is not exactly a filter or a Where. It's a kind of flatten for a Collection of options (or something that maps to option).
The specific design decision I was referring to was not to have extra "where" versions that take a boolean-valued predicate. It's true that choose sneaks past this by providing an option-valued projection. So you could imagine a
findIndexOfMaxBy : ('T -> 'U option) -> int
However that would still result in quite a proliferation of functions.
So choose and pick are basically the only places we do this.
You can also perform this operation with a single iteration of the source collection: Seq.indexed >> Seq.maxBy snd >> fst. However for Arrays and other eager collection types this would (I imagine) allocate an intermediate collection (plus the extra tuple etc.).
ArgMax and FindIndexOfMax are good names.
Definitely good to have equivalence to the code in the OP using FindIndex, in particular returning the first index giving the max and failing if the max is not found (empty list). This will need to be in the xml annotation.
@charlesroddie, like the IndexOf methods for strings, I think it's reasonable to return -1 for empty lists, arrays, sequences.
Btw, there was a suggestion here to implement this for Map, but I don't think it makes sense, as maps are not ordered, so the returned index doesn't make sense.
That's OK. Whichever decision is taken, the naming should correspond to the corresponding existing find/index method so that the behavior is expected. If we go with your suggestion that would be IndexOfMax.
@charlesroddie, like the
IndexOfmethods for strings, I think it's reasonable to return-1for empty lists, arrays, sequences.Btw, there was a suggestion here to implement this for
Map, but I don't think it makes sense, as maps are not ordered, so the returned index doesn't make sense.
Why not return the key in that case?