fslang-suggestions icon indicating copy to clipboard operation
fslang-suggestions copied to clipboard

Add "partitionMap" for List/Array/Seq/Set

Open SchlenkR opened this issue 3 years ago • 19 comments

Add "partitionMap" for List/Array/Seq/Set

I propose we add functions to the List/Array/Seq/Set modules that partition a collection of values into many collections of different types by a given mapper:

val partitionMap: mapper: ('a -> Choice<'b,'c>) -> source: 'a list -> 'b list * 'c list

Example (usage)

let evens,odds =
    [ 0..10 ] |> List.partitionMap (fun x ->
        let isEven = x % 2 = 0
        match isEven with
        | true -> Choice1Of2 x
        | false -> Choice2Of2 x)

// val odds: int list = [1; 3; 5; 7; 9]
// val evens: int list = [0; 2; 4; 6; 8; 10]

n partitions

There could exist partitionMap(n)s for more than 2 partitions (e.g. Choice is modeled for up to 7 choices):

val partitionMap3: mapper: ('a -> Choice<'b,'c,'d>) -> source: 'a list -> 'b list * 'c list * 'd list

Non-disjoint sets

Instead of mapping an input collection of n values to 2 collections of i, j values, where i + j = n, a more general approach could be mapping n values to collections where each collection has a maximun of n values:

// partitionMap for 3 non-disjoint partitions
val partitionMap3: mapper: ('a -> Option<'b> * Option<'c> * Option<'d>) -> source: 'a list -> 'b list * 'c list * 'd list

Example

let evens,odds,specials =
    [ 0..10 ] |> List.partitionMap3 (fun x ->
        let isEven = x % 2 = 0
        let isSpecial = x = 7
        
        let evenResult = if isEven then Some x else None
        let oddResult = if not isEven then Some x else None
        let specialResult = if isSpecial then Some (float x) else None
        
        evenResult, oddResult, specialResult
    )

// val evens: int list = [0; 2; 4; 6; 8; 10]
// val odds: int list = [1; 3; 5; 7; 9]
// val specials: float list = [7.0]

Alternatives

The existing way of approaching this problem in F# is composing existing List/Array/Seq functions, or by using FSharpPlus.

Links

  • The FSharpPlus implementation can be found here (in 2 versions: 1 using an optimized ListCollector for dotnet and another using recursion - thanks @gusty for the update).
  • A tweet adressing the proposal is here.

Pros and Cons

The advantages of making this adjustment to F# is having a useful function directly accessible in an idiomatic way without the need to resort to a package (FSharpPlus) or using hand-written code.

The disadvantages of making this adjustment to F# is: Having more and more functions in the core library could make it harder for beginners to see the essentials (which could be adressed by improved editor support).

Extra information

Estimated cost: XS - S

Affidavit

  • [x] This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • [x] I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • [x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • [x] This is not a breaking change to the F# language design
  • [x] I or my company would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

SchlenkR avatar Feb 20 '22 06:02 SchlenkR

I usually wind up adding some variation of this suggestion to every project on which I work. So, yeah, having a very efficient version of this "in the box" would be awesome. 👍

One nit-pick, though: I wouldn't expect to see this defined for 'T seq -- only for the non-lazy collections. At least, that would track with the .partition function.

pblasucci avatar Feb 20 '22 08:02 pblasucci

One nit-pick, though: I wouldn't expect to see this defined for 'T seq -- only for the non-lazy collections

Seq.partition does indeed not exist, but there are aggregation functions like Seq.min that require enumerating the sequence, so why not a partition / partitionMap for Seq, too? If not, it would mean that applying partitionMap over a sequence requires materialization to List/Array before. For Set, that sounds right, because of distinct, but for List/Array, it means more allocation overhead that is not really necessary. Maybe for Seq.partitionMap, the resulting collection type should not be a tuple of seq, but a non-lazy collection type (list or array).

SchlenkR avatar Feb 20 '22 08:02 SchlenkR

As @dsyme noted, partitionBy or partitionWith might be better names.

In List module, suffix usage is:

  • by is always used for functions taking a 'T -> 'Key projection (used in aggregation functions, like sumBy, or maxBy, or also distinctBy).
  • byXXX: There is a chunkBySize function that takes a float value
  • with is used for functions taking a 'T -> 'T -> int comparer (used in compareWith and sortWith).

The characteristics of this partitionMap function of this proposal might be closer to an aggregation function with projection than to a comparision function.

SchlenkR avatar Feb 21 '22 08:02 SchlenkR

I suggest giving an example where this is useful. The current example simplifies to partition.

charlesroddie avatar Feb 27 '22 14:02 charlesroddie

Would the name partition(n) instead of partitionMap(n) be more understandable? The proposed partitionMap (n = 2) can either just not exist or be named partition2.

Happypig375 avatar Feb 27 '22 15:02 Happypig375

From my understanding, the initial example doesn't simplify directly to partition, because there's a mapping involved from int to float for the last part.

Another example is this:

module List =
    let partitionBy3 (mapper: 'a -> Choice<'b,'c,'d>) (source: 'a list) : 'b list * 'c list * 'd list =
        failwith "TODO"

type Typ = unit
type Constraint = unit
type Err = unit
type Node =
    | Typed of Typ
    | Error of Err
    | Open of Constraint

let (|AnyError|AnyOpen|AllTyped|) (nodes: Node list) =
    // instead of multiple choose ...
    let es,os,ts =
        (nodes |> List.choose (function | Error x -> Some x | _ -> None)),
        (nodes |> List.choose (function | Open x -> Some x | _ -> None)),
        (nodes |> List.choose (function | Typed x -> Some x | _ -> None))

    // List.partitionBy(n) can be used
    let es,os,ts = nodes |> List.partitionBy3 (function
        | Error x -> Choice1Of3 x
        | Open x -> Choice2Of3 x
        | Typed x -> Choice3Of3 x)

    match es,os,ts with
    | [], [], ts -> AllTyped ts
    | [], os, _ -> AnyOpen os
    | es, _, _ -> AnyError es

SchlenkR avatar Feb 28 '22 08:02 SchlenkR

As @dsyme noted, partitionBy or partitionWith might be better names.

Most of the By and the With family named functions return the original type (sumBy, averageBy and countBy being exceptions that project to number types), where as the map family named functions (of which there are very few) return a new type like this proposed function would. By and With seem to be breaking the mold a bit.

michaeloyer avatar Mar 11 '22 16:03 michaeloyer

What about expanding the naming to the choose family of functions? Choose currently uses the option union type, and maps the new type from that union type, and that's kind of what's happening here with the Choice union types.

Function Chooser Function Returns
List.choose 'T -> Option<'U> 'U list
List.choose2 'T -> Choice<'U, 'V> 'U list * 'V list
List.choose3 'T -> Choice<'U,'V,'W> 'U list * 'V list * 'W list

At the same time I kind of like keeping the naming with partition too.

michaeloyer avatar Mar 11 '22 16:03 michaeloyer

I don't like choose fot this, because choose is about filtering out elements. Partitioning is separating elements from each other while retaining all original elements. This is what happens in this proposal.

SchlenkR avatar Mar 12 '22 00:03 SchlenkR

Again, what about partition2, partition3, partition4 etc?

Happypig375 avatar Mar 12 '22 05:03 Happypig375

choose is about filtering out elements. Partitioning is separating elements from each other while retaining all original elements.

True with partition you could take the two lists from the tuple and put them back together again to get roughly the same list, but I'd still think of partition as being about filtering.

choose is about filtering out elements.

I've actually always seen choose as a combination of filtering and mapping in a single function which I still see as being very similar to the pattern in this proposal.

All that said, I'm happy to agree to disagree and wouldn't lose sleep if we called it partitionMap, partitionBy, or partition2 over choose2. 😄

michaeloyer avatar Mar 12 '22 17:03 michaeloyer

Again, what about partition2, partition3, partition4 etc?

In this case, partition would be a special case of partition2, and partition(n) are all on that level of generalization. IMHO, just adding n to partition does not indicate the Map character of those functions.

After all, I prefer partitionMap for 2 reasons:

  1. It is already known and used from FSharpPlus.
  2. Having a look at the mapFold function, it's documentation says: "Combines map and fold" (although I don't know the reason why it wasn't called foldMap, as for me, fold is the dominant part here). partitionMap "Combines partition and map" - then let's call is so.

SchlenkR avatar Mar 14 '22 11:03 SchlenkR

@dsyme Is that something that's likely to happen ("approved-in-principle"), and - if so - a "good first issue"? Or should I close it?

SchlenkR avatar Jun 13 '22 13:06 SchlenkR

I think it's good, I've needed this and it's not possible to write easily with comprehensions. I do think I prefer partitionBy. I don't think the core library should have the 3, 4... versions

dsyme avatar Jun 13 '22 20:06 dsyme

I don't see any real objections so will mark as approved.

dsyme avatar Jun 13 '22 20:06 dsyme

The name split should also be considered, despite the other uses of splitInto, splitAt etc. Compare with this for Event.split for example.

https://github.com/dotnet/fsharp/blob/main/src/FSharp.Core/eventmodule.fsi#L215

dsyme avatar Jun 13 '22 20:06 dsyme

The problem with split alone is that it will shadow (or will be shadowed by, depending on the open sequence) FSharpPlus's List.split which is a very useful function and probably there are many projects out these using it.

It's unfortunate that they used that name for Event.split, I really think partitionMap is a consitent name as the partition operation already exists in the List module and it returns a tuple of lists, at the same time as @dsyme mentioned splitX names are already common in the List module.

I think it's important to have an API with consistent names, that's one of the main goals of FSharpPlus lib, but without shadowing/changing behavior of FSharp.Core , but if it wasn't I had renamed many functions in FSharp.Core to make it more consistent, like prefixing with operation and using consistent suffixes, eg for choose I had renamed it to chooseBy and had a version of it called choose which is equivalent to current choice id which btw I think it's currently being used more.

What I want to say is that it's not too late to start aligning names at least for new functions and for old ones we just assume the naming was not the best choice but unfortunate it will have to stick with it.

An alternate name could have been partitionWith as the With suffix is already in use for a mapping parameter used also in the result (as opposed to By).

Note that split alone is not consistent with the operation nor with the suffix, as it should have one since it accepts a mapper/spiltter/partitioner argument.

Finally, note the relationship between this function and choose, you can say its the same as choose but for Choices instead of Options.

gusty avatar Sep 15 '22 07:09 gusty

For the record, a similar discussion in the context of Scala's standard library: https://github.com/scala/collection-strawman/issues/235#issuecomment-397565701

partitionMap was eventually added to Scala.

LPTK avatar Jan 03 '23 16:01 LPTK

Is partition well composing contract? Business tends to change so that "we need yet another partition". Data-flows tend to always get more complex over time.

//This code:
let myFirstPartition = myItems |> Array.filter(fun x -> ...)
let mySecondPartition = myItems |> Array.filter(fun x -> not ...)

//will eventually come to this:
let myFirstPartition = myItems |> Array.filter(fun x -> ...)
let mySecondPartitionSplit1 = myItems |> Array.filter(fun x -> not ... && ...)
let mySecondPartitionSplit2 = myItems |> Array.filter(fun x -> not ... && not ...)

And then fourth partition, and fifth... Would it be more simpler to accept the hit and write it twice like in the first code?

If you think it's ok to write complex code, you could just use .reduce and yield list of partitions (e.g. your own return-type) instead of fixed tuple of 2 (or 3).

Thorium avatar Sep 26 '23 14:09 Thorium