fslang-suggestions icon indicating copy to clipboard operation
fslang-suggestions copied to clipboard

"for-with" syntactic sugar for folds

Open reinux opened this issue 1 year ago • 2 comments

I propose we add syntax sugar to turn

let result =
  for s with t in ts do
    s + t

into

let result =
  ts
  |> #Collection.fold (fun s t -> s + t) s

The with keyword introduces the initial state, and the value inside the for expression is expected to be the same as the initial state.

The existing ways of approaching this problem in F# are

fold

Folds are widely used yet slightly unwieldy, especially for beginners but even for experienced users. Nested folds tend to get really messy if we aren't extra careful with formatting. There are also several ways to write them: (state, list) ||> fold f, list |> fold f state, fold f state list. Each style has its detractors, often not without good reason. Only the first offers type inference for both state and `list within the function, but is a somewhat obscure approach.

let rec f s ts = ...

Ad-hoc recursive functions are a bit verbose, so we don't use them unless we really need to.

let mutable s in for t in ts do s <- ...

Finally, there probably isn't anything wrong with using a mutable accumulator, but it's just not something people like to reach for because it feels icky.

Pros and Cons

The advantages of making this adjustment to F# are

  • Conducive to useful type inference (because s is specified prior to the function)
  • Much easier for new FP users to understand without sacrificing immutability
  • Better legibility for experienced users in many situations, e.g. when nested
  • Alleviate the need for closing brackets at the end of the fold function or for trailing arguments, which tend to be a bit unsightly following long funs
  • Natural addition to the for loop

The disadvantages of making this adjustment to F# are

Yet more syntax to document and learn. On the other hand, it seems quite intuitive and hence easy to remember.

I don't think it should require any changes to the API for CEs, but I could be wrong.

Extra information

Estimated cost (XS, S, M, L, XL, XXL):

M (please correct me if I am wrong)

Related suggestions: (put links to related suggestions here)

Perhaps we can have a yield version that translates into a scan:

let results =
  [ for s with t in ts do
      yield s + t
  ]

There is some slight ambiguity here, as yield is no longer a required keyword when the for is used in a sequence/list/array. Some users may expect for-with to behave as a scan even without the yield keyword if it is within a list.

Please let me know if I should open another issue for this suggestion, and I will update with the link here.

Affidavit (please submit!)

Please tick these items by placing a cross in the box:

  • [x] This is not a question (e.g. like one you might ask on StackOverflow) and I have searched StackOverflow for discussions of this issue
  • [x] This is a language change and not purely a tooling change (e.g. compiler bug, editor support, warning/error messages, new warning, non-breaking optimisation) belonging to the compiler and tooling repository
  • [x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it
  • [x] I have searched both open and closed suggestions on this site and believe this is not a duplicate

Please tick all that apply:

  • [x] This is not a breaking change to the F# language design
  • [x] I or my company would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

reinux avatar May 21 '24 12:05 reinux

This is interesting. I have wondered about something like this before...

Re: adding something like this to the language

For something like this to work in F#, there would need to be some type-based and/or structural definition of "foldable" as there currently is for enumerables.

Even then, types would actually need to conform in their definitions to whatever convention was decided upon — unless this RFC were implemented: https://github.com/fsharp/fslang-design/blob/cd6085fb9f3a50093938d616fde8776d3be2cdad/RFCs/FS-1043-extension-members-for-operators-and-srtp-constraints.md

An alternative approach could be to borrow the idea from C# of using some kind of attribute to tie modules and their fold functions to their corresponding types. C# does this to find the appropriate Create method to use for collection expressions. Again, though, any existing modules would need to be annotated.

Maybe you could use a heuristic like "look for a fold function operating on type T on a module in scope called T or TModule," although that seems like a pretty messy way to go about it.

Computation expressions

It's already possible to implement something like this yourself using computation expressions. All of the problems above still prevent you from writing a more general computation expression that handles any "foldable" type, though (see, e.g., https://github.com/Savelenko/FSharp.Control.Fold?tab=readme-ov-file#making-your-data-types-compatible-with-the-library).

I can think of a few other syntaxes off the top of my head.

Here are some very rough proofs of concept: https://gist.github.com/brianrourkeboll/830408adf29fa35c2d027178b9f08e3c

  1. This might be the most consistent:

    fold state { for x in xs -> fun acc -> f acc x }
    
    let sum xs = fold 0 { for x in xs -> (+) x }
    
    let rev xs = fold [] { for x in xs -> fun acc -> x :: acc }
    
    let rebuild m = fold Map.empty { for KeyValue (k, v) in m -> Map.add k v }
    

    I.e., you provide the initial state as an argument to the builder, and you then simply yield functions that take the state and return a new value.

  2. This is kind of nice, too, though?

    fold state { for acc, x in xs -> f acc x }
    
    let sum xs = fold 0 { for acc, x in xs -> acc + x }
    
    let rev xs = fold [] { for acc, x in xs -> x :: acc }
    
    let rebuild m = fold Map.empty { for acc, KeyValue (k, v) in m -> acc.Add (k, v) }
    

    I.e., you provide the initial state as an argument to the builder, and then the intermediate state is exposed to you as the first element of a tuple produced via for.

  3. I also kind of like this, although it starts to make less sense if you allow multiple loops, standalone yields, etc.:

    fold { for acc, x in state, xs -> f acc x }
    
    let sum xs = fold { for acc, x in 0, xs -> acc + x }
    
    let rev xs = fold { for acc, x in [], xs -> x :: acc }
    
    let rebuild m = fold { for acc, KeyValue (k, v) in Map.empty, m -> acc.Add (k, v) }
    

    I.e., you provide the initial state as an argument to in, and then the intermediate state is exposed to you as the first element of a tuple produced via for.

You can extend any of these to handle additional foldable types, including non-enumerable ones, although you need to do it one by one:

type FoldBuilder<'T> with
    member inline builder.For (option : _ option, [<InlineIfLambda>] body : _ -> FoldBuilderCode<_>) =
        FoldBuilderCode<'T> (fun sm ->
            match option with
            | Some x -> (body x).Invoke &sm
            | None -> ())

let two = fold 1 { for x in Some 1 -> (+) x }

type FoldBuilder<'T> with
    member inline builder.For (result : Result<_, _>, [<InlineIfLambda>] body : _ -> FoldBuilderCode<_>) =
        FoldBuilderCode<'T> (fun sm ->
            match result with
            | Ok x -> (body x).Invoke &sm
            | Error _ -> ())

let three = fold 1 { for x in Ok 2 -> (+) x }

I guess you could also do something like this, although again you'd still need to explicitly define extensions:

[<Extension>]
type FoldBuilderExtensions =
    [<Extension>]
    static member inline For<
        'Input,
        'Extensions,
        'Intermediate,
        'State when ('Input or 'Extensions) : (static member Fold : ('State -> 'Intermediate -> 'State) * 'State * 'Input -> 'State)
        > (
            _builder : FoldBuilder<'State, 'Extensions>,
            foldable : 'Input,
            [<InlineIfLambda>] body : 'Intermediate -> FoldBuilderCode<'State>
        ) =
        let folder sm x =
            let mutable sm = sm
            (body x).Invoke &sm
            sm

        let inline call folder state input = ((^Input or ^Extensions) : (static member Fold : ('State -> 'Intermediate -> 'State) * 'State * 'Input -> 'State) (folder, state, input))

        FoldBuilderCode<'State> (fun sm -> sm <- call folder sm foldable)

[<Extension>]
type FoldExtensions =
    [<Extension>]
    static member Fold (folder, state, input) = List.fold folder state input

    [<Extension>]
    static member Fold (folder, state, input) = Array.fold folder state input

    [<Extension>]
    static member Fold (folder, state, input) = Set.fold folder state input

    [<Extension>]
    static member Fold (folder, state, input) = Option.fold folder state input

    [<Extension>]
    static member Fold (folder, state, input) = ValueOption.fold folder state input

    [<Extension>]
    static member Fold (folder, state, input) = Result.fold folder state input

    [<Extension>]
    static member Fold (folder, state, input) = Seq.fold folder state input

let fold<'T> state = FoldBuilder<'T, FoldExtensions> state

let sum = fold 0 { for x in [1..100] -> (+) x }

let sum' = fold 0 { for x in Some 1 -> (+) x }

let sum'' = fold 0 { for x in set [1..100] -> (+) x }

let sum''' = fold 0 { for x in [|1..100|] -> (+) x }

brianrourkeboll avatar May 21 '24 17:05 brianrourkeboll

Thanks for the detailed reply!

The CEs do indeed look nice. I probably wouldn't hesitate to use them myself if they were built-in, but I do have to wonder if it might be a bit magical-looking for beginners?

Maybe instead of unrolling to fold, we can translate it all the way to the imperative version, like this (using IEnumerable as opposed to ICollection too, btw):

let result =
  let mutable s = s
  let enr = ts.GetEnumerator()
  while enr.MoveNext() do
    s <- f s enr.Current
  s

We do lose some rigor, but my gut feeling is that that might not be so bad for a semi-imperative language like F#, as long as it's documented that that's what it's doing under the hood. Unless I'm missing something, it should be fairly rare that fold implementations do anything beyond that, so I think it shouldn't cause too many surprises.

reinux avatar May 21 '24 21:05 reinux

isn't the for loop an imperative construct with the iteration variable mutable without the use of the mutable keyword anyway? the benefit of hiding mutability just feels like a lie.

Happypig375 avatar Jul 22 '25 10:07 Happypig375

I don't really like it from first glance. It seems to introduce a huge change in F# making loops actually return something other than "unit" but it seems to support only "fold" operation which can be executed as a function or a CE like described above. And it's confusing.

let s = 0
let q = for i in 0..10 do s + i // unit
let w = for s with i in 0..10 do s + i // type is int even though loop body is very similar, the only change is at loop declaration

Also, I think it might be quite confusing because today F# is "yielding" expressions in loops E.g.

let s = 0
let q = seq { for i in 1..10 do s + i } // int seq, 10 values
let w = seq { for s with i in 1..10 do s + i } // int seq, but now only 1 value is yielded

let q1 = seq { for i in 1..10 do s + i; s + i } // int seq, 20 values
let w2 = seq { for s with i in 1..10 do s + i; s + i } // int seq, 11 values now?

En3Tho avatar Jul 24 '25 14:07 En3Tho

@En3Tho

// Top code
let s = 0
let q = for i in 0..10 do s + i // <- warning (discard), so it already reminds you what it does
let w = for s with i in 0..10 do s + i // value is indeed used

For the bottom code there are a few choices that can be made -

  • Allow implicit yields before the final value
  • Warn with implicit yield behaviour
  • Warn with discard behaviour
  • Error because it's ambiguous

Warning with discard is also a valid choice when inside this kind of for loop. Explicit yields are always available if needed.

Happypig375 avatar Jul 24 '25 16:07 Happypig375

I agree with @En3Tho that suggested syntax is a bit confusing, however I do agree that variable leak is kind of a problem. But I feel it's more a block scope that is missing rather than accumulator, so this syntax reads more natural to me:

for i in 1..10 with mutable s = 0 do
    s <- s + i

I do expect iterator variable to go after for rather than accumulator.

Lanayx avatar Jul 28 '25 00:07 Lanayx

@En3Tho @Lanayx Please review the revised version of this suggestion with a full critique on existing approaches of mutable variables or folds with lambdas: https://github.com/fsharp/fslang-design/pull/807

Happypig375 avatar Jul 28 '25 10:07 Happypig375

Is there a language where programmers can do folds over collections into a single value, without using a fold-like function call? The closest similarity I could find is using .. = ref initialState in for ... in ocaml:

let acc = ref 0 in for i = 0 to 2 do acc := !acc +i done; !acc

Which in F# one could also rework into mutable instead of ref:

let mutable sum = 0 in 
    for x in [|1;2;3|] do 
        sum <- sum + x

(which is not a style typically used in F# programs)

T-Gro avatar Jul 29 '25 09:07 T-Gro

@T-Gro

// Current - fold
let effects, model =
    Seq.fold (fun (effects, model) item ->
        let effect, model = Model.action item model // Pure function
        let model = Model.action2 model // Pure function
        effect :: effects, model)
        ([], model) // ugh - hard to get order and parentheses right
        items
// Current - ||> fold
let effects, model =
    (([], model), items) // ugh - nested tuples and hard to get order and parentheses right
    ||> Seq.fold (fun (effects, model) item ->
        let effect, model = Model.action item model // Pure function
        let model = Model.action2 model // Pure function
        effect :: effects, model)
// Current - recursive function
let effects, model =
    let rec [<TailCall>] processItems (effects, model) items = // boilerplate, hard to write correctly
        match items with // boilerplate and specific to lists - can't do this for other collections
        | [] -> effects, model // boilerplate
        | item::items -> // boilerplate
            let effect, model = Model.action item model // Pure function
            let model = Model.action2 model // Pure function
            processItems (effect :: effects, model) items
    processItems ([], model) items
// Current - mutable accumulator
let effects, model =
    let mutable accum = [], model // hmm, a mutable variable is required even in an architecture of pure functions and it needs the most boilerplate
    for item in items do
        let effects, model = accum // boilerplate
        let effect, model = Model.action item model // Pure function
        let model = Model.action2 model // Pure function
        accum <- effect :: effects, model
    accum // boilerplate
// Proposed - fold loop (simplest this can be)
let effects, model =
    for effects, model = [], model with item in items do
        let effect, model = Model.action item model // Pure function
        let model = Model.action2 model // Pure function
        effect :: effects, model

Happypig375 avatar Jul 29 '25 13:07 Happypig375

I get the benefits and how it fits pragmatic programming, my question was really after if this was being solved in any other programming language or its library ecosytem.

Also not really sold on the ordering of the syntax - I try to think of it as evolution of for item in items, and have question to the choices behind the proposed syntax:

Why does the accumulated state syntactically come before the enumeration? Why does the binding come before the with keyword, and not the other way around?

Is the natural idea of a fold primarily about the state, "with" the enumeration being the secondary? I tend to think about it being iteration first, "with" some additional state to be kept around.

(in natural language, I would typically say "The code is folding over ...collection... while using ...state accumulation...")

T-Gro avatar Jul 29 '25 13:07 T-Gro

any other programming language or its library ecosytem.

It seems that other programming languages and their library ecosystems maintain the bifurcation of either loops with mutable accumulators or folds with lambda body.

order of accumulator before enumeration

~~The accumulator is placed before the enumeration item because enumeration state must exist before it is used to get an item from the sequence. It is also why the fold lambda body takes the accumulator before the sequence item, making refactors from fold easy.~~

let sequence = ["Hello"; " "; "World"; "!"] // sequence
let mutable state_accum = "" // accumulator state
let state_iterator = sequence.GetEnumerator() // enumerator state
while state_iterator.MoveNext() do
    let word = state_iterator.Current // state must be defined before the iteration item is retrieved
    state_accum <- state_accum + word
printfn "%s" state_accum

~~This also highlights that while for i = 1 to 10 do and for i in 1 .. 10 do enumerate on the same items, they are semantically very different: for i = 1 to 10 do has i as the enumeration state, while for i in 1 .. 10 do hides the enumeration state and i is the enumeration item. Meanwhile, for i = 1 with j in 1 .. 10 do has two enumeration states: i and the enumerator of 1 .. 10 which must exist before j is retrieved. Therefore, it makes less sense to put the accumulator after the enumeration item.~~

I accepted placing the accumulator after the enumerator in the RFC: https://github.com/fsharp/fslang-design/pull/807

Happypig375 avatar Jul 29 '25 13:07 Happypig375

I am personally still not sold on this feature given how much overlap it has with what you can already do with computation expressions.

The arguments in the RFC against the suitability of computation expressions for this hold equally well for literally any use of computation expressions for any purpose.

To me, that means that we might be better served by investing time in addressing those pain points of computation expressions generally—e.g., an inline bind operator (https://github.com/fsharp/fslang-suggestions/issues/1070), improvements to the compiler that take advantage of knowledge of how computation expression builder methods are used to give better error messages or to resolve overload resolution ambiguities, etc.—rather than adding new special syntax for specific use-cases like this.

The RFC also currently seems to focus on an elaboration of the new syntax to something involving for-loops. But as noted above in https://github.com/fsharp/fslang-suggestions/issues/1362#issuecomment-2123061400, there are many foldable things that are not directly enumerable with a regular for-loop in F# today—like 'T option, Result<'Ok, 'Error>, etc.—and F# of course currently lacks a way to define a generic "foldable" construct at all, whether for consumption by computation expressions or by a dedicated syntax.

brianrourkeboll avatar Jul 29 '25 16:07 brianrourkeboll

@brianrourkeboll

The arguments in the RFC against the suitability of computation expressions for this hold equally well for literally any use of computation expressions for any purpose.

  • "an inline bind operator (https://github.com/fsharp/fslang-suggestions/issues/1070)" is unrelated to the point about "But this is not orthogonal to an existing computation expression context unlike the for loop which allows let! inside that refer to the outer context." Consider this:
    // CE (variation 1)
    let xs = [ fold 0 { for x in xs do yield 1 (*You cannot yield to the outer list here!*); yield (+) x } ]
    // Proposed fold loop (Alternative 2)
    let xs = [ for s = 0 with x in xs do yield 1 (* This yields to outer list :D *); s + x; done; s ]
    
  • The point about "Computation expressions also show a heavily different syntax compared to for loops and folds which add hinderance to understanding" is also not applicable to "any use of computation expressions for any purpose" - it is fine to use the async or task CEs for example to hide large chunks of logic. But for something as simple as an accumulator, using a CE is overkill with the heavy semantics brought in with braces and indirection to underlying code (one must follow each Bind call to understand what a let! really does, combine with all the other methods in a CE) compared to just writing the underlying code (also applicable to option and result CEs for example).

there are many foldable things that are not directly enumerable with a regular for

Indeed. The standard method to represent a foldable type is to implement seq. Making options implement seq was declined because of the use of null to represent None, so Option.toSeq must be called. This shouldn't be a problem for ValueOption or Result though.

Happypig375 avatar Jul 29 '25 19:07 Happypig375

@brianrourkeboll See if the updated explanation here https://github.com/fsharp/fslang-design/pull/807/commits/979c708fa3ac4d09c97fe21f75f089b24aeb0523 is better.

Happypig375 avatar Jul 29 '25 22:07 Happypig375

@T-Gro

I get the benefits and how it fits pragmatic programming, my question was really after if this was being solved in any other programming language or its library ecosytem.

Racket's for/fold: https://docs.racket-lang.org/reference/for.html#%28form._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._for%2Ffold%29%29.

  • multiple named accumulators in a declerative format
  • multiple sequences (zip)
  • filters with #:when
  • early exit with #:break

aaronmu avatar Aug 06 '25 07:08 aaronmu

Racket's for/fold: https://docs.racket-lang.org/reference/for.html#%28form._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._for%2Ffold%29%29.

This is a nice reference, thanks. I like how Racket's syntactic forms allow them to build upon a basic construct with a consistent naming scheme. It feels natural and clear. (for/fold, for/last, for/foldr etc.- all in category of for loops).

The biggest drawback I see with the current proposal for F# is that I haven't seen a syntactical proposal that would nicely fit in with existing idioms - or at least I am not able to familiarize myself with any of them.

If people have to look up the syntax often ( What is the order of keywords? Do I have to let or let mutable the accumulator bindings or not? Is it a statement or an expression?), they might as well look up how to use the List.fold built-in function or one of the recommended CE alternatives. I am not saying that the existing usage of fold fits perfectly everywhere, I think this is quite clear as one gets into attempting to pipe it. But at least ||> did not need brand new syntax added to language - the justification bar for adding new unfamiliar syntax for a singular use case is fairly high.

T-Gro avatar Aug 06 '25 07:08 T-Gro

Another example from the Lisp family, in Common Lisp there's the iterate macro where this

(iter (for x in list)
      (for sum initially 0 then (+ sum x))
      (finally sum))

would be the equivalent of the proposed

for x in list with sum = 0 do
    sum + x

Tarmil avatar Aug 06 '25 21:08 Tarmil

@T-Gro I suppose the existing for loop, outside a computation expression, is a special case of the fold loop just like iter can be defined in terms of fold.

// existing for
for x in xs do printfn $"{x}"
// is equivalent to
for x in xs with state = () do printfn $"{x}" // accumulates unit, returns unit

with and an accumulator initializer is easy to remember, right?

Happypig375 avatar Aug 07 '25 10:08 Happypig375

I keep thinking about a syntax that would not feel enforced for a singular use case. For limited scenarios where one can pass built-in functions/operators, the sample provided by @brianrourkeboll IMO fits very well into existing F# already and makes use of existing features:

let sum xs = fold 0 { for x in xs -> (+) x }

For use cases that need writing an inner lambda, this becomes less elegant. Is there perharps a more general improvement we can make to the mechanics CEs that would work here?

Is there perharps an improvement alongside the CustomOperation that would make access to the inner state value elegant?

let sum xs = fold 0 { for x in xs -> x + accumulator } // accumulator being a custom keyword defined in the FoldBuilder

I would feel more confident about a general improvement that can leave space for variations/customizations to library authors, without baking it a singular-behaviour into the language (inamagine extensions like fold, foldright, reduce, foldwhile ,.. )

T-Gro avatar Aug 11 '25 11:08 T-Gro

@T-Gro CustomOperation has quite a lot of quirks, like MaintainsVariableSpaceUsingBind that is required to have later code be able to refer to bindings above - itself without enough documentation. The accumulator keyword, if allowed, also doesn't allow natural pattern matching on tuple accumulators without boilerplate - unless we start allowing dot access to tuple members? While I agree that the CE variant with inner lambdas gets clunky to use, other variations that abuse tuples fold state { for acc, x in xs -> f acc x } and fold { for acc, x in state, xs -> f acc x } certainly don't help diagnose tuple ordering or placement.

Fold is fundamental enough that many other functional operations can be defined in terms of it (parallelism aside - usually parallel functions are defined in separate dedicated modules anyway). A general improvement for higher order abstractions like reduce, sum etc. can be done, but for the fundamentals don't we have for syntax that abstracts iteration away from the need to use while loops and explicit GetEnumerator calls? The fold loop is a similar idea - folding as part of the programming language, while higher order abstractions like sum, reduce etc. can be improved generally with computation expressions. It's just a with keyword and an accumulator initializer.

The downside of trying to extend upon computation expressions is that they need adequate tooling to have as neat an experience as basic for loops provide - an inspection tool following CE builder method calls, proper debug points, overload error messages... are all huge amounts of work, compared to just adding a with clause to for loops. Given that the current F# team size seems to be 2 + Copilot, this work isn't going to be finished any time soon.

Happypig375 avatar Aug 11 '25 15:08 Happypig375

@T-Gro I added https://github.com/fsharp/fslang-design/pull/807/commits/0135550679daf3fca213eed877338ead2a95c6c3 to argue for improving folds specifically

Happypig375 avatar Aug 15 '25 20:08 Happypig375

@T-Gro I added fsharp/fslang-design@0135550 to argue for improving folds specifically

Apologies for not contributing too meaningfully to this discussion, but I just wanted to comment on how fascinating it is that those numbers fit Zipf's Law pretty tightly, especially if you omit iter, which, incidentally (or not?), is the only impure HOF in the bunch.

Image

reinux avatar Aug 16 '25 08:08 reinux