fslang-suggestions icon indicating copy to clipboard operation
fslang-suggestions copied to clipboard

Break loop and else branch

Open jl0pd opened this issue 3 years ago • 6 comments

Right now all loops (for i=0 to 10 do, for el in seq do, while smth do) are valid expressions that always return unit.

I propose we allow break keyword to be used to stop loop and return value:

let indexOf value (ar: 'a[]) =
    for i = 0 to ar.Length - 1 do
        if ar[i] = value then
            break i
    else
        printfn "array is empty or value wasn't found"
        -1

Unlike many other languages break is not simple statement that just stops loop, instead it return value from loop. Since expression always should have value we need to also add else-branch to loop that will be executed if loop wasn't stopped by break, like in python.

Many handwritten loops that require early return will be much cleaner, e.g. Array.contains from FSharp.Core may be written as:

let inline contains value (array: 'T[]) = // 'a -> 'a[] -> bool
    checkNonNull "array" array
    for el in array do
        if el = value then
            break true
    else
        false

This construct should be allowed in all variations of loops.

The existing way of approaching this problem in F# is bunch of workarounds.

Pros and Cons

The advantages of making this adjustment to F# is making it easier and cleaner to express original intention.

The disadvantages of making this adjustment to F# are more complexity in language.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): L

Related suggestions:

  • https://github.com/fsharp/fslang-suggestions/issues/381

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • [x] This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • [x] I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • [x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • [x] This is not a breaking change to the F# language design
  • [x] I or my company would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

jl0pd avatar Nov 28 '21 10:11 jl0pd

I like this proposal in general.

Yielding values directly in break seems like unnecessary or even too restrictive for some corner cases. This is independent of using for inside a computation (like list or seq), or as a statement-like for-loop, as long as the computation builder has a Zero element (e.g. an empty list).

Here's an example using lists:

let evenValuesKnockoff exclThreshold elements =
    [
        for x in values do
            if x < exclThreshold then
                yield x
            else
                break // there is nothing to yield here, except for List.Zero (which is: [])
    ]

Even simpler is this, without yielding anything, and resulting in the Zero element:

let empty (values: 'a list) =
    [
        for x in values do
            break
    ]
// result: [] : 'a list

It seems that "break value" is just "yield value (combined with) break".

Here's the contains example from the suggestion above:

let inline contains value (array: 'T[]) =
    checkNonNull "array" array
    for el in array do
        if el = value then
            yield true
            break
    else
        yield false

SchlenkR avatar Dec 06 '21 09:12 SchlenkR

Links the archive issue for this for posterity.

cartermp avatar Dec 06 '21 23:12 cartermp

@ronaldschlenker yield have different meaning, it's allowed multiple times, e.g. yield 1; yield 2 while no code should be after break except for return value. Also it's possible to insert code between yield and break which makes implementation much harder: with my design break pushes value on stack and jumps out of loop, while yield should stores value to local variable and possibly override it if multiple yields are used.

For clarification, I've meant to make loops common expressions that can be assigned to variable, e.g. let idx = for i=0 to ar.Length-1 do .... That's possible even now, but there's no much sense since idx can be only unit (). Assignment is example to show that this is just expression without special rules.

MSIL listing for

let contains value (array: int[]) =
    for i=0 to array.Length-1 do
        if ar.[i] = value then
            break true
    else
        false
IL_0000: ldc.i4.0 // i = 0
IL_0001: stloc.0

IL_0002: br.s IL_001a // goto il_001a
// loop start (head: IL_001a)
    IL_0004: ldarg.2 // ar
    IL_0005: ldloc.0 // i
    IL_0006: ldelem System.Int32 // ar.[i]
    IL_000b: ldarg.1 // value
    IL_000c: bne.un.s IL_0016 // if ar.[i] <> value then goto il_0016
    
    // this is how 'break' is emitted
    IL_000e: ldc.i4.1 // push 'true' on stack 
    IL_0014: br.s IL_0020 // goto il_0020

    IL_0016: ldloc.0 // i
    IL_0017: ldc.i4.1 // 1
    IL_0018: add // i + 1
    IL_0019: stloc.0 // i <- i + 1

    IL_001a: ldloc.0 // i
    IL_001b: ldarg.2 // ar
    IL_001c: ldlen // ar.Length
    IL_001d: conv.i4 // int ar.Length
    IL_001e: blt.s IL_0004 // if i < ar.Length then goto il_0004
// end loop

// else branch of loop
ldc.i4.0 // push 'false'

IL_0020: ret

jl0pd avatar Dec 08 '21 05:12 jl0pd

My apologies; I misunderstood the proposal.

A solution that comes pretty close to what is proposed is already possible today with a computation builder like this one:

type LoopBuilder<'a>(defaultValue: 'a) =
    member _.For(s: _ seq, body) =
        let enumerator = s.GetEnumerator()
        let mutable running = true
        let mutable result = defaultValue
        while running && enumerator.MoveNext() do
            match body (enumerator.Current) with
            | Some value ->
                running <- false
                result <- value
            | None -> ()
        result
    member _.Zero() = None
    member _.Return(value) = Some value
let loop = LoopBuilder


/// indexOf with a default value of -1
let indexOf value (ar: 'a list) =
    loop -1 {
        for i,v in ar |> Seq.indexed do
            if v = value then
                return i
    }

// Test
[1;2;3] |> indexOf 5 // -1
[1;2;3] |> indexOf 3 //  2
[1;2;3] |> indexOf 1 //  0

SchlenkR avatar Dec 09 '21 18:12 SchlenkR

CE have it's own overhead. This LoopBuilder for example makes at least 3 allocations and involves virtual calls into FSharpFunc<'a. 'b>

Simple benchmark that shows how much performance we lose: CE version runs ~634ms Let-rec loop runs ~95 ms F#-style loop runs ~71 ms Handwritten IL runs ~24 ms System.Array.IndexOf runs ~15 ms

Using break and emitting semi-optimal code can give us ~3x faster code in loops in this case

benchmark code
open System
open System.Reflection.Emit

module IlIndexOf =
    let ilIndexOfImpl =
        let method = System.Reflection.Emit.DynamicMethod("indexOf", typeof<int>, [|typeof<int>; typeof<int[]>|])
        let il = method.GetILGenerator()
        il.DeclareLocal(typeof<int>) |> ignore
        il.Emit(OpCodes.Ldc_I4_0)
        il.Emit(OpCodes.Stloc_0)
        let ret = il.DefineLabel()
        let loopCond = il.DefineLabel()
        il.Emit(OpCodes.Br_S, loopCond)
        do
            let loopStart = il.DefineLabel()
            il.MarkLabel(loopStart)

            // if ar[i] <> value then goto incrIdx
            il.Emit(OpCodes.Ldarg_1)
            il.Emit(OpCodes.Ldloc_0)
            il.Emit(OpCodes.Ldelem_I4)
            il.Emit(OpCodes.Ldarg_0)
            let incrIdx = il.DefineLabel()
            il.Emit(OpCodes.Bne_Un, incrIdx)

            // break i
            il.Emit(OpCodes.Ldloc_0)
            il.Emit(OpCodes.Br, ret)

            // i <- i + 1
            il.MarkLabel(incrIdx)
            il.Emit(OpCodes.Ldloc_0)
            il.Emit(OpCodes.Ldc_I4_1)
            il.Emit(OpCodes.Add)
            il.Emit(OpCodes.Stloc_0)

            // while i < ar.Length do
            il.MarkLabel(loopCond)
            il.Emit(OpCodes.Ldloc_0)
            il.Emit(OpCodes.Ldarg_1)
            il.Emit(OpCodes.Ldlen)
            il.Emit(OpCodes.Conv_I4)
            il.Emit(OpCodes.Blt_S, loopStart)

        // else
        il.Emit(OpCodes.Ldc_I4, -1)

        il.MarkLabel(ret)
        il.Emit(OpCodes.Ret)

        method.CreateDelegate<Func<int, int[], int>>()

    let ilIndexOf value (ar: int[]) =
        ilIndexOfImpl.Invoke(value, ar)

let FSStyleIndexOf value (ar: int[]) =
    let mutable result = -1
    let mutable i = 0
    while i < ar.Length && result = -1 do
        if ar[i] = value then
            result <- i
        i <- i + 1
    result

let rec RecursiveIndexOf value (ar: int[]) =
    let rec loop idx =
        if idx >= ar.Length then
            -1
        elif ar[idx] = value then
            idx
        else
            loop (idx + 1)
    loop 0

type LoopBuilder<'a>(defaultValue: int) =
    member _.For(s: _ seq, body) =
        let enumerator = s.GetEnumerator()
        let mutable running = true
        let mutable result = defaultValue
        while running && enumerator.MoveNext() do
            match body (enumerator.Current) with
            | Some value ->
                running <- false
                result <- value
            | None -> ()
        result
    member _.Zero() = None
    member _.Return(value) = Some value
let loop = LoopBuilder

/// indexOf with a default value of -1
let CEindexOf value (ar: int[]) =
    loop -1 {
        let mutable i = 0
        for v in ar do
            if v = value then
                return i
            else
                i <- i + 1
    }

let size = 100_000
let rnd = Random(42)
let data = Array.init size (fun _ -> rnd.Next())

// correctness check
let last = data.[data.Length - 1]
assert (FSStyleIndexOf last data = CEindexOf last data
     && FSStyleIndexOf last data = IlIndexOf.ilIndexOf last data
     && FSStyleIndexOf last data = Array.IndexOf(data, last))

let bench f =
    let sw = System.Diagnostics.Stopwatch.StartNew()
    for _=0 to 1_000 do
        f()
    printfn "ran in %f ms" sw.Elapsed.TotalMilliseconds

bench (fun () -> ())
bench (fun () -> Array.IndexOf(data, size - 1) |> ignore)
bench (fun () -> IlIndexOf.ilIndexOf (size-1) data |> ignore)
bench (fun () -> FSStyleIndexOf (size-1) data |> ignore)
bench (fun () -> RecursiveIndexOf (size-1) data |> ignore)
bench (fun () -> CEindexOf (size-1) data |> ignore)

~~I have a feeling that I've done something wrong in FSStyleIndexOf so it runs that slow or have bug somewhere in ilIndexOf so it runs that fast. But System.Array.IndexOf as baseline indicates that there's actually something in JIT that can optimize simple loops and can't optimize f#-version~~

Methods was generic and equality was done with LanguagePrimitives.HashCompare.GenericEqualityIntrinsic. Updated code

jl0pd avatar Dec 10 '21 05:12 jl0pd

The builder could be optimized by targeting directly array instead of seq (or at least providing a specialized array overload for ‚for‘), by using InlineIfLambda or even by using a resumable state machine. Aside from the performance issues: I don’t think it’s necessary extending the language, since the proposed syntax can be expressed using CE quite closely.

SchlenkR avatar Dec 13 '21 22:12 SchlenkR