fsharp icon indicating copy to clipboard operation
fsharp copied to clipboard

fixed/pinned array escapes due to tail call

Open luithefirst opened this issue 7 months ago • 2 comments

We have identified a scenario where an assumed pinned array is moved before being copied, resulting in an AccessVioloationException. It can be reproduced with net48, net8.0, and netstandard 2.0 when compiled with Tailcalls true (only tested on Windows), SDK version 9.0.203.

In particular, it escapes from an extension method like this:

static member Set<'T when 'T : unmanaged>(this : IBuffer, src : 'T[]) =
    use ptr = fixed src
    this.Set(ptr |> NativePtr.toNativeInt, nativeint src.Length * nativeint sizeof<'T>)

You can find a repository with a minimal example here: https://github.com/luithefirst/TailCallEscape

It will crash in about 1 min. Originally, I could also see a TailCallHelperStub stack frame in WinDbg, but it is gone now in my minimal example. Still, if Tailcalls is set to false in the fsproj file, the issue cannot be reproduced.

A colleague also recalled a possible similar issue back in net 5, but I could not find any details on it.

This is where the issue actually occurs in our framework: https://github.com/aardvark-platform/aardvark.rendering/blob/646d165829283a3a0ba03cb4badc465245fbc670/src/Aardvark.Rendering/Resources/Adaptive/ManagedBuffer.fs#L105 https://github.com/aardvark-platform/aardvark.base/blob/5b0b5c0507ca20238899260d2a08d252e6a4f1e8/src/Aardvark.Base.FSharp/Utilities/Interop/FSLibExtensions.fs#L529

There, it is implemented in a netstandard 2.0 library, and when consumed by net48 the issue occurs (but for some reason not in net8.0). Also, when disabling Tailcalls in the library, it can be avoided.

Expected behaviour: The compiler should be aware that a stackframe is pinning an object and therefore should not generate a tailcall.

Workaround: Use GCHandle + AddrOfPinnedObject

luithefirst avatar Jun 12 '25 19:06 luithefirst

That should be relatively easy to fix, we have the "can tail" check in very few places. It should take the pinned info into account

vzarytovskii avatar Jun 13 '25 20:06 vzarytovskii

Thanks for your reply, nice to hear!

For clarity, I've also outlined the original scenario where we originally discovered the issue: library with netstandard 2.0 (tailcalls=true) used in net48.

Edit: It also seems to occur in net8.0. Here are the different call stacks:

net48 + tailcall true:

TailCallEscape.Application+Buffer`1[[System.Int32, mscorlib]].Set(IntPtr, IntPtr)
TailCallEscape.Application.run2@62(System.Random, System.Collections.Generic.List`1<Int32[]>, Buffer`1, Microsoft.FSharp.Core.Unit)

net48 + tailcall false:

TailCallEscape.Application+Buffer`1[[System.Int32, mscorlib]].Set(IntPtr, IntPtr)
TailCallEscape.Application+BufferExtensions.Set[[System.Int32, mscorlib]](IBuffer, Int32[])
TailCallEscape.Application.run2@62(System.Random, System.Collections.Generic.List`1<Int32[]>, Buffer`1, Microsoft.FSharp.Core.Unit)

net8.0 + tailcall true:

TailCallEscape.Application+Buffer`1[[System.Int32, System.Private.CoreLib]].Set(IntPtr, IntPtr)
TailCallEscape.Application+Buffer`1[[System.Int32, System.Private.CoreLib]].TailCallEscape.Application.IBuffer.Set(IntPtr, IntPtr)
TailCallEscape.Application.run2@62(System.Random, System.Collections.Generic.List`1<Int32[]>, Buffer`1, Microsoft.FSharp.Core.Unit)

luithefirst avatar Jun 14 '25 08:06 luithefirst

Hello,

I've found a simpler repro for this issue. It seems like the fixed statement is broken in general with tail calls:

namespace App

open System
open FSharp.NativeInterop

module Program =
    let mutable dummy : int[] = null
    let mutable data : int[] = null

    let foo (address: nativeint) =
        dummy <- null
        GC.Collect()

        if address <> NativePtr.toNativeInt &&data.[0] then
            failwithf "Address changed"

    let bar() =
        use ptr = fixed data
        foo (NativePtr.toNativeInt ptr)

    [<EntryPoint>]
    let main _argv =
        dummy <- Array.zeroCreate<int> 100
        data <- Array.zeroCreate<int> 10
        bar()
        0

An easy workaround is to wrap the last expression to avoid the tail call:

match (foo (NativePtr.toNativeInt ptr)) with
| () -> ()

Tested with .NET SDK 8.0.411 and 9.0.300.

hyazinthh avatar Jun 20 '25 16:06 hyazinthh

I wonder how the IL spec looks at it - but we are likely the only compiler running into that I assume. i.e. would code that uses .tail before working with a pinned variable be treated as invalid IL ?

(and compiler should not generate that .tail here at all)

T-Gro avatar Jun 23 '25 08:06 T-Gro

Oh, it's tail prefixes and in both runtimes. Must've been there forever. I wonder what does the ECMA spec say about it (if anything at all). But shouldn't change the fact that it should be easy to not add prefix if there are any pinned locals. It might, however, theoretically, result in SO for some code which was working before.

vzarytovskii avatar Jun 23 '25 10:06 vzarytovskii

@jakobbotsch:

Hi Jakob, could you please help us out with identifying the right behavior?

We have a combination of a pinned local, a tail. instruction, and then a call to a method where the pinned local is passed. See the link below. With the tail., the pinning does not apply and address changes in this small repro. Without it, pinning works fine.

Question is:

  • Is this invalid IL, and .tail must not be used?
  • Is this part of the IL spec to describe how .tail and pinned should work together, if at all?
  • Can this be a runtime bug?

https://sharplab.io/#v2:DYLgZgzgNALiBOBXAdgH2QQwLYFMIAcMBjHAAgEF98BYAKDoHt8dlSBlATwhhy0edYAxNgAsM8fADoAchhgBLAG44Aksh7wmdOlgYATRMDIAFTQHN42UgF46pe6SMxSWRDAwAjI6QNYsHUhBSeXUAbQBdG1JkQ2A7BycXN09vPTkMQOCwyOto2O1aB0ccZzAGBlIACgw9PXg8CCDMBWUQmABKG3ii+19/UgAeAFo84DjCnvsAcQBhSRmGMZwiGErOgHp10gAVEXkIaIZnDGAAdwwuHwY8UhgRMhh4eSIAayhSDzdb+A4Qs0d5C8yABmW7yXAHMwlA7yZwwCpESwQEQFSb2eRgUg1OoNQYAPlIshaOGMj0k8KJSlU6lIADJaWl3JJQgAGSJ3FjdNH2MAYeRnWEiTEAInItXqEAORDEyChemFqISJQ+4jWXQmk0QEDI+EeUTA8gAHjg9D50lyimUKpVKcpSfByQxbdTnLr4O0LQ41p77JtSABVZBEBh+Fhw+5YjwMZQ8hjwLEuHD1UgAAzWKYcIzuclI5wC5L5wGCB2QR1IvFhJsV9lCAwAouofsYGG08eEuYksHzWAB9cRmRTq7l9ALDCjwSwcSQALyTDBm9TkOAGrdIAEYWSyfWb3IMRuQJxcZ3OFzglyv1ASN9vtx5VR6NT0WUA

C# cannot express the .tail emission, the relevant block of the usage would be roughly this:

                    fixed (int* ptr = &array[0])
                    {
                        return foo((nint)ptr); //.tail call
                    }
                    

T-Gro avatar Jun 23 '25 10:06 T-Gro

  • Is this invalid IL, and .tail must not be used?

It is not invalid IL, but locals will not be pinned in the callee if it is tail. prefixed. It is similar to how it is up to the IL generator to ensure no pointer into the local stack frame remains if it emits tail. The IL generator must also ensure that pinning is not necessary inside callees (concretely for F# that would probably mean that the fixed was in some different scope).

  • Is this part of the IL spec to describe how .tail and pinned should work together, if at all?

It doesn't seem to be very clearly spelled out, but II.7.1.2 says

While a method with a pinned local variable is executing, the VES shall not relocate the object to which the local refers.

while tail. is described as

tail. (prefix) – call terminates current method

which I think covers what happens here.

  • Can this be a runtime bug?

I think the behavior is expected. The runtime is required to honor tail. when possible and it does so here. It is up to the IL generator to only emit tail. when removing the current stack frame is ok (i.e. it should be ok to unpin and there shouldn't be addresses to current locals).

jakobbotsch avatar Jun 23 '25 10:06 jakobbotsch

So for situations like the one here, it would be up to the IL Generator (F#) to not emit .tail when the last call happens to make use of a pinned local.

Which in theory could come with performance implications for existing code (or F# compiler would need to do a tail call with a rewrite on the frontend side ?), especially if it would be a deeply recursive one (not the one in this issue)

T-Gro avatar Jun 23 '25 10:06 T-Gro

I've run into what seems to be another instance of this bug, this time without using fixed:

open Microsoft.FSharp.NativeInterop
open System.Runtime.CompilerServices

module Program =

    [<MethodImpl(MethodImplOptions.NoInlining)>]
    let bar (pValue: nativeptr<int>) : unit =
        let value = NativePtr.read pValue
        printfn "value = %A" value

    [<MethodImpl(MethodImplOptions.NoInlining)>]
    let foo() =
        let mutable value = 42
        bar &&value

    [<EntryPoint>]
    let main argv =
        foo()
        0

SharpLab

This will not print 42.

Any progress on this? This is pretty nasty for us, since we use these kind of constructs quite often.

hyazinthh avatar Aug 01 '25 10:08 hyazinthh

Hey, I ran into the same issue with fixed. This is quite severe since i use it for pinning byrefs a lot and the workaround is very tedious and inefficient for byrefs since i need to alloc an array, pin it using GCHandle, unpin it in a finally block and write the result back to the byref.

Any chance that this can be avoided by adding an empty finally block?

krauthaufen avatar Sep 05 '25 06:09 krauthaufen

I think the best fix in this case would be to eliminate .tail alltogether in this scenario.

T-Gro avatar Sep 05 '25 08:09 T-Gro

Btw @hyazinthh @luithefirst the workaround like:

use ptr = fixed arr
try doStuff ptr
finally ()

seems to properly prevent tail-calls also in release builds. I'm not quite sure if the pinning should be inside the try or before but it seems to fix my problem this way at least.

So the finally prevents the call from becoming tail and obviously the compiler optimization doesn't remove the block in my scenarios

krauthaufen avatar Sep 05 '25 09:09 krauthaufen

Fixed at https://github.com/dotnet/fsharp/pull/18893

T-Gro avatar Sep 22 '25 09:09 T-Gro

Fixed at #18893

Does this also fix the case when taking the address of a mutable variable without using fixed? The unit test does not seem to cover this at least.

hyazinthh avatar Sep 22 '25 16:09 hyazinthh

@hyazinthh since you can now use fixed with byrefs it would probably be better to rewrite the code accordingly? It should behave pretty much identical?

The && is unsafe and the compiler warns about it.

krauthaufen avatar Sep 23 '25 05:09 krauthaufen