fixed/pinned array escapes due to tail call
We have identified a scenario where an assumed pinned array is moved before being copied, resulting in an AccessVioloationException. It can be reproduced with net48, net8.0, and netstandard 2.0 when compiled with Tailcalls true (only tested on Windows), SDK version 9.0.203.
In particular, it escapes from an extension method like this:
static member Set<'T when 'T : unmanaged>(this : IBuffer, src : 'T[]) =
use ptr = fixed src
this.Set(ptr |> NativePtr.toNativeInt, nativeint src.Length * nativeint sizeof<'T>)
You can find a repository with a minimal example here: https://github.com/luithefirst/TailCallEscape
It will crash in about 1 min. Originally, I could also see a TailCallHelperStub stack frame in WinDbg, but it is gone now in my minimal example. Still, if Tailcalls is set to false in the fsproj file, the issue cannot be reproduced.
A colleague also recalled a possible similar issue back in net 5, but I could not find any details on it.
This is where the issue actually occurs in our framework: https://github.com/aardvark-platform/aardvark.rendering/blob/646d165829283a3a0ba03cb4badc465245fbc670/src/Aardvark.Rendering/Resources/Adaptive/ManagedBuffer.fs#L105 https://github.com/aardvark-platform/aardvark.base/blob/5b0b5c0507ca20238899260d2a08d252e6a4f1e8/src/Aardvark.Base.FSharp/Utilities/Interop/FSLibExtensions.fs#L529
There, it is implemented in a netstandard 2.0 library, and when consumed by net48 the issue occurs (but for some reason not in net8.0). Also, when disabling Tailcalls in the library, it can be avoided.
Expected behaviour: The compiler should be aware that a stackframe is pinning an object and therefore should not generate a tailcall.
Workaround: Use GCHandle + AddrOfPinnedObject
That should be relatively easy to fix, we have the "can tail" check in very few places. It should take the pinned info into account
Thanks for your reply, nice to hear!
For clarity, I've also outlined the original scenario where we originally discovered the issue: library with netstandard 2.0 (tailcalls=true) used in net48.
Edit: It also seems to occur in net8.0. Here are the different call stacks:
net48 + tailcall true:
TailCallEscape.Application+Buffer`1[[System.Int32, mscorlib]].Set(IntPtr, IntPtr)
TailCallEscape.Application.run2@62(System.Random, System.Collections.Generic.List`1<Int32[]>, Buffer`1, Microsoft.FSharp.Core.Unit)
net48 + tailcall false:
TailCallEscape.Application+Buffer`1[[System.Int32, mscorlib]].Set(IntPtr, IntPtr)
TailCallEscape.Application+BufferExtensions.Set[[System.Int32, mscorlib]](IBuffer, Int32[])
TailCallEscape.Application.run2@62(System.Random, System.Collections.Generic.List`1<Int32[]>, Buffer`1, Microsoft.FSharp.Core.Unit)
net8.0 + tailcall true:
TailCallEscape.Application+Buffer`1[[System.Int32, System.Private.CoreLib]].Set(IntPtr, IntPtr)
TailCallEscape.Application+Buffer`1[[System.Int32, System.Private.CoreLib]].TailCallEscape.Application.IBuffer.Set(IntPtr, IntPtr)
TailCallEscape.Application.run2@62(System.Random, System.Collections.Generic.List`1<Int32[]>, Buffer`1, Microsoft.FSharp.Core.Unit)
Hello,
I've found a simpler repro for this issue. It seems like the fixed statement is broken in general with tail calls:
namespace App
open System
open FSharp.NativeInterop
module Program =
let mutable dummy : int[] = null
let mutable data : int[] = null
let foo (address: nativeint) =
dummy <- null
GC.Collect()
if address <> NativePtr.toNativeInt &&data.[0] then
failwithf "Address changed"
let bar() =
use ptr = fixed data
foo (NativePtr.toNativeInt ptr)
[<EntryPoint>]
let main _argv =
dummy <- Array.zeroCreate<int> 100
data <- Array.zeroCreate<int> 10
bar()
0
An easy workaround is to wrap the last expression to avoid the tail call:
match (foo (NativePtr.toNativeInt ptr)) with
| () -> ()
Tested with .NET SDK 8.0.411 and 9.0.300.
I wonder how the IL spec looks at it - but we are likely the only compiler running into that I assume.
i.e. would code that uses .tail before working with a pinned variable be treated as invalid IL ?
(and compiler should not generate that .tail here at all)
Oh, it's tail prefixes and in both runtimes. Must've been there forever. I wonder what does the ECMA spec say about it (if anything at all). But shouldn't change the fact that it should be easy to not add prefix if there are any pinned locals. It might, however, theoretically, result in SO for some code which was working before.
@jakobbotsch:
Hi Jakob, could you please help us out with identifying the right behavior?
We have a combination of a pinned local, a tail. instruction, and then a call to a method where the pinned local is passed. See the link below. With the tail., the pinning does not apply and address changes in this small repro. Without it, pinning works fine.
Question is:
- Is this invalid IL, and
.tailmust not be used? - Is this part of the IL spec to describe how
.tailandpinnedshould work together, if at all? - Can this be a runtime bug?
https://sharplab.io/#v2:DYLgZgzgNALiBOBXAdgH2QQwLYFMIAcMBjHAAgEF98BYAKDoHt8dlSBlATwhhy0edYAxNgAsM8fADoAchhgBLAG44Aksh7wmdOlgYATRMDIAFTQHN42UgF46pe6SMxSWRDAwAjI6QNYsHUhBSeXUAbQBdG1JkQ2A7BycXN09vPTkMQOCwyOto2O1aB0ccZzAGBlIACgw9PXg8CCDMBWUQmABKG3ii+19/UgAeAFo84DjCnvsAcQBhSRmGMZwiGErOgHp10gAVEXkIaIZnDGAAdwwuHwY8UhgRMhh4eSIAayhSDzdb+A4Qs0d5C8yABmW7yXAHMwlA7yZwwCpESwQEQFSb2eRgUg1OoNQYAPlIshaOGMj0k8KJSlU6lIADJaWl3JJQgAGSJ3FjdNH2MAYeRnWEiTEAInItXqEAORDEyChemFqISJQ+4jWXQmk0QEDI+EeUTA8gAHjg9D50lyimUKpVKcpSfByQxbdTnLr4O0LQ41p77JtSABVZBEBh+Fhw+5YjwMZQ8hjwLEuHD1UgAAzWKYcIzuclI5wC5L5wGCB2QR1IvFhJsV9lCAwAouofsYGG08eEuYksHzWAB9cRmRTq7l9ALDCjwSwcSQALyTDBm9TkOAGrdIAEYWSyfWb3IMRuQJxcZ3OFzglyv1ASN9vtx5VR6NT0WUA
C# cannot express the .tail emission, the relevant block of the usage would be roughly this:
fixed (int* ptr = &array[0])
{
return foo((nint)ptr); //.tail call
}
- Is this invalid IL, and
.tailmust not be used?
It is not invalid IL, but locals will not be pinned in the callee if it is tail. prefixed. It is similar to how it is up to the IL generator to ensure no pointer into the local stack frame remains if it emits tail. The IL generator must also ensure that pinning is not necessary inside callees (concretely for F# that would probably mean that the fixed was in some different scope).
- Is this part of the IL spec to describe how
.tailandpinnedshould work together, if at all?
It doesn't seem to be very clearly spelled out, but II.7.1.2 says
While a method with a pinned local variable is executing, the VES shall not relocate the object to which the local refers.
while tail. is described as
tail. (prefix) – call terminates current method
which I think covers what happens here.
- Can this be a runtime bug?
I think the behavior is expected. The runtime is required to honor tail. when possible and it does so here. It is up to the IL generator to only emit tail. when removing the current stack frame is ok (i.e. it should be ok to unpin and there shouldn't be addresses to current locals).
So for situations like the one here, it would be up to the IL Generator (F#) to not emit .tail when the last call happens to make use of a pinned local.
Which in theory could come with performance implications for existing code (or F# compiler would need to do a tail call with a rewrite on the frontend side ?), especially if it would be a deeply recursive one (not the one in this issue)
I've run into what seems to be another instance of this bug, this time without using fixed:
open Microsoft.FSharp.NativeInterop
open System.Runtime.CompilerServices
module Program =
[<MethodImpl(MethodImplOptions.NoInlining)>]
let bar (pValue: nativeptr<int>) : unit =
let value = NativePtr.read pValue
printfn "value = %A" value
[<MethodImpl(MethodImplOptions.NoInlining)>]
let foo() =
let mutable value = 42
bar &&value
[<EntryPoint>]
let main argv =
foo()
0
This will not print 42.
Any progress on this? This is pretty nasty for us, since we use these kind of constructs quite often.
Hey, I ran into the same issue with fixed. This is quite severe since i use it for pinning byrefs a lot and the workaround is very tedious and inefficient for byrefs since i need to alloc an array, pin it using GCHandle, unpin it in a finally block and write the result back to the byref.
Any chance that this can be avoided by adding an empty finally block?
I think the best fix in this case would be to eliminate .tail alltogether in this scenario.
Btw @hyazinthh @luithefirst the workaround like:
use ptr = fixed arr
try doStuff ptr
finally ()
seems to properly prevent tail-calls also in release builds. I'm not quite sure if the pinning should be inside the try or before but it seems to fix my problem this way at least.
So the finally prevents the call from becoming tail and obviously the compiler optimization doesn't remove the block in my scenarios
Fixed at https://github.com/dotnet/fsharp/pull/18893
Fixed at #18893
Does this also fix the case when taking the address of a mutable variable without using fixed? The unit test does not seem to cover this at least.
@hyazinthh since you can now use fixed with byrefs it would probably be better to rewrite the code accordingly? It should behave pretty much identical?
The && is unsafe and the compiler warns about it.