csharplang allow references to structs to be stored in fields of ref structs

allow references to structs to be stored in fields of ref structs

Open lucasmeijer opened this issue 6 years ago • 69 comments

Unity is a game engine that uses C# extensively.

We're in need of a way to have a struct that contains one or more pointers to other structs that we want to be able to read/write from/to. We do this with unsafe code today:

    struct Color { float r,g,b; }
    
    unsafe struct Group
    {
      Color* _color;
      ref Color color => ref *_color;
    }

But we would love to do this without unsafe code, and would like to suggest a feature where it's allowed to store references to structs in fields of ref structs, that can guarantee that these pointers don't escape onto the heap:

    struct Color { float r,g,b; }

    ref struct Group
    {
        ref Color color;
    }

EDIT link to spec for this feature https://github.com/dotnet/csharplang/blob/main/proposals/low-level-struct-improvements.md

Nov 25 '17 23:11 lucasmeijer

I feel like you're trying too hard to use ref struct everywhere. Always keep in mind:

Nov 26 '17 11:11 Joe4evr

@Joe4evr I don't think that comment is necessarily apropos.

Nov 26 '17 15:11 jnm2

ref fields are not directly expressible in IL.

You can very close to the desired behavior if there is a public type like: https://github.com/dotnet/corert/blob/796aeaa64ec09da3e05683111c864b529bcc17e8/src/System.Private.CoreLib/src/System/ByReference.cs

The bigger problem (and one of the reasons the type is not public) is tracking of life times. It is assumed right now that that methods that take a ref and return a ref-like cannot return that same ref wrapped in a ref-like. Allowing/assuming such wrapping would make use of ref and ref-likes very difficult.

Such wrapping could be ok though, if wrapped ref is guaranteed to refer to a heap location. There are ideas how this could be allowed via some specialized syntax like:

ByReference<int> r = ref inArray[0];  // ok since array element is definitely on the heap

Just wondering, if the feature comes with "only on the heap" restriction, would it be acceptable limitation?

Nov 27 '17 04:11 VSadov

You can very close to the desired behavior if there is a public type like:

That is possible but we intend to use this for simplifying our API's significantly for our upcoming new component system. We want the syntax to be straightforward, and typing .Value is quite annoying for what is a very common pattern in that Component system.

For our use case:

The referenced data is allocated from C++ via malloc
Group in the above example is a ref struct, thus can't be written to any classes or other persistent data

For our particular use case, we can live with any restriction on what you can assign to it. Internally we are patching a pointer to the data in by writing to memory at an offset relative to the adress of the Group struct.

But certainly for the feature to be useful in general beyond our use case it seems useful to be able to assign to the ref fields from the constructor

Nov 27 '17 15:11 joeante

Stack-only structs were introduced in C# 7.2. See https://github.com/dotnet/csharplang/issues/666

Nov 27 '17 22:11 gafter

This (a ref stored in a struct) would need to be supported by the CLR before/when it could be supported by C#.

Nov 27 '17 22:11 gafter

Can you expand the sample to include a couple of other use cases? In particular:

How do you imagine such a struct being initialized? Sample code would be helpful here.
Do you need this feature for only unmanaged types (transitively contains no class or interface fields)?

Dec 01 '17 00:12 jaredpar

Our full example usage code looks like this:


struct BoidData : IComponentData
{
    public float3 Position;
}

struct MyTransform : IComponentData
{
    public float3 Position;
}

class MyComponentSystem : ComponentSystem
{
    ref struct MyGroup
    {
        public ref BoidData       Boid;
        public ref MyTransform    Transform;
    }

    ComponentGroup<MyGroup> m_Group = new ComponentGroup<MyGroup>();

    void OnUpdate()
    {
        foreach(var entity in m_Group)
        {
            entity.Transform.Position = entity.Boid.Position;
        }
    }
}

The expected behaviour would be to transform this:

ref struct MyGroup
{
        public ref BoidData       Boid;
        public ref MyTransform    Transform;
}

to this:

ref struct MyGroup
{
    	private Boid* 			        _boid;
    	public ref Boid                 Boid { { get ref *_boid;  }

    	private MyTransform* 	               _transform;
    	public ref MyTransform  Transform { { get ref *_transform;  }
}

How do you imagine such a struct being initialized? Sample code would be helpful here.

For our specific use case we do not require initializing ref values on the struct. We are writing to the pointers by offset using unsafe code inside the enumerator that GetEntities returns.

That is our use case. I can however imagine that being able to assign the ref in the constructor would cover other use cases for this.

Do you need this feature for only unmanaged types (transitively contains no class or interface >fields)?

Our use case is where the ref fields to Boid and MyTransform above are blittable. Thats our current requirements. We probably don't want to support interfaces / class types inside of Boid and MyTransform in the example above for other reasons. Not 100% on this, but if thats a hard requirement for some reason, we will find a way to live with it.

Dec 01 '17 01:12 joeante

I would imagine that for other use cases allowing assignment to ref fields from the constructor would be the sensible thing to do. This matches how it works in C++.

ref struct MyGroup
{
        MyGroup(ref MyTransform t) { Transform = t; }
    	private ref MyTransform 	     Transform;
}

@benaadams since you emoticon loved it, I imagine you have some use cases for this as well?

Dec 01 '17 01:12 joeante

Just wondering, if the feature comes with "only on the heap" restriction, would it be acceptable limitation?

Yes

I imagine you have some use cases for this as well?

Gather ops (returning multiple refs); side car data (ref + info fields)

Dec 01 '17 13:12 benaadams

For our specific use case we do not require initializing ref values on the struct. We are writing to the pointers by offset using unsafe code inside the enumerator that GetEntities returns.

This feature request started with an ask that the feature be usable in safe code. This design seems to require there is an unsafe creator behind the scenes newing up these values.

Dec 05 '17 05:12 jaredpar

This was mainly what I was hoping for when I heard about 7.2 ref struct. I thought ref fields was the main reason to introduce ref structs in C# 7.2.

Dec 09 '17 11:12 AJosephsen

This was mainly what I was hoping for when I heard about 7.2 ref struct. I thought ref fields was the main reason to introduce ref structs in C# 7.2.

The same for me. I planned to use them as wrappers for Spans.

Dec 09 '17 14:12 omariom

The same for me. I planned to use them as wrappers for Spans.

You can do that as the ref field is hidden in the Span e.g.

Invalid

struct Thing
{
    Span<byte> span;
}

Valid

ref struct Thing
{
    Span<byte> span;
}

Is bit wasteful for a Span of count 1; also introduces variability to the field (as could now be count > 1)

Dec 09 '17 14:12 benaadams

@benaadams

Valid

Great! 👍

Dec 09 '17 14:12 omariom

@jaredpar

The requirement to initialize such fields could make it safe: Foe example,

ref struct MessageWrapper
{
    private ref Header header;
    private Span<byte> body; a

    public MessageWrapper(ref Header header, Span<byte> body)
    {
        this.header = ref header;
        this.body = body;
    }
}

So MessageWrappers must be initialized either via excplicit ctor call or if the ref fields are public via initializer. Parameterless ctors are disallowed.

ref var wrapper = new MessageWrapper { Header = ref header, Body = body };

Dec 09 '17 14:12 omariom

The requirement to initialize such fields could make it safe:

Not really:

MessageWrapper Create()
{
    var header = CreateHeader();
    return new MessageWrapper(ref header, CreateBody());
}

// call this method and watch bad things happen
void Crash() {
    var w = Create();
    // do something with w.header here
}

To make that safe the language/runtime would need to recognize that header needs to be in a pinned location with a lifetime at least as long as the MessageWrapper object. You can play with this sort of thing today via a Span<T> pointer trick:

ref struct MessageWrapper
{
    private Span<Header> header;
    private Span<byte> body;

    public MessageWrapper(ref Header header, Span<byte> body)
    {
        this.header = SpanEx.DangerousCreate(ref header);
        this.body = body;
    }
}

public static class SpanEx
{
    /// <summary>
    /// Creates a new span over the target. It is up to the caller to ensure
    /// the target is pinned and that the lifetime of the span does not escape
    /// the lifetime of the target.
    /// </summary>
    /// <param name="target">A reference to a blittable value type.</param>
    public static unsafe Span<T> DangerousCreate<T>(ref T target) where T : struct
    {
        return new Span<T>(Unsafe.AsPointer(ref target), 1);
    }
}

You can move the unsafe code to a second assembly and mark the one containing MessageWrapper safe, as long as Header is not itself a ref struct (otherwise you cannot use it as a generic parameter and this safety illusion breaks down). And then the lifetime of MessageWrapper must be carefully managed (it cannot for example be returned if Header is a local). Most of the relevant restrictions are enforced by the compiler by virtue of it being a ref struct.

Wrapping the "ref field" in a Span<T> is a tiny bit goofy (a ByReference type or actual support would be more natural and efficient), but there are some neat tricks you can do once you have a span such as reinterpret internal structure as a Vector<T>.

I entered https://github.com/dotnet/corefx/issues/26228 to make this a little nicer (don't expect much though, this sort of thing plays very loose with the safety boundaries of managed code). I sort of think requiring the unsafe flag might be reasonable for playing with Span<T> objects of blittable types.

Jan 09 '18 16:01 bbarry

To make that safe the language/runtime would need to recognize that header needs to be in a pinned location with a lifetime at least as long as the MessageWrapper object.

You mean like this: https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md ?

Jan 09 '18 17:01 gafter

I am not sure if that is enough because Header here is not a ref struct (and ref struct types cannot be held inside a Span<T> as the generic argument, you have to trick it to see the underlying bytes and reinterpret as a not-ref struct). This program fails (every time for me with the above ) with the exception "Header not valid here":

public class Program {
    static MessageWrapper Create()
    {
        var header = CreateHeader();
        header.one = 1;
        var w = new MessageWrapper(ref header, CreateBody());
        if (w.header[0].one != 1) throw new InvalidOperationException("Header didn't make it");
        return w;
    }

    static Span<byte> CreateBody() => default;
    static Header CreateHeader() => default;

    public static void Main()
    {
        var w = Create();
        // anything that causes a GC event or sufficiently 
        // modifies the stack should cause breakage here?
        Thread.Sleep(1); 
        // do something with w.header here
        if (w.header[0].one != 1) throw new InvalidOperationException("Header not valid here");
    }
}

public ref struct MessageWrapper
{
    public Span<Header> header;
    public Span<byte> body;

    public MessageWrapper(ref Header header, Span<byte> body)
    {
        this.header = SpanEx.DangerousCreate(ref header);
        this.body = body;
    }
}

public static class SpanEx
{
    /// <summary>
    /// Creates a new span over the target. It is up to the caller to ensure
    /// the target is pinned and that the lifetime of the span does not escape
    /// the lifetime of the target.
    /// </summary>
    /// <param name="target">A reference to a blittable value type.</param>
    public static unsafe Span<T> DangerousCreate<T>(ref T target) where T : struct
    {
        return new Span<T>(Unsafe.AsPointer(ref target), 1);
    }
}

public struct Header {
    public byte one;
}

Jan 09 '18 17:01 bbarry

How about creating a "RefWrapper" or a "SingleSpan" ref struct which is almost same as Span but doesn't have a length field?

Jan 10 '18 15:01 aobatact

The reason the Span<T> type is useful is because it is public and already has the associated apis for Span<T>. Any other name might as well be the ByReference type. This type already exists and represents the functionality desired in this issue, but is internal because it is very easy to use it to do things that break assumptions we depend on as well as a question of how much support it should get I assume.

I think most of us looking for ref fields would be satisfied if that type was made public and effectively ignored much like how TypedReference and related features are treated. That would still likely mean significant additional work for future changes to the runtime.

Jan 10 '18 15:01 bbarry

@VSadov, what do you say on this? There is no way to add ref-fields directly to ref struct, as the problem of the default instances arises: the default instances of such structs would contain refs to nowhere. Thus not so much ref-fields itself need to be introduced in IL as much some sort of the monad Maybe for refs has to be implemented. if ref-fields are introduced in IL:

public readonly ref struct NullableRef<T>
{
	public NullableRef(ref T r) => _ref = r;
	
	private readonly ref T _ref;
	
	public bool IsNull => Unsafe.AsPointer(ref _ref) == null;
	public ref T Ref => IsNull ? throw new NullReferenceException() : _ref;
}

otherwise:

public readonly ref struct NullableRef<T>
{
	public NullableRef(ref T r) => _ref = new ByReference<T>(ref r);
	
	private readonly ByReference<T> _ref;
	
	public bool IsNull => Unsafe.AsPointer(ref _ref.Value) == null;
	public ref T Ref => IsNull ? throw new NullReferenceException() : _ref.Value;
}

Jun 25 '18 13:06 AleckAgakhan

@joeante wrote:

The referenced data is allocated from C++ via malloc.

If you're willing to switch over from malloc to allocating your structs in large C# arrays, then you could use the solution that I describe in #2417 in order to eliminate frame rate stalls/stutters caused by GC, without resorting to unsafe code.

An additional advantage: When you use malloc, then obviously you must later free the memory manually, but when you allocate your structs in C# arrays (call them "pages"), then each array ("page") is automatically garbage-collected when no more references to that "page" exist, as you know. Thus you can easily manage and balance the costs by merely changing the length of the arrays/pages that you create.

To run GC less often, create longer arrays/pages.
To save memory at the expense of running GC a little more often, create shorter arrays/pages.

In the case of a game where frame rates are important, you'd create big arrays/pages to keep GC to the minimum, or potentially never release the arrays/pages while the user is still playing the game and not between levels etc. You would have the control AND the necessary performance, without resorting to unsafe code.

When allocating a struct, should you search for a free slot in an array/page? If you want to, you can, but note that a simpler solution is satisfactory in many apps: Don't reuse any slots. When the page is full, simply allocate a new page, and the previous page will be automatically garbage-collected when all of its slots become unused. This GC cost is low and completely satisfactory for most apps, but if your game is very sensitive to frame rate stalls, then you could simply increase the page size (array length), and if this GC cost is still not low enough, then you could keep track of which slots are free and reuse slots.

You could start using the solution already today with the current version of C#, but in order to make it a comfortable and convenient solution, C# needs the syntactic sugar that I describe in #2417, which is mostly the same syntax as @lucasmeijer described ("ref Color color;") except designed for pointing to elements of C# arrays instead of unsafe C++ malloc. Also note that lucasmeijer wrote "ref struct Group {...}" but in my proposal, both "Group" and "Color" would be normal structs, whereas the field "ref Color color;" is special.

Aug 08 '18 13:08 verelpode

As I understand it, you would be required to supply the ref from elsewhere in order to initialize a ref field, correct?

Would it be possible to have a struct containing fields itself that are used by ref? For example, something like this, but expressible safely:

unsafe struct Example
{
    Vector3 position;

    Quaternion rotation;

    public ref Vector3 Position => ref *(Vector3*)Unsafe.AsPointer(ref position);

    public ref Quaternion Rotation => ref *(Quaternion*)Unsafe.AsPointer(ref rotation);
}

Which would allow for more convenient syntax, such as:

instance.Position.X += 1;

When using properties.

Dec 05 '18 10:12 DaZombieKiller

@DaZombieKiller Why go through those hoops when you can just do public ref Vector3 Position => ref position;?

Dec 05 '18 13:12 Joe4evr

@Joe4evr A struct member cannot return a reference to this or any of its members. Hence the need for pointers and unsafe code there. Your example would ultimately be the preferred syntax if it were valid though, yes.

Dec 06 '18 09:12 DaZombieKiller

@DaZombieKiller: public ref Vector3 Position => ref (Vector3)Unsafe.AsPointer(ref position);

@Joe4evr: Why [can't you] just do public ref Vector3 Position => ref position;

@DaZombieKiller: A struct member cannot return a reference to this or any of its members.

I did my best to briefly write-up some of these issues in a StackOverflow answer. The relevant point for this exchange is that you can use the System.Runtime.CompilerServices.Unsafe package to round-trip a struct's this pointer through IntPtr in order to work around the (overzealous) fatal compiler errors.

"Because ref struct code that should properly be allowed to return its own this is still prevented by the compiler from doing so, you have to resort to some rather brutish techniques to get around the fatal error(s) when coding within the supposedly liberated ref struct instance methods. To wit, value-type instance methods written in C# that legitimately need to override fatal errors CS8170/CS8157 can opaque the 'this' pointer by round-tripping it through an IntPtr."

Using the As methods--as opposed to IntPtr.ToPointer()--from the aforementioned library should allow you to entirely avoid declaring the unsafe keyword in C# for the Example struct shown above.

Feb 08 '19 00:02 glenn-slayden

The scenarios outlined by @lucasmeijer and @joeante - are in effect a different way of doing interop. This technique (which I also use) makes it easier (by which I mean more intuitive and simpler code) for managed code to access data in unmanaged memory.

Managed and unmanaged code/threads could access the same memory region in a very natural way by leveraging ref fields.

Recent enhancements to ref have had a big effect on simplifying (and thus reducing scope for error) some of my own low level code, so much so that what used to be done in C is now being moved straight to C#.

I should stress that this is only sought for unmanaged structs in my case.

Apr 14 '19 17:04 Korporal

As someone suggested, having ref field is very similar to having Span of fixed size one. It seems to me that main problem is the safe initialization of Span with reference to a field, correct?

So far Span can be initialized only with:

interior pointer of managed array (+length)
stackalloc (implicit length)
unsafe pointer (+length)

Initialization of Span with ref field could be done like this:

when field is on heap - use interior pointer of managed object (same as managed array init)
when field is on stack - use same initialization as stackalloc I think same rules applies as for stackalloc. Since Span is on stack, we can guarantee Span gets "released" first before the data it references are "released".

Now, is it possible to determine whether ref field points to stack or heap?

Sep 24 '19 14:09 OndrejPetrzilka

As someone suggested, having ref field is very similar to having Span of fixed size one.

It's similar enough that conceptually you can think of them as equivalent. From a language perspective though there are some subtle differences that need to be dug into. Mostly around the "safe to escape" rules. Those currently treat ref struct and ref differently hence need to stare at those for a few hours to see if they're truly the same here.

Initialization of Span with ref field could be done like this

I think you're approach the problem from the wrong angle here. The key is not verifying the life time of the value the ref field points to but instead ensuring that the ref field is always assigned. Once we know that it's always assigned, given a set of input expressions, then we can calculate the lifetime of the field and it's container using the existing rules.

For the most part I think this is pretty straight forward. A struct with a ref filed has virtually all the same limitations as a ref struct: can't be used as generic argument, can't be the element type of an array, etc ... Hence you're mostly left with the following cases:

Instantiating a struct with a default ctor and ensuring all it's fields are assigned via object initializer
Custom ctor needs to assign ref fields with a value that is safe to escape from the constructor. This isn't a new rule but just application of existing rules to the new construct.

Sep 24 '19 17:09 jaredpar

csharplang csharplang copied to clipboard

allow references to structs to be stored in fields of ref structs

csharplang
csharplang copied to clipboard