protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Span<T> in C# on RepeatedField<T>

Open PascalSenn opened this issue 1 year ago • 4 comments

What language does this apply to? c# protobuf 3

Describe the problem you are trying to solve.

The issue #6217 was recently marked as stale, but the problem it addresses is still relevant. The goal is to work with a Span<T> of a RepeatedField<T> in C# Protobuf, which would allow for working with apis like vectors and reuse of already allocated memory. Accessing the internal array of a RepeatedField<T> would make this possible.

I have read through the discussions on the issue and the PR https://github.com/protocolbuffers/protobuf/pull/6538.

Describe the solution you'd like Ideally, provide access to a ReadOnlySpan<T> of the internal array of a RepeatedField<T>. I understand that modifying the array after creating a span from it could lead to unexpected behavior, but in most cases, incoming messages are not mutated.

Describe alternatives you've considered If providing access to the internal data structures through ReadOnlySpan<T> is not feasible, an alternative solution would be to introduce CopyTo methods that accept a Span<T> as the destination. The proposed method signatures are:

    public void CopyTo(Span<T> destination) {..}
    public void CopyTo(Span<T> destination, int length) {..}
    public void CopyTo(Span<T> destination, int sourceIndex, int destinationIndex, int length) {..}

Additional context Add any other context or screenshots about the feature request here.

PascalSenn avatar May 05 '24 16:05 PascalSenn

I'd want the input of @jamesnk on this (for consistency with other .NET APIs), but I don't think I object to exposing a ReadOnlySpan<T>. Can you create a PR for us to review?

jskeet avatar May 07 '24 15:05 jskeet

@jskeet The objection on ReadOnlySpan<T> in the initial proposal was about the possibility of resizing the underlying array.

https://github.com/protocolbuffers/protobuf/pull/6538#issuecomment-630114989

Also note that exposing the data as read-only (ReadOnlySpan<T>) addresses some concerns over the safety, but there are still some potential issues - e.g. the returned span is bound to the current array, but if a repeated field gets resized, a new internal array is allocated, so the span will point to a dead array and all the changes will be lost.

If exposing ReadOnlySpan<T> is an option. Something simple as this could work: https://github.com/protocolbuffers/protobuf/pull/16772

If you prefer a CopyTo approach, let me know.

PascalSenn avatar May 07 '24 17:05 PascalSenn

@PascalSenn: Yes, basically this ends up being problematic any time things are changed. The use of a span makes it less likely that that will cause problems, as it's harder to keep one around long term. (The code using the span would need to call the code that modifies the repeated field.) That's not true with ReadOnlyMemory<T> though... for the use cases you're thinking of, would just having ReadOnlySpan and not ReadOnlyMemory be sufficient? (We could have a CopyTo method accepting a Memory<T> if that would be useful.)

Personally I'm okay with documentation on the span property that says "if the repeated field is modified, you're in undefined territory in terms of the span".

jskeet avatar May 08 '24 07:05 jskeet

@jskeet I removed the exposed ReadOnlyMemory<T> in the PR and added the note into the XML Doc string of ReadOnlySpan<T>.

I do not require ReadOnlyMemory<T> for my use case as I can also just pass the RepeatedField<T> around and use .Span on it for direct access. (also .Span.CopyTo if needed)

PascalSenn avatar May 08 '24 11:05 PascalSenn

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Aug 07 '24 10:08 github-actions[bot]

@protobuf-github-bot This PR is waiting for a review

PascalSenn avatar Aug 07 '24 17:08 PascalSenn

I'll see if I can find time to have another look at this.

jskeet avatar Aug 07 '24 17:08 jskeet

Although this is an issue, rather than a PR. Is #16772 the right PR to look at?

jskeet avatar Aug 07 '24 17:08 jskeet

Ah - I've already done so and approved. We need someone from the protobuf team to look.

jskeet avatar Aug 07 '24 17:08 jskeet

Just wanted to chime to say thank you and that this is a welcome change.

Unfortunately, it may also be insufficient in some scenarios. RepeatedField<T> can be used for vector embeddings which are float arrays of length up to 3072 or maybe more in the future. Having to treat this type as good ol' List<T> which has to be constructed by allocating a new array instance, and then throwing it away, is not an acceptable tradeoff. There are certain restrictions around UnsafeAccessor for now and, well, it's an UnsafeAccessor, so not having the API to """unsafely""" set array and count directly without risking a breaking change, or tell the generator to pick a better container, results in choosing between significant allocations or more brittle code.

neon-sunset avatar Sep 20 '24 00:09 neon-sunset