protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Adds UnsafeCollectionOperations for unsafe access to RepeatedField<T>

Open PascalSenn opened this issue 1 year ago • 26 comments

This is a proposal to add UnsafeCollectionOperations for fast access on RepeatedField<T> https://github.com/protocolbuffers/protobuf/issues/16745

PascalSenn avatar May 07 '24 17:05 PascalSenn

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar May 07 '24 17:05 google-cla[bot]

@JamesNK We're aware that if the repeated field is modified while the span is "active", it may or may not reflect the changes (e.g. add/remove operations will not be seen in terms of the length, and if the array is reallocated any further updates won't be visible at all) - but do you think that's reasonable behavior, so long as we document that appropriately? (I'd just say that the state of the span is undefined after any modifications.) Any other thoughts about this PR?

jskeet avatar May 10 '24 15:05 jskeet

Modification is the problem. Sometimes the span reflects changes, sometimes it doesn't.

I like the idea, but I don't think it should be a property on RepeatedField<T>. It makes the dangerous behavior too easy to access. I suggest doing what the BCL does with List<T> + span and make it a static method on an "advanced" type.

https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.collectionsmarshal.asspan?view=net-8.0

A property is easily discoverable with autocomplete. On the other hand, a method on a type is something people have to go looking to use, and they'll hopefully read the docs about why they should (or shouldn't) use the method.

foreach (var product in CollectionsMarshal.AsSpan(response.Products))
{
    // do stuff
}

I took a look at RepeatedField<T> source code and setting values does nothing more than add the value to the array (with various safety checks). The AsSpan method could return a Span<T> so it can also be used as a fast way to modify the repeated field as well.

var request = new Request();
CollectionsMarshal.SetCount(request.Products, count: 1000);
var productsSpan = CollectionsMarshal.AsSpan(request.Products);

for (var i = 0; i < productsSpan.Length; i++)
{
    productsSpan[i] = // do stuff
}

This would save many internal array allocations and copies as the repeated field grows.

However, SetCount (a new version for repeated fields) + modifying the span directly wouldn't guarantee the values in the repeated field aren't null.

JamesNK avatar May 10 '24 23:05 JamesNK

@JamesNK @jskeet

I changed the implementation to include your suggestions. There is already a similar class UnsafeByteOperations, so I aligned the naming with it -> UnsafeCollectionOperations. The implementation of the class is very close to the CollectionsMarshal.cs from System.Runtime.InteropServices. I implemented SetCount and AsSpan. I also ported the tests over from CollectionsMarshalTests.cs.

PascalSenn avatar May 12 '24 12:05 PascalSenn

@jskeet Most of this implementation closely aligned with CollectionsMarshal.cs from the runtime. The goal was to keep the implementation and behavior relatively consistent.

I also believe a few things were performance optimizations in the CollectionsMarshal. For example, the aggressive inlining and the direct field access.

I incorporated your feedback into this pull request to better align with the behavior of Protobuf. I left a couple of comments open to provide some context. Feel free to close them as needed after review, as everything should be resolved.

PascalSenn avatar May 13 '24 10:05 PascalSenn

@jskeet One last remark. The tests are ported from CollectionsMarshalTests.cs to ensure consistent behavior: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Runtime.InteropServices/tests/System.Runtime.InteropServices.UnitTests/System/Runtime/InteropServices/CollectionsMarshalTests.cs

PascalSenn avatar May 13 '24 10:05 PascalSenn

I'll have a look tomorrow. I would say that I'm not as interested in alignment with CollectionsMarshal as with consistency with the rest of this codebase, and clarity of code and usage.

jskeet avatar May 13 '24 10:05 jskeet

cc @amanda-tarafa for any additional thoughts

jskeet avatar May 15 '24 09:05 jskeet

@JamesNK @jskeet @amanda-tarafa I think i resolved all the suggestions. Feel free to have another look

PascalSenn avatar May 17 '24 13:05 PascalSenn

Thanks - I'm afraid at this point I won't get to this until Monday, but hopefully I'll be able to look then.

jskeet avatar May 17 '24 14:05 jskeet

@jskeet Comments are resolved šŸš€

PascalSenn avatar May 20 '24 08:05 PascalSenn

@JamesNK: Could you take one final pass at this? It looks good to me, but due to the unsafe aspects of it, I'd appreciate a double-check. After that, I can get it reviewed by a member of the protobuf team and merged.

jskeet avatar May 20 '24 08:05 jskeet

@mkruskal-google: We've reviewed this from a .NET perspective. Let me know if you'd like any more background. (But basically this allows very efficient access to a repeated field, with caveats around anything that might resize it.)

jskeet avatar May 22 '24 06:05 jskeet

This lgtm, @PascalSenn we need you to sign the CLA though. It also seems to be failing a C# test

mkruskal-google avatar Jun 03 '24 20:06 mkruskal-google

@mkruskal-google CLA Is signed. The failed test is somehow related to the [AllowPartiallyTrustedCallers]. It's a attribute from .NET Framework. I work on a unix machine and cannot run the windows tests. @jskeet / @JamesNK Do i just have to add [AllowPartiallyTrustedCallers] to the UnsafeCollectionOperations aswell?

PascalSenn avatar Jun 04 '24 05:06 PascalSenn

@PascalSenn: It's always been an area I really find hard to understand, to be honest. @JamesNK could you advise, please?

jskeet avatar Jun 04 '24 05:06 jskeet

You need to put [SecuritySafeCritical] attribute on methods that return span. See https://github.com/protocolbuffers/protobuf/issues/7700 as an example

JamesNK avatar Jun 04 '24 05:06 JamesNK

I copied your branch locally and ran net462 tests. Here are the changes you need to make:

https://github.com/JamesNK/protobuf/commit/55dda6293ff82291288d630158499b9fec017d00

JamesNK avatar Jun 04 '24 06:06 JamesNK

Thanks @jskeet & @JamesNK, tests should pass now.

PascalSenn avatar Jun 04 '24 13:06 PascalSenn

Ping @mkruskal-google for approval.

jskeet avatar Aug 07 '24 17:08 jskeet

Hi, I would be very interested by that feature too. Is there any chance to go forward with that PR?

verdie-g avatar Oct 03 '24 23:10 verdie-g

@JamesNK - you're being flagged for not having signed the CLA but having contributed to the last commit. Can you resolve that?

JasonLunn avatar Oct 11 '24 19:10 JasonLunn

I'm pretty sure I already have. When I click on the CLA details link I see this:

image

JamesNK avatar Oct 12 '24 04:10 JamesNK

I'm pretty sure I already have. When I click on the CLA details link I see this:

image

Can you check that the contact info matches j***s​@newtonking.com - that's the email associated with commit https://github.com/protocolbuffers/protobuf/commit/46ed223b2c3aceece6d2128c258bb94bedf8fd26 which is the one in question.

JasonLunn avatar Oct 12 '24 12:10 JasonLunn

Ok, the signed in google account was different. I signed it with a different email address.

JamesNK avatar Oct 14 '24 08:10 JamesNK

@PascalSenn - could you push a rebase? All the test failures are outside of the C# directory, I suspect their all due to drift with test targets given this PR's longevity.

JasonLunn avatar Oct 14 '24 15:10 JasonLunn