protobuf C#: add option and code generation for null reference types

This replaces the abandoned https://github.com/protocolbuffers/protobuf/pull/16240 and should address the open points mentioned.

This will fix #6632 and close #16240

Feb 05 '25 10:02 Falco20019

Since @tonydnewell doesn't seem to have time to continue on #16240, I created this to resolve the two NITs that Jon Skeet had and to cover string and ByteString as mentioned there is most probably something missing (which he was correct with).

I made sure it distinguishes between string (stays string), optional string (stays string but with hazzers) and google.protobuf.StringValue (becomes string?). bytes will stay ByteString as it's ensured to use ByteString.Empty as default and is not allowing to be set to null.

Messages will always be nullable. The only case that comes to my mind that we might want to treat different could be on proto2 with required, but that's not checked as of now anyway. So making it nullable would at least reflect that it should be checked.

I hope this helps to reduce the pain on making progress with this topic a bit. Rest assured I'm well aware that before this gets merged there will still be time needed to compare the old and new outputs to make sure everything behaves as expected. So I have no expectations at all on when this happens and will also not ping this anymore unless feedback is requested on the PR.

I generated all unittests from protoc with and without the option for my own checks. Just adding in case someone else wants to compare for feedback: unittests-no_opts-enable_nrt.zip

Feb 05 '25 12:02 Falco20019

bytes will stay ByteString as it's ensured to use ByteString.Empty as default and is not allowing to be set to null.

Completely agree about regular bytes fields - my comment on #16240 was around BytesValue as the wrapper message for bytes fields (e.g. TestWellKnownTypes.bytes_field). That's what I'd expect to become a ByteString?-typed property, in the same way that a StringValue becomes string?. (The default value of TestWellKnownTypes.bytes_field really is null.)

It's possible you already cover this - at a very quick glance, it looks like you haven't, but I could be wrong. If you don't cover this already, but you agree with my analysis, if you could make that change, that would be great.

Feb 05 '25 13:02 jskeet

Thanks for the clarification and quick response.. I will definetly check and fix it if it's off.

Feb 05 '25 13:02 Falco20019

I fixed the BytesValue, it behaves now as follows:

bytes -> ByteString
google.protobuf.BytesValue -> ByteString?

While checking the generated code, I remembered that in the past I had the assumption that repeated and map fields could have null as value for wrapper types. This is due to i.e. repeated google.protobuf.Int32Value being generated as RepeatedField<int?> instead of RepeatedField<int>. Accordingly, repeated google.protobuf.StringValue would now generate to RepeatedField<string?> as well.

Technically it's correct and RepeatedField mentions that it is not allowing null even though it's containing a nullable type. As you mentioned they are runtime-checked against null on Add, which makes it a bit strange API-wise. It would be possible to change it for NRT generation, but this might force users to check/cast their nullables first when switching to NRT generation. So while they should have checked it in the first place, it would be a breaking change API wise in case the users didn't pass .Value. So I would tend to keep it as is and have them generated RepeatedField<string?> and RepeatedField<ByteString?> for the nullables with null not being allowed.

Feb 05 '25 14:02 Falco20019

I'll need to go back and revisit why we end up with RepeatedField<int?> for repeated Int32Value. It could well just be a mistake, I'm not sure. Likewise I'll see about repeated fields for StringValue. If it was a mistake for Int32Value, I think it would be better not to propagate it for string/bytes. Chances are I'll need to write all of this up in a doc somewhere, otherwise I'll just end up chasing my own tail. Thanks for flagging it up though.

Feb 05 '25 14:02 jskeet

I already found the corresponding location in RepeatedMessageFieldGenerator::GenerateMembers and will either suppress it for only the StringValue and BytesValue, or for all wrapper types.

Will fix it to whatever you declare correct behavior once you find time to figure it out. Only leaving this as a reminder for myself by then ;)

Feb 05 '25 15:02 Falco20019

@JasonLunn I merged back protocolbuffers:main in the hope that this fixes the failing tests :)

Feb 05 '25 16:02 Falco20019

@Falco20019 - can you add some tests that cover the new behavior?

Feb 06 '25 20:02 JasonLunn

I can try to. The tests would need to check if a generated field is marked as nullable or not I assume. I could also adjust the existing tests to check if they behave the same for both variants (which they should). But maybe doing it for a subset would suffice. I will try to think about it over the weekend.

Feb 06 '25 20:02 Falco20019

@JasonLunn Could you please have the tests runnable? Thanks :)

Feb 17 '25 08:02 Falco20019

It seems that messages in oneofs are currently not yet generated as nullable. All others looked fine in my tests so-far. Will continue with testing and trying to create tests for it.

Feb 18 '25 13:02 Falco20019

Thanks, I will have a look at the workflows as the failing tests use .NET 6 SDK which just has no way to test the .NET 8 features. Will try to update them tomorrow at work.

Feb 27 '25 17:02 Falco20019

@JasonLunn You might want to replace Jon with yourself as reviewer until tests are really working and the code becomes reviewable to not trigger him over and over again.

As I sadly have no permissions to run the tests myself, we need to do trial-and-errors here. I updated from main to ensure the C++ test failures should be gone. I adjusted the test_csharp.yml as much as possible to use the .NET 8 SDK.

It's using us-docker.pkg.dev/protobuf-build/containers/test/linux/csharp:7.1.2-3.1.415-6.0.100-d9624f2aa83cba3eaf906f751d75b36aacb9aa82 which is based on .NET 6.0.100 which is sadly not able to use .NET 8 in the csproj. I sadly don't see how I could create a new image here to use. Only saw a reference to kokoro/linux/dockerfile/test/csharp/Dockerfile in the install_dotnet_sdk.ps1 but don't find it in the org. So either I have no permissions or it is gone by today.

I'm also not aware of how BAZEL works and therefore not able to adjust csharp/src/Google.Protobuf.Conformance\BUILD.bazel myself to use .NET 6 + .NET 8 for testing. I saw that line 61 has the net6.0 build, but don't know how to ensure both will be used.

I therefore would ask for help of the team on complete those two missing links.

Feb 28 '25 12:02 Falco20019

@JasonLunn It seems that the tests are always using protocolbuffers/protobuf/.github/workflows/test_csharp.yml@refs/heads/main and therefore it will not be possible to see the change of the workflows here. A separate rule is also ensuring I'm not allowed to adjust the workflow. And adjusting the workflow would break all other builds until this is in.

Cyclic dependency :) The Windows build is the only one that doesn't use us-docker.pkg.dev/protobuf-build and therefore succeeds. It's also the only one that tests ALL test cases (including net462) and therefore needs no adjustment to the pipeline. But even then, the containers are still only including 6.0.100 and not 8.0.100 which would be needed. Still not sure where that is maintained as I also have no access to us-docker.pkg.dev/protobuf-build personally.

Also some tests fail for a segfault in upb/util/def_to_proto_weak_import_test.upb_minitable.h which I don't know and haven't touched.

Feb 28 '25 13:02 Falco20019

To work around the test workflow changes, I manually imported this PR into our internal systems. With minor manual tweaks, it ran cleaner as https://github.com/protocolbuffers/protobuf/pull/20561, but still has a couple of issues, some of which may need to be addressed internally because they'll mean changes to the images used to run CI.

In the vanilla linux test config, the 8.0.100 SDK is not installed. It sees the configuration select it, but it only has 3.1.415 and 6.0.100 available to choose from.
On the linux aarch64 test config, where the 8.0.100 SDK is available, we see a different error: error NETSDK1129: The 'Publish' target is not supported without specifying a target framework. The current project targets multiple frameworks, you must specify one of the following frameworks in order to publish: net462, net6.0, net8.0
I see a lot of errors coming out of an internal-only test that have to do with collisions in the 'Google.Protobuf.TestProtosnamespace. Is it straightforward to put the new test protos in theNrt` directory into a distinct namespace so that they don't clash with their non-NRT equivalents?

Mar 04 '25 23:03 JasonLunn

As the image name suggests, only those two were selected similar to the ps1 script. I assume a new image is needed. Also not sure why the 3.1 SDK is still in there, as I didn't see it being used anymore. The global.json in the root pins the used SDK for everything. Will give it another look if there is anything modifying the SDK used.
I messed this up as I hoped to be able to have it as one publish. I will fix it to have both run as distinct ones.
It should be possible to have different namespaces using an option. I tried to avoid it to keep the changes to the tests simpler. I will have to change the using then on each test to import the correct namespace depending on the target framework. But if it helps your internals, I will look into it.

I have some other topics today on my plate but hope to still find time to adjust the PR late today (latest by tomorrow).

Mar 05 '25 06:03 Falco20019

@JasonLunn I fixed the broken test by adding the framework to the linux aarch64 cases. It is sadly not easily possible to put the Google.Protobuf.Test.TestProtos NRT stuff in a separate namespace, since all of them have different namespaces and it would need some effort to get the partials to work as-well. Also base_namespace can only strip away part of the namespace and not add anything. So there are sadly multiple blockers for that.

Can you give me some more insights on your internal-only test? The csproj is configured correctly to only use the correct CS files given the target framework. I assume your test case is not using a ProjectReference but trying to reuse files from there-in directly? In that case, you should ensure your test it doing something similiar as https://github.com/protocolbuffers/protobuf/blob/6efafc5c0fbc316ba273d09cc44394caae987433/csharp/src/Google.Protobuf.Test.TestProtos/Google.Protobuf.Test.TestProtos.csproj#L26-L37 or to use a ProjectReference if possible.

If you only care about non-NRTs, you can also just use <Compile Remove="**/*.pb.nrt.cs" /> without the other stuff.

Mar 06 '25 08:03 Falco20019

I'm checking on what I can do to with regard to details of the internal test

Mar 12 '25 14:03 JasonLunn

Could you expand the PR description to clearly state what the PR is doing?

As someone who hasn't been following this discussion closely, it's not clear to me what the approach is, or how it differs from the PR that preceded it (https://github.com/protocolbuffers/protobuf/pull/16240).

Here are some specific questions the PR description could answer:

Is the PR adding any new public API(s), or is it merely adding annotations that codify the semantics of existing APIs?
This PR appears to add a new codegen flag enable_nrt. What is the intention for this flag? Who do we expect will be using it? Is it intended to eventually become the default?

May 12 '25 06:05 haberman

@haberman: Nullable Reference Types (NRT) are a feature from C# 8 onwards, where you can annotate that a reference is expected to potentially be null, or should not be null (either accepted as a parameter or returned by a method/property). The compiler is then able to warn if it thinks the code using that annotated code is doing the wrong thing. For example:

public static string ReturnNonNullString() { ... }
public static string? ReturnMaybeNullString() { ... }
...
string x = ReturnNonNullString();
string? y = ReturnMaybeNullString();
Console.WriteLine(x.Length); // No warning
Console.WriteLine(y.Length); // Warning that you might be dereferencing a null value

It doesn't change execution-time semantics at all. (Indeed, the annotations can be "wrong" in that it's possible for a null value to be returned even a method is annotated not to do so, etc.)

Now, back to protobuf: the enable_nrt flag will start using these annotations on the generated code, so that users of that generated code gain more compile-time safety around nullness. It doesn't add any APIs, it just adds an annotation for some parameters and return types in the generated code. (There's the possibility of a separate change - which is a lot of work - to add nullable reference types within the protobuf runtime library itself. This is definitely not that change.)

I would expect users who use NRT within the rest of the code base to possibly specify enable_nrt for existing proto compilation (but they'll then potentially need to change the consuming code, because there will be warnings) and more likely specify it for new proto compilation.

Defaulting: I wouldn't expect the flag to ever become the default within protoc - support for nullable reference types still isn't even the default in the C# compiler, although the flag to enable it is present in the default project template for new projects. The MSBuild target for using protoc could change whether it uses enable_nrt by default - potentially based on whether the rest of the project does.

That's a fairly brief summary - the exact details of NRT are very complex, unfortunately. (We're still working through exactly how to standardize it in the C# specification...) https://learn.microsoft.com/en-us/dotnet/csharp/nullable-references gives more details if you want to delve further though.

May 12 '25 06:05 jskeet

Could you expand the PR description to clearly state what the PR is doing?

As someone who hasn't been following this discussion closely, it's not clear to me what the approach is, or how it differs from the PR that preceded it (#16240).

It doesn't differ from the approach, it only picks up where Tony left off (see https://github.com/protocolbuffers/protobuf/pull/16240#issuecomment-2410725099). As https://github.com/protocolbuffers/protobuf/issues/6632 is already very detailed on what the intention is, I avoided duplicating it. Jon gave a pretty good picture and if you prefer to have some of it in the initial post, I can copy parts of it or link to it.

As by Jon's request, merging is currently not advised until https://github.com/protocolbuffers/protobuf/pull/20242#discussion_r2046863842 is resolved (still waiting for a response here since last month).

May 12 '25 07:05 Falco20019

As there hasn't been any clear preference for the last weeks, I will now continue with using enable_nrt and revert the .NET 8 stuff to the extend that we have the stuff around but that it's not run automatically or being generated by the pipeline.

Jun 11 '25 06:06 Falco20019

@JasonLunn Can you please let the tests run? It should be now according to the requested state. Your internal tests should hopefully also work now again :) If the tests fail, I will make sure to merge main in and resolve any conflicts.

Jun 11 '25 07:06 Falco20019

@JasonLunn Merged in main, can you run the tests again to see if the failing ones work now? Seemed unrealted to my changes for me.

Jun 13 '25 07:06 Falco20019

@JasonLunn I see a lot of well_known_types_staleness_test failings and errors that starts_with was not found. Nothing I did touch. Is this normal?

Jun 16 '25 07:06 Falco20019

Just had another look and saw that some commits after the merge should fix it. Did another merge, hoping that the tests run now.

Jun 16 '25 07:06 Falco20019

@JasonLunn @jskeet Friendly ping. This is waiting for a week now to have the tests run :) If I should trigger a team handle instead, please let me know. Those errors should not have been caused by me but some other changes on the branch, so I'm sorry for the additional work we all have to go through.

Jun 23 '25 06:06 Falco20019

I'm afraid I have zero time to do anything on Protobuf at the moment. I do understand this is frustrating.

Jun 23 '25 06:06 jskeet

@jskeet I only tagged you hoping you might know a team member/handle to trigger instead. I fully understand you don't have time to label the issue over and over again 👍 I will refrain from tagging you and wait for Jason (or any other member) to aid me on this.

Jun 23 '25 06:06 Falco20019

I've just come across this after updating some C# projects to enable NRT and this would be great to enable for generated protobuf code especially for preventing errors when accessing null submessages.

It looks like the C# tests and most of the tests are passing here?

Aug 21 '25 08:08 jlumsden-mts