Introduce a runtime version "poison pill" for C#
For release notes:
Generated code now keeps track of which version of protoc was used to generate it, and which version of Google.Protobuf is required to ensure compatibility. Every message type checks for compatibility with the currently-executing runtime on initialization.
Once this has been merged, we need to work out an appropriate process to run the regeneration script every time the version changes.
(We could potentially run unit tests for exact runtime equality within Google.Protobuf, if that would be deemed useful.)
Having talked with @amanda-tarafa, we may want to reconsider whether this should be included in a minor release, or only in the next major. I'm personally okay with it (it will only fail if you're doing the wrong thing) but some care is required. I'd add a "do not merge" label but I can't find one - I'm going to make it a draft PR instead for now.
I'll also consult Microsoft folks to make sure it doesn't surprise them.
I've just realized that this would be a breaking change after all - because users may already have a static constructor of their own.
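For illustration - a minimal sketch of the collision, using a hypothetical `SomeMessage` type: C# permits only one static constructor per type, even across partial declarations, so a generated static constructor breaks any user code that already declares one.

```csharp
// Generated half of the message type: declares a static constructor
// to run the runtime version check.
public sealed partial class SomeMessage
{
    static SomeMessage()
    {
        // ... runtime version validation ...
    }
}

// User's existing half of the same partial class. This previously compiled,
// but now fails: a type may only declare one static constructor,
// even across partial declarations.
public sealed partial class SomeMessage
{
    static SomeMessage()   // compile error: duplicate static constructor
    {
        // ... user's own initialization ...
    }
}
```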
Will close this PR for now, but leave the branch available for others to review - I expect the general plan to stay the same, but it'll need tweaking and may need to be in a new major version after all.
Reopening with a slightly different design:
- The static constructor is now in the reflection class. While this is also partial (so it could, very theoretically, be a breaking change) I think it's much less likely to be an issue.
- There is a new internal method in the reflection class which currently just returns `true`. This is non-void so it can be called by a static field initializer, but it doesn't need to do the actual validation, as that's in the static constructor. If we wanted to delay validation in some cases, we could introduce a `Lazy<bool>` static field in the reflection class and just return `lazyValidation.Value` from the method.
- The validation method is called by a per-message static field initializer and in the parameterless constructor. (See the sketch below.)
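A minimal sketch of the shape described above; all identifiers (`FooReflection`, `ValidateVersion`, `SomeMessage`) and the exact comparison rule are placeholders for illustration, not the actual generated code:

```csharp
using System;

// Generated reflection class (hypothetical).
public static partial class FooReflection
{
    // One-off validation in the static constructor.
    static FooReflection()
    {
        // Version recorded by protoc at generation time (hypothetical values),
        // compared against the executing Google.Protobuf assembly version.
        var generated = new Version(3, 25, 0);
        var runtime = typeof(Google.Protobuf.IMessage).Assembly.GetName().Version;
        if (runtime == null || runtime.Major != generated.Major)
        {
            throw new InvalidOperationException(
                $"Generated code targets Google.Protobuf {generated} but the runtime is {runtime}.");
        }
    }

    // Non-void so a static field initializer can call it; the real work has
    // already happened in the static constructor above.
    internal static bool ValidateVersion() => true;
}

// Generated message class (hypothetical).
public sealed partial class SomeMessage
{
    // Initializing this field forces FooReflection's static constructor to run.
    private static readonly bool VersionValidated = FooReflection.ValidateVersion();

    public SomeMessage()
    {
        FooReflection.ValidateVersion();
        // ... normal construction ...
    }
}
```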
Expected impact:
- Access to the descriptor via the reflection class is now gated by runtime version validation as well (this is a good thing)
- An additional static field per message type: very small amount of extra memory required
- An additional method call per message type initialization: insignificant
- An additional method call per constructor call: probably insignificant, but we should benchmark
Additionally, this has highlighted the need for a clearer compatibility contract in terms of generated message and reflection partial classes. We probably want to document what is supported and what isn't in terms of partial classes.
I haven't seen this situation be a problem myself (generated code at a higher version than the Google.Protobuf runtime), but that's probably because many people who build gRPC apps (not libraries) use generated code directly.
The idea is fine.
Two points:
- I don't like validating the version with every constructor call. It adds a per-message performance cost to deserializing. Also, someone creating and using generated POCO objects seems unlikely to ever have a problem doing POCO things: creating objects, getting and setting fields, adding items to collections, etc. I'd prefer the validation to happen when interacting with Protobuf - that is, when serializing generated POCOs to and from Protobuf (see the sketch after this list). Hopefully the static constructor check, which is a one-off cost, would validate the situation.
- How often will the minimum required version be updated? Updating it every time seems unnecessary. I'd prefer it was updated when needed. For example, a code generation change that requires a new version of Google.Protobuf is introduced.
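As a sketch of that preferred placement (hypothetical `EnsureRuntimeCompatible` helper, not part of the PR), the check would sit at the Protobuf serialization boundary:

```csharp
using Google.Protobuf;

public sealed partial class SomeMessage
{
    private static bool _validated;

    // One-off check, performed only when the message first touches Protobuf
    // serialization, so plain POCO usage never pays for (or fails) it.
    private static void EnsureRuntimeCompatible()
    {
        if (_validated) return;
        // ... runtime version validation ...
        _validated = true;
    }

    public void WriteTo(CodedOutputStream output)
    {
        EnsureRuntimeCompatible();
        // ... actual serialization ...
    }

    public void MergeFrom(CodedInputStream input)
    {
        EnsureRuntimeCompatible();
        // ... actual parsing ...
    }
}
```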
@JamesNK: Thanks for looking at this and for your thoughtful reply. Replying to your individual points:
**Whether or not this is an issue worth addressing**
This hasn't caused customer problems in .NET, but it has in other languages. It may not have caused a problem in .NET as our runtime doesn't change very often - but it does change, and the error could be quite obscure and hard to understand.
**Validating on every constructor call**
It's calling a method with a body of `return true;`. The actual validation is only done once, in the static constructor for the reflection class. I don't expect that to add any measurable cost to even calling `new SomeMessage()`, let alone actual deserialization. I definitely need to do some benchmarking to confirm that, though. I'll post results in this PR when I've got some. (And obviously, if it does make a significant difference, I'll look at other options.)
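For reference, a minimal BenchmarkDotNet sketch of the kind of measurement involved; the message types here are stand-ins rather than real generated code:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Stand-in for a message whose constructor performs no validation call.
public sealed class PlainMessage { }

// Stand-in for a generated message whose constructor calls the trivial
// validation method (body: return true).
public sealed class CheckedMessage
{
    public CheckedMessage() => FakeReflection.ValidateVersion();
}

public static class FakeReflection
{
    internal static bool ValidateVersion() => true;
}

public class ConstructorCheckBenchmark
{
    [Benchmark(Baseline = true)]
    public PlainMessage Plain() => new PlainMessage();

    [Benchmark]
    public CheckedMessage Checked() => new CheckedMessage();
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ConstructorCheckBenchmark>();
}
```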
**Timing of validation**
My reasoning for validating this as early as possible in code terms is to prompt users as early as possible in development lifecycle terms. Consider the situation where someone updates protoc without updating the runtime dependency, generates, and then writes (and runs) a load of code and unit tests using the POCOs. I'd personally want them to see the error at that point, before they've forgotten that they updated protoc.
**Strictness of minimum runtime version**
While updating it every time isn't strictly necessary, it makes things much simpler to understand (and implement in a "default to safety" way) in my view. If we can get consumers into the habit of "if you update protoc, update the runtime at the same time" then we won't have a problem. If updating protoc only sometimes causes this error, it's harder to explain and habitualize updating the runtime, in my view. (It's also worth bearing in mind that this is part of a cross-language effort, and I believe the protobuf team is very keen on making the check strict in other languages - which means it would be better to be strict in .NET for consistency. It makes support policies simpler to write and understand when they can be applied across multiple languages.) I do appreciate your point though.
> My reasoning for validating this as early as possible in code terms is to prompt users as early as possible in development lifecycle terms.
The cost of updating Google.Protobuf isn't any bigger after they write code.
Throwing when constructing a POCO, even though it's a perfectly valid POCO, just feels wrong and unexpected from a user perspective. I think it violates the principle of least surprise.
> While updating it every time isn't strictly necessary, it makes things much simpler to understand (and implement in a "default to safety" way) in my view. If we can get consumers into the habit of "if you update protoc, update the runtime at the same time" then we won't have a problem. If updating protoc only sometimes causes this error, it's harder to explain and habitualize updating the runtime, in my view. (It's also worth bearing in mind that this is part of a cross-language effort, and I believe the protobuf team is very keen on making the check strict in other languages - which means it would be better to be strict in .NET for consistency. It makes support policies simpler to write and understand when they can be applied across multiple languages.) I do appreciate your point though.
When I think of being strict for no good reason, I think of strong name binding on netfx. It puts an unnecessary burden on apps to ensure versions are perfectly lined up.
The goal is to help people avoid falling into a pit of failure (incompatible versions) with minimal cognitive and time overhead. Every time someone sees the incompatible version exception, even though their app is functionally fine, is a failure.
Thinking about binding redirects brings a new question to my mind: what happens if someone is using proto generated code for version 3.0.0 from one NuGet package, and proto generated code for version 4.0.0 from another package? It seems like their app will never be able to function according to this validation:
```csharp
if (currentVersion.Major != targetVersion.Major)
{
    throw new InvalidOperationException(GetErrorMessage("major"));
}
```
> > My reasoning for validating this as early as possible in code terms is to prompt users as early as possible in development lifecycle terms.
>
> The cost of updating Google.Protobuf isn't any bigger after they write code.
Not in terms of the size of the change - but I believe there's a difference in terms of understanding the context if you get an error immediately after taking one action (suggesting another action) rather than having the two events separated in time.
> Throwing when constructing a POCO, even though it's a perfectly valid POCO, just feels wrong and unexpected from a user perspective. I think it violates the principle of least surprise.
I would agree for any "normal" exception that you might expect to catch... but this is closer to a linker error (`MissingMethodException` etc.) showing that your system is basically misconfigured. Thinking about that, I suspect `InvalidOperationException` is the wrong exception to throw - would it be clearer if we created a `ProtobufRuntimeMismatchException`, for example?
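For illustration only, such a dedicated exception type might look like this (no such type exists in Google.Protobuf today):

```csharp
using System;

// Hypothetical type - signals that generated code and the executing
// Google.Protobuf runtime are incompatible: a configuration error,
// not something callers would normally catch and handle.
public class ProtobufRuntimeMismatchException : InvalidOperationException
{
    public ProtobufRuntimeMismatchException(Version generatedFor, Version running)
        : base($"Generated code targets Google.Protobuf {generatedFor}, " +
               $"but the executing runtime is {running}. Regenerate the code " +
               "or update the Google.Protobuf package.")
    {
    }
}
```

Deriving from `InvalidOperationException` would keep any existing catch blocks working while making the failure mode self-describing.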
I can ask the Protobuf team how they'd feel about restricting the validation to "first serialization" or "first deserialization" per message, but my personal feeling is that if something needs to be fixed, the earlier you find out about it the better.
> When I think of being strict for no good reason, I think of strong name binding on netfx. It puts an unnecessary burden on apps to ensure versions are perfectly lined up.
Except I think it is a good reason - it provides more confidence that you won't run into issues later. And I agree that requiring versions being perfectly lined up would be overly strict. But in this case we're mostly just requiring SemVer (with the slight oddity being around patch number behavior).
> The goal is to help people avoid falling into a pit of failure (incompatible versions) with minimal cognitive and time overhead. Every time someone sees the incompatible version exception, even though their app is functionally fine, is a failure.
I look at it the other way: every time someone sees an app fail with something like `MissingMethodException` in production due to a rarely-used path that happened not to go through integration testing, even though tooling could have predicted the problem, that feels like a bigger failure.
> Thinking about binding redirects brings a new question to my mind: what happens if someone is using proto generated code for version 3.0.0 from one NuGet package, and proto generated code for version 4.0.0 from another package? It seems like their app will never be able to function according to this validation.
Correct, and to my mind that's just highlighting a possible failure ahead of time instead of waiting for a failure in production.
This isn't unique to generated code - any time you have a dependency which expects one major version but actually runs against a different major version, that's a dangerous situation. That's the whole point of SemVer major versions: that they indicate earlier code could be broken.
Personally I'd like the dotnet/nuget build system to spot this much earlier across the board, instead of the current strategy of "hope the breaking change in the major version happens not to affect your application" but that's not a battle I'm trying to fight here...
This is a bad rule:
```csharp
if (currentVersion.Major != targetVersion.Major)
{
    throw new InvalidOperationException(GetErrorMessage("major"));
}
```
Imagine:
- NuGet package `Grpc.Reflection` has source gen code for 4.x.x.
- NuGet package `Contoso.Proto` has source gen code for 3.x.x but is no longer maintained.
There is no Google.Protobuf version that works with both because of the exception. The user is stuck.
Now, a breaking change might have been introduced when the major version went from 3 -> 4, and that change could lead to something like a `MissingMethodException`. But the breaking change could also have been for something completely unrelated to generated code. Maybe some APIs on `Google.Protobuf.Reflection` were removed. In that situation, the source gen code remains working.
Here is what you should do:
```csharp
private const int MinSupportedMajorVersion = 3;

if (currentVersion.Major < targetVersion.Major)
{
    // Always throw in this case, because we don't know what has changed
    // in future generated code.
    throw new InvalidOperationException(
        "The generated code targets a newer version of the Google.Protobuf serializer " +
        "than the app is currently using. Update Google.Protobuf to {XXX} or greater.");
}

if (targetVersion.Major < MinSupportedMajorVersion)
{
    // Throw if we know that the old generated code is no longer supported
    // by the version of Google.Protobuf that the app is using.
    throw new InvalidOperationException(
        "The generated code is incompatible with the version of the Google.Protobuf serializer " +
        "that the app is currently using. Update the generated source code.");
}
```
The version test is happening inside Google.Protobuf. It is simple for the version of Google.Protobuf that is running to specify that it no longer supports generated code older than a specific version.
I'll discuss the "changes for generated code" vs "changes in other aspects of the public API" question with the Protobuf team - but fundamentally that would still leave us running code written for one major version against a different major version. I believe that's inherently dangerous, and we don't do users any favors by using "hope that it works" as a strategy.
If your application has a dependency that is no longer maintained, which itself depends on another dependency at a version that is no longer supported, you're in a bad spot - and I think it's worth highlighting that ASAP.
For reference: https://protobuf.dev/support/cross-version-runtime-guarantee/
Coming back to this, I discussed the separation of changes for generated code vs other aspects of the public API with the protobuf team, and the feedback was that it was too much of a slippery slope. We'd like to proceed with this as it is, basically.
Note that we don't have any plans for a new major version at the moment, so there's no urgency in terms of that side of things. I do think it's beneficial to avoid the "new generated code with older runtime" sooner rather than later though.
(I haven't rebased this PR, btw - there doesn't seem to be much point in that until there's been more discussion.)
We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active, please add a comment.
This PR is labeled inactive because the last activity was over 90 days ago. This PR will be closed and archived after 14 additional days without activity.
Removed the inactive tag as we do still want to do this. @JamesNK any further thoughts? Maybe we should have a meeting to discuss?
My thoughts here - https://github.com/protocolbuffers/protobuf/pull/10862#issuecomment-1308425758 - haven't changed.
> but fundamentally that would still leave us running code written for one major version against a different major version. I believe that's inherently dangerous, and we don't do users any favors by using "hope that it works" as a strategy.
In my preferred validation check, the Protobuf library has validated that the generated code is supported via `MinSupportedMajorVersion`. It isn't "hope that it works".
> In my preferred validation check, the Protobuf library has validated that the generated code is supported via `MinSupportedMajorVersion`. It isn't "hope that it works".
Acknowledged. There's still the slippery slope aspect though - I can easily see that going to "well we're not using feature X in the protos, so a breaking change in X doesn't cause a problem". This becomes very complicated to reason about, both when implementing the checks and for customers.
A rule of "code targeting major version X of the runtime must be generated with major version X of the tooling" is much simpler to reason about - while certainly being more restrictive, of course.
Given that the protobuf team is not keen on the separation of "generated code breaking changes" vs "only user code breaking changes", how much of a problem do you anticipate in the current PR? Suggestions as to the next steps to resolve this?
Simple is good, but it brings along its friend, inflexibility. And inflexibility + dependencies aren't a good combination. .NET Framework's simple but inflexible binding rules for strong-named assemblies are a good example.
The problem is that the rule here isn't good. It will inflict pain on Protobuf users by requiring that all their dependencies be upgraded to the right version at the same time. I think the end result is that the Protobuf project will just never increase the major version.
> Simple is good, but it brings along its friend, inflexibility. And inflexibility + dependencies aren't a good combination. .NET Framework's simple but inflexible binding rules for strong-named assemblies are a good example.
My counterstatement is that flexible is good, but it brings along its friend, brittleness. And brittleness + dependencies isn't a good combination either. The current behavior of the .NET SDK of defaulting to "one dependency specified version 2.5.0 and another specified 3.2.0 of a common library? I'm sure 3.2.0 will be fine" is a problem IMO.
> The problem is that the rule here isn't good. It will inflict pain on Protobuf users by requiring that all their dependencies be upgraded to the right version at the same time. I think the end result is that the Protobuf project will just never increase the major version.
Increasing the major version will certainly be massively painful, for the protobuf maintainers (a large amount of planning will be necessary), "intermediate library owners" (anyone who publishes a library that depends on protobuf) and end users. And yes, that may mean that Protobuf (for .NET) never increases the major version - the threshold would need to be very high. (It's important to note that it's fine for, say, Java to take a new major version without .NET doing so, or vice versa. Different ecosystems can have different pain thresholds.)
I think the pain is inherent in making breaking changes: if an application depends on two libraries which depend on two different major versions of the same common library, you're inherently in a really dangerous place. You might get away with it, but you might not.
I acknowledge that your suggestion of making the generated code more tolerant (so being able to work against multiple major versions, but with knowledge of its limits) mitigates this to some extent - but I think the statement of "all dependencies must be upgraded to the right version at the same time" would still basically apply. Unless you're depending on a library which is just protos and no hand-written code to use the runtime, there's still the dichotomy of "is it better to hope that the hand-written code targeting the old version is compatible with the new version, or fail early and clearly". In some ways, making the generated code fail even if it doesn't strictly need to is an approach that forces version alignment to make other code safer. That clearly has advantages and disadvantages, but the Protobuf team policy is to favor safety/guarantees over flexibility.
Fundamentally I think you're right that the approach may mean that for .NET we never take a new major version, so never make any breaking changes - but I think that even without the poison pill, that would probably have been the case.
The policy at https://protobuf.dev/support/cross-version-runtime-guarantee/#major has now been updated to include this:
> Starting with the 2025Q1 release, the protobuf project will adopt a rolling compatibility window for major versions. Code generated for a major version V (full version: V.x.y) will be supported by protobuf runtimes of version V and V+1.
>
> Protobuf will not support using gencode from version V with runtime >= V+2 and will be using a “poison pill” mechanism to fail with a clear error message when a software assembly attempts to use such a configuration.
I need to update this PR with that change. That would mean code depending on libraries which in turn depend on v3 and v4 has more chance of working, but it's still fundamentally going to be a roll of the dice (as it would depend on whether the handwritten code in the library depending on v3 gets broken by whatever prompts v4).
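As a sketch, that rolling-window rule might translate into a check like the following (hypothetical names; the actual PR update may differ):

```csharp
using System;

internal static class RollingWindowCheck
{
    // Gencode major V is supported by runtime majors V and V+1 (2025Q1 policy);
    // anything outside that window fails fast with a clear message.
    internal static void Validate(Version runtime, Version gencode)
    {
        if (gencode.Major > runtime.Major)
        {
            // Gencode newer than the runtime: never supported.
            throw new InvalidOperationException(
                $"Generated code ({gencode}) is newer than the Google.Protobuf runtime ({runtime}). " +
                "Update the Google.Protobuf package.");
        }
        if (runtime.Major - gencode.Major >= 2)
        {
            // Runtime more than one major ahead: outside the support window.
            throw new InvalidOperationException(
                $"Generated code ({gencode}) is outside the supported window for the runtime ({runtime}). " +
                "Regenerate with a newer protoc.");
        }
    }
}
```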
@JamesNK: If you have any thoughts about that, I'd be really interested in them. This might be enough for me to change my default position from "I intend to retire before we actually take a new major version."
We still want something like this. I have a document that I want to turn into a blog post around versioning which may help...
Marking as non-stale, as we do want to get back to this at some point (but with a "next major but one" probably).
We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active or becomes active again, please reopen it.
This PR was closed and archived because there has been no new activity in the 14 days since the inactive label was added.