opentelemetry-proto icon indicating copy to clipboard operation
opentelemetry-proto copied to clipboard

Decide Conditions to Declare OTLP 1.0

Open tigrannajaryan opened this issue 3 years ago • 18 comments

Data models and binary Protobuf for all 3 signals (traces,metrics,logs) is now declared Stable in OTLP.

We need to decide what else is remaining to declare OTLP as version 1.0.

We had a discussion in the Spec SIG meeting, here is the summary:

  1. OTLP/JSON. We believe OTLP/JSON must be also declared Stable before OTLP is marked 1.0.0. This work is happening here https://github.com/open-telemetry/opentelemetry-specification/issues/1957

  2. Symbolic name stability in the proto files. This has a few components:

  • (a) Field name stability. I believe these must be declared stable and be part of our stability guarantees. The reason is that these names are visible on the wire for OTLP/JSON when using the default Protobuf/JSON implementation (while it is possible to decouple JSON wire and symbolic names in proto files using custom implementations, we do not think we want to go this route).
  • (b) Enum value names for enums which are field types (e.g. SPAN_KIND_CLIENT). These are also visible on the wire for OTLP/JSON and I believe these must be declared stable.
  • (c) Enum names (e.g. SpanKind). These are not visible on the wire anywhere (is this correct?).
  • (d) Helper enum names and enum value names (e.g. LogRecordFlags). These are not visible on the wire anywhere.
  • (e) The location of messages and enums, i.e. whether they are declared at the top lexical scope or nested inside another message.
  • (f) Message names, package names and directory structure.
  • Note that (c),(d),(e),(f) have no impact on the wire format. However, they impact the generated code. There is currently no agreement whether these should be part of our stability guarantees.
  1. Generated code stability, assuming no changes in proto files. This is dependent on the generator and we are NOT going to provide guarantees about generator behavior.

I believe 1, 2a and 2b are the absolute minimum before declaring OTLP 1.0.

We need to decide what we do about 2c-f. This decision needs to be made before declaring OTLP 1.0.

tigrannajaryan avatar Jun 07 '22 16:06 tigrannajaryan

@bogdandrutu @Aneurysm9 @dyladan @MrAlias @carlosalberto @ocelotl please see above and correct me if I missed anything.

tigrannajaryan avatar Jun 07 '22 16:06 tigrannajaryan

3. Generated code stability, assuming no changes in proto files. This is dependent on the generator and we are NOT going to provide guarantees about generator behavior.

I agree that this is not a guarantee we can make as it would be outside our control. What I do think we should guarantee is the inverse: that we will not change the .proto files in ways that break stability of generated code assuming no change in code generator. In other words, if generated code changes in a breaking way it will be because of the generator changing and not the protocol. This much should be within our control.

Aneurysm9 avatar Jun 07 '22 22:06 Aneurysm9

We talked about this in the spec meeting last week but I'll record my thoughts here for the record. I broadly agree with @tigrannajaryan's summary.

  1. Generated code stability, assuming no changes in proto files. This is dependent on the generator and we are NOT going to provide guarantees about generator behavior.

I agree that this is not a guarantee we can make as it would be outside our control. What I do think we should guarantee is the inverse: that we will not change the .proto files in ways that break stability of generated code assuming no change in code generator. In other words, if generated code changes in a breaking way it will be because of the generator changing and not the protocol. This much should be within our control.

I agree with this in principle, but I think it would be better for us to identify which types of changes that may affect the code generation are and document them as specifically stable. Otherwise it may be difficult for someone outside the core maintainer group to understand why something was accepted or rejected.

dyladan avatar Jun 13 '22 16:06 dyladan

I think that if we introduce breaking changes, we should increase the major version number, only that. Introducing a breaking change may be inconvenient, but we should at least let the user know.

ocelotl avatar Jun 13 '22 19:06 ocelotl

For 2b, we can make the case to use only the "numbers/identifiers" and to not support the "string". I think currently only JS uses it, so if that is the case I would propose to see if that makes sense.

I would propose to remove the "helper" enums in 2d. The reason is they are 2 (logs, metrics) of them already and if I am not mistaken they are inconsistent already.

I would also want to make sure we document the guarantees for the generated code:

  • You commented something about protobuf generated code not being in our power.
  • Do we want to give this guarantee at all? Should we say that generated code MUST not be exposed publicly and if you are consuming this, you must generate the code and consume that internally?

bogdandrutu avatar Jun 14 '22 15:06 bogdandrutu

If this helps move things forward, I had a draft PR adding a linter to detect wire (binary and json) breaking changes to this repo. I am happy to revive it.

It uses Buf breaking change detection which can also detect generated source code breaking changes. Here are all the rules it supports: https://docs.buf.build/breaking/rules

aabmass avatar Jun 15 '22 15:06 aabmass

2c + 2f -> This actually has wire level impact if we allow our messages to be written inside proto.Any (where a type_url + bytes field get written).

I think we exclude that case and go with (Tigran's original suggestion) of 1, 2a and 2b for OTLP stability.

When it comes to generated code, enforcing that ALL ways of generating code remains stable is definitely out-of-scope. We should allow clients to optimise/tweak on their own release schedules independently of the protocol itself.

E.g.

java_otlp_v1_client : 1.2.x makes sense. It speaks OTLP v1, but you can evolve the actual generated code as needed.

In many languages there's more than one tool to generate clients for proto as well.

While I think we should aim to provide stable client libraries, I don't think that's part of the discussion around stable OTLP.

jsuereth avatar Jun 15 '22 16:06 jsuereth

If this helps move things forward, I had a draft PR adding a linter to detect wire (binary and json) breaking changes to this repo. I am happy to revive it.

It uses Buf breaking change detection which can also detect generated source code breaking changes. Here are all the rules it supports: https://docs.buf.build/breaking/rules

This is excellent and could really help take some of the guesswork and bikeshedding out of this. I would propose that we require WIRE_JSON compatibility and scan for FILE and PACKAGE compatibility, having a high bar to justify changes that would break generated code based on those rulesets. I think it is important that we recognize that there will be many consumers of this protocol and breaking changes, even if they are "only" to the generated code, can have a high cost both in terms of the effort required of our consumers and our reputation with them.

Aneurysm9 avatar Jun 20 '22 21:06 Aneurysm9

If this helps move things forward, I had a draft PR adding a linter to detect wire (binary and json) breaking changes to this repo. I am happy to revive it.

It uses Buf breaking change detection which can also detect generated source code breaking changes. Here are all the rules it supports: https://docs.buf.build/breaking/rules

@aabmass This is definitely very useful and can help prevent accidental mistakes when modifying the proto.

tigrannajaryan avatar Jun 21 '22 13:06 tigrannajaryan

The following PR clarifies the current state of stability guarantees: https://github.com/open-telemetry/opentelemetry-proto/pull/406.

When this issue is resolved I will adjust the stability definition, until then I want it to be there to make sure people don't rely on guarantees that don't yet exist.

tigrannajaryan avatar Jun 21 '22 13:06 tigrannajaryan

For 2b, we can make the case to use only the "numbers/identifiers" and to not support the "string". I think currently only JS uses it, so if that is the case I would propose to see if that makes sense.

I would rather support string stability for enum values. In my mind the value of being able to change the strings later is outweighed by breaking user expectations that string enums can be used. Most if not all receivers of OTLP JSON do and will support string value enums because that is the default for the proto3. If a user is emitting JSON which uses those string values and the proto has a minor update they will be confused why their wire compatibility is broken.

dyladan avatar Jun 21 '22 13:06 dyladan

For 2b, we can make the case to use only the "numbers/identifiers" and to not support the "string". I think currently only JS uses it, so if that is the case I would propose to see if that makes sense.

I would rather support string stability for enum values. In my mind the value of being able to change the strings later is outweighed by breaking user expectations that string enums can be used. Most if not all receivers of OTLP JSON do and will support string value enums because that is the default for the proto3. If a user is emitting JSON which uses those string values and the proto has a minor update they will be confused why their wire compatibility is broken.

I agree. Another argument in favour of that: JSON is often used for its human readability and using symbolic names instead of numbers helps with readability.

tigrannajaryan avatar Jun 21 '22 14:06 tigrannajaryan

Having spent some time thinking about this I believe we should do this:

  • OTLP 1.0 should provide stronger stability guarantees than currently, including providing guarantees about symbolic names of messages, fields, enums and packages even if such symbolic names don't appear on the wire. All items in (1) and (2) I listed above should be part of our stability guarantee. This is a stronger guarantee than is strictly necessary just for wire compatibility, but I believe it will help to have happier developers who implement OTLP and as a result increase OTLP adoption in the industry.
  • OTLP 1.0 will NOT provide guarantees about code generator behaviors, but we will not knowingly make changes to proto files that will result in breaking changes in the generated code.

Before OTLP 1.0 is declared we must do the last batch of changes, with the following open PRs either accepted or rejected and issues resolved:

  • https://github.com/open-telemetry/opentelemetry-proto/pull/407
  • https://github.com/open-telemetry/opentelemetry-proto/pull/405
  • https://github.com/open-telemetry/opentelemetry-proto/pull/397
  • https://github.com/open-telemetry/opentelemetry-specification/issues/1957

While resolving these open PRs/issues, before OTLP 1.0 is declared we MAY make breaking changes that are allowed by the current stability guarantees (cc @Aneurysm9 since you have a strong opinion about this particular clause).

tigrannajaryan avatar Jun 21 '22 14:06 tigrannajaryan

While resolving these open PRs/issues, before OTLP 1.0 is declared we MAY make breaking changes that are allowed by the current stability guarantees (cc @Aneurysm9 since you have a strong opinion about this particular clause).

I support this approach. I think that there should still be a high bar for breaking changes at this point - we should not change things simply because we can, but because the changes are necessary - but now is the time to make these changes if they're needed. If none are needed then it is time to declare stability.

Aneurysm9 avatar Jun 21 '22 15:06 Aneurysm9

@bogdandrutu we discussed this in the Spec SIG and the people present agreed with https://github.com/open-telemetry/opentelemetry-proto/issues/400#issuecomment-1161848977 Please post what you think about it.

tigrannajaryan avatar Jun 21 '22 19:06 tigrannajaryan

@bogdandrutu Any opinion on https://github.com/open-telemetry/opentelemetry-proto/issues/400#issuecomment-1161848977 ?

carlosalberto avatar Jul 08 '22 11:07 carlosalberto

@tigrannajaryan I would add #412 to the list of issues to be resolved before JSON stability

dyladan avatar Jul 08 '22 18:07 dyladan

@tigrannajaryan I would add #412 to the list of issues to be resolved before JSON stability

@dyladan Yes, I added it to the list here https://github.com/open-telemetry/opentelemetry-specification/issues/1957

tigrannajaryan avatar Jul 08 '22 18:07 tigrannajaryan