gnmi on-change clarification

Open osquitun opened this issue 11 months ago • 9 comments

Provide clarification to On Change for STREAM Subscription in section 3.5.1.5.2 to exclude rapidly changing counter types, which would be streamed only on the initial subscription update and at the heartbeat interval if one has been specified.

osquitun • Jan 21 '25 14:01

Could you elaborate on the reasoning for this change?

It seems that the scenario you have in mind is already covered by the target defined mode: https://github.com/openconfig/reference/blob/95392c148dfe1143ef99acb2bb11d1ddd051927f/rpc/gnmi/gnmi-specification.md?plain=1#L1634-L1640

I don't agree that this can be considered "a clarification". This is a significant change in the behavior.

> with the exception of rapidly changing counter types. Counter type values SHOULD be transmitted upon the initial subscription update and at each heartbeat_interval if one is specified.

What's the definition of a "rapidly changing counter type"? How can it be distinguished from other counter types?

LimeHat • Jan 22 '25 01:01

The reasoning is that TARGET_DEFINED leaves it to the vendor to determine the best type of subscription. Comparisons of vendor implementations have shown differing views on which type of subscription should be used. Using ON_CHANGE avoids that ambiguity and the vendor-specific choice of subscription type.

The current ON_CHANGE description does not indicate the behavior if there is a mix between leaves suitable for event based update and counters. For example, an ON_CHANGE subscription to /interfaces/interface/state would include the counter container within the hierarchy. The counter performance leaves are only desired on a SAMPLE basis. This is an obvious scenario, but in some cases there is no counter container and a mix of event leaves and counter leaves is present within the same container.

Some vendors already behave as described, while others do not provide the ability to subscribe with ON_CHANGE as a result, and their choice of subscription type under TARGET_DEFINED has created issues.

Changing the wording to remove "rapidly" is fine, so that it covers any counter that should be sample based.

osquitun • Jan 22 '25 21:01

> The current ON_CHANGE description does not indicate the behavior if there is a mix between leaves suitable for event based update and counters.

What do you mean by that? IMO, the current definition of on_change is straightforward and unambiguous: there are no cases where an implementation can switch to another mode implicitly.

> The counter performance leaves are only desired on a SAMPLE basis.

In many cases, yes. This is when you should use SAMPLE or TARGET_DEFINED to subscribe :-)

> but in some cases there is no counter container and a mix of event leaves and counter leaves is present within the same container.

And in that case target_defined works well, if one has a good implementation. If a vendor's target_defined implementation is not sufficient, you can use selective subscriptions to specific leaves with different modes. You can even combine them in a single SubscriptionList.

> Changing the wording to remove "rapidly" is fine, so that it covers any counter that should be sample based.
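As a rough illustration of combining modes in one SubscriptionList, here is a minimal Python sketch. The dataclasses are hypothetical stand-ins for the generated gNMI protobuf messages (only the field and enum names mirror the gNMI proto), and the chosen paths and the 30 second interval are assumptions for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

NS_PER_SECOND = 1_000_000_000


class SubscriptionMode(Enum):
    # Names and values mirror the gNMI SubscriptionMode enum.
    TARGET_DEFINED = 0
    ON_CHANGE = 1
    SAMPLE = 2


@dataclass
class Subscription:
    # Hypothetical stand-in for the gNMI Subscription message.
    path: str
    mode: SubscriptionMode
    sample_interval: int = 0  # nanoseconds, only meaningful for SAMPLE


@dataclass
class SubscriptionList:
    # Hypothetical stand-in for the gNMI SubscriptionList message.
    subscription: List[Subscription] = field(default_factory=list)
    mode: str = "STREAM"


def build_interface_subscriptions(sample_seconds: int) -> SubscriptionList:
    """Event-like leaves use ON_CHANGE; the counters container uses SAMPLE."""
    return SubscriptionList(subscription=[
        Subscription("/interfaces/interface/state/oper-status",
                     SubscriptionMode.ON_CHANGE),
        Subscription("/interfaces/interface/state/admin-status",
                     SubscriptionMode.ON_CHANGE),
        Subscription("/interfaces/interface/state/counters",
                     SubscriptionMode.SAMPLE,
                     sample_interval=sample_seconds * NS_PER_SECOND),
    ])


sub_list = build_interface_subscriptions(30)
for sub in sub_list.subscription:
    print(sub.path, sub.mode.name, sub.sample_interval)
```

The cost of this approach is that the collector has to enumerate the event-like leaves itself, instead of subscribing once to /interfaces/interface/state.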

I disagree with that idea as well. Your proposal eliminates the ability to use a true ON_CHANGE mode for counters. There are legitimate use cases for this functionality, even if it might not apply to your use case.

In my view, you should be looking to add a new subscription mode if you want to have this implemented properly and have a solid argument as to why TARGET_DEFINED is not sufficient. Or you can propose an extension for TARGET_DEFINED mode that will allow you to fine-tune its parameters beyond the current capabilities.

The current proposal is a non-backward-compatible change to the definition of ON_CHANGE, and that's not great.

LimeHat • Feb 03 '25 19:02

Reviewed at the May 6, 2025 OC Operators meeting.

Some feedback:

The operational goal is something like: “Important operational state changes should be sent as soon as possible (using ON_CHANGE mode). But fast-changing counters that don't represent an operational state change should only be sent on some operator-specified interval (SAMPLE mode).”

Observations: Today the schema does not clearly, programmatically define which state nodes fall into each category and it’s difficult/non-scalable to construct a query which would enumerate all the paths into these categories.

Perhaps a solution might be to propose an annotation on YANG nodes which identifies the "less important" counter nodes and/or the more important "operational state" nodes, and a gNMI extension which allows filtering a subscription on the annotation?
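To make the annotation idea concrete, here is a small Python sketch. Everything in it is hypothetical: neither the annotation values ("operational-state", "counter") nor any gNMI filtering extension exists today, and the path-to-annotation table is invented for illustration.

```python
from typing import Dict, List

# Hypothetical per-node annotations; no such YANG extension exists today.
ANNOTATIONS: Dict[str, str] = {
    "/interfaces/interface/state/oper-status": "operational-state",
    "/interfaces/interface/state/admin-status": "operational-state",
    "/interfaces/interface/state/counters/in-octets": "counter",
    "/interfaces/interface/state/counters/out-octets": "counter",
}


def filter_by_annotation(annotations: Dict[str, str],
                         subscribed_path: str,
                         wanted: str) -> List[str]:
    """Return the paths under subscribed_path that carry the wanted annotation."""
    prefix = subscribed_path.rstrip("/") + "/"
    return sorted(
        path for path, category in annotations.items()
        if (path == subscribed_path or path.startswith(prefix))
        and category == wanted
    )


state_paths = filter_by_annotation(
    ANNOTATIONS, "/interfaces/interface/state", "operational-state")
counter_paths = filter_by_annotation(
    ANNOTATIONS, "/interfaces/interface/state", "counter")
print(state_paths)
print(counter_paths)
```

With a filter like this, a collector could keep a single high-level subscription path and let the annotation decide which leaves are delivered.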

Loosely related is this proposal for WHERE clause type filtering: https://github.com/openconfig/gnmi/pull/182. This seems much more complex and impactful to implementations. To be effective, there would still need to be some criteria for filtering that it's not clear the proposal could meet without more extensions.

dplore • May 06 '25 18:05

The operational goal stated above can be easily achieved by using TARGET_DEFINED mode, with the exception of the "operator specified interval" (currently the interval is chosen by the vendor, or by a configuration knob in the vendor config).

The interval "problem" can be solved with one of:

  • a new extension
  • a change in the definition of sample_interval (to allow using sample_interval together with TARGET_DEFINED mode)
  • a configuration knob for a grpc-server

There's no need to use ON_CHANGE to solve this.
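The second option could behave roughly as in the Python sketch below. This is not current gNMI behavior: the counter classification rule, the vendor default interval, and the function name are all assumptions made for illustration.

```python
from typing import Tuple

NS_PER_SECOND = 1_000_000_000

# Hypothetical vendor default, used when the client sends no sample_interval.
VENDOR_DEFAULT_INTERVAL_NS = 10 * NS_PER_SECOND


def resolve_target_defined(path: str,
                           requested_sample_interval_ns: int) -> Tuple[str, int]:
    """Pick the effective mode and interval for one leaf under TARGET_DEFINED.

    Assumption: a leaf counts as a counter if it sits under a 'counters'
    container. A real target would use its own knowledge of the schema.
    """
    is_counter = "/counters/" in path
    if not is_counter:
        # Event-like leaves are streamed on change; no interval applies.
        return ("ON_CHANGE", 0)
    # Counters are sampled, honoring the client's interval when one is given.
    interval = requested_sample_interval_ns or VENDOR_DEFAULT_INTERVAL_NS
    return ("SAMPLE", interval)


print(resolve_target_defined("/interfaces/interface/state/oper-status",
                             30 * NS_PER_SECOND))
print(resolve_target_defined("/interfaces/interface/state/counters/in-octets",
                             30 * NS_PER_SECOND))
print(resolve_target_defined("/interfaces/interface/state/counters/in-octets", 0))
```

A zero sample_interval falls back to the vendor default in this sketch, so clients that do not set the field would see the same behavior as before.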

LimeHat • May 08 '25 22:05

> a change in the definition of sample_interval (to allow using sample_interval together with TARGET_DEFINED mode)

This looks like a clean solution that is both backwards compatible and does not incur changes to the schemas.

hellt • May 09 '25 06:05

There are two reasons why TARGET_DEFINED is problematic as a resolution. First, TARGET_DEFINED is not well defined, so its behavior varies between vendors and is not aligned with a common behavior across vendors. Second, it assumes a common collection use case and does not allow a collector to select whether it wishes to receive only operational state data or only performance data.

osquitun • May 12 '25 16:05

> First, TARGET_DEFINED is not well defined, so its behavior varies between vendors and is not aligned with a common behavior across vendors.

I'd argue that it is reasonably well defined, but implementation variations are possible (which is not a bad thing necessarily).

I don't see how or why a breaking change in the definition of ON_CHANGE is the answer to that problem (even if we assume it is a problem).
If you think the definition of TARGET_DEFINED is not sufficient, perhaps you should propose changing it instead.

> Second, it assumes a common collection use case and does not allow a collector to select whether it wishes to receive only operational state data or only performance data.

Can you please elaborate on this point? How is this related to ON_CHANGE or TARGET_DEFINED subscription modes?

LimeHat • May 12 '25 17:05

In short, what I would propose is to

  • close this PR
  • start with a clear problem statement in /issues, where solutions can be discussed
  • get a consensus on the problem statement in the operator community
  • and then potential solutions and design decisions can be discussed between vendors and operators

This PR started with an incorrect description first and foremost (this is not a clarification, this is a major change to the spec). We then delved into multiple different use cases and problems, all of which are somehow supposed to be resolved by this change, and yet there's zero clarity on why this approach was chosen. In addition to that, participants mentioned different problems that potentially need to be addressed:

  • is there ambiguity that prevents vendors from implementing good TARGET_DEFINED solutions?
  • is there a need to allow custom sampling intervals for TARGET_DEFINED subs?
  • is there a need to allow filtering out subsets of data from high-level subscriptions?

LimeHat • May 12 '25 18:05