public icon indicating copy to clipboard operation
public copied to clipboard

Programming errors module

Open akalluru1 opened this issue 2 years ago • 2 comments

Initial proposal to model the hardware programming errors in a network instance. Note the model only defines leaves for IP route programming errors. In future it can be extended to other type of routes as well, for ex: MPLS, ethernet & etc...

Change Scope

  • Adding a new module named programming-errors. $ pyang -p release/models -f tree release/models/network-instance/openconfig-programming-errors.yang module: openconfig-programming-errors
  augment /oc-ni:network-instances/oc-ni:network-instance:
    +--rw programming-errors
       +--rw ip-routes
          +--rw config
          |  +--rw enabled?   boolean
          +--ro state
          |  +--ro enabled?   boolean
          +--ro failed-routes
          |  +--ro failed* [prefix]
          |     +--ro prefix    -> ../state/prefix
          |     +--ro state
          |        +--ro prefix?           oc-inet:ip-prefix
          |        +--ro time?             oc-types:timeticks64
          |        +--ro dest-component?   -> /oc-platform:components/component/name
          |        +--ro reason?           enumeration
          +--ro stale-routes
          |  +--ro stale* [prefix]
          |     +--ro prefix    -> ../state/prefix
          |     +--ro state
          |        +--ro prefix?           oc-inet:ip-prefix
          |        +--ro time?             oc-types:timeticks64
          |        +--ro dest-component?   -> /oc-platform:components/component/name
          |        +--ro reason?           enumeration
          +--ro drop-routes
             +--ro drop* [prefix]
                +--ro prefix    -> ../state/prefix
                +--ro state
                   +--ro prefix?           oc-inet:ip-prefix
                   +--ro time?             oc-types:timeticks64
                   +--ro dest-component?   -> /oc-platform:components/component/name
                   +--ro reason?           enumeration
  • This change is backwards compatible.

akalluru1 avatar Oct 11 '22 20:10 akalluru1

Compatibility Report for commit d8ee2acfb4cc3861ba5ffdeaf40169befe651f22: ⛔ yanglint@SO 1.10.17

OpenConfigBot avatar Oct 11 '22 20:10 OpenConfigBot

@sulrich @xw-g @earies @mayukh288 for review.

akalluru1 avatar Oct 11 '22 20:10 akalluru1

We discussed this at the Oct 18th community meeting and that review raised a few areas of concern:

  • There's no discussion of extant or proposed vendor support for this capability. We typically ask for examples of existing support in at least two vendors native implementations, or seek some evidence that two or more vendors could and would support it.
  • Some of the language is ambiguous over whether these are focused on the layer that implements the AFT (whether hardware or software) or the network instance RIB. It ought to be the AFT layer; please consider tightening the language to make this clear everywhere.

--> This is indeed focused on error notifications from the AFT layer and below. We can fix up the language to emphasize that point. Do you have examples handy where the language is confusing ?

  • There is no discussion over the lifecycle of entries in these tables; in particular, with no aging mechanism they could conceivably grow to consume large amounts of memory indefinitely. --> The number of entries in these tables will be proportional to the number of routes in the system at any point, so there won't be unbounded growth. If a route gets removed from the AFT/FIB the telemetry data for it will get deleted.

mayukh288 avatar Oct 20 '22 17:10 mayukh288

We discussed this at the Oct 18th community meeting and that review raised a few areas of concern:

  • There's no discussion of extant or proposed vendor support for this capability. We typically ask for examples of existing support in at least two vendors native implementations, or seek some evidence that two or more vendors could and would support it.
  • Some of the language is ambiguous over whether these are focused on the layer that implements the AFT (whether hardware or software) or the network instance RIB. It ought to be the AFT layer; please consider tightening the language to make this clear everywhere.

--> This is indeed focused on error notifications from the AFT layer and below. We can fix up the language to emphasize that point. Do you have examples handy where the language is confusing ?

I think in general this PR uses varying terminology in a few places for the same general thing; "programming" is used throughout and is taken to mean the act of programming and the programming that has already completed. It's often used without the context of what was being programmed or what it was programmed into. "The programming status of IP routes within the network-instance" is what led me to ponder this, because this expression if read only at-a-glance directs me to believe it was the network instance that was programmed, and it's only later that it mentions the AFT in some of the sub-containers. I admit that just a moments thought reveals the context and perhaps I am being nit-picky, but I think clarity goes a long way.

chrisy avatar Oct 24 '22 18:10 chrisy

We discussed this at the Oct 18th community meeting and that review raised a few areas of concern:

  • There's no discussion of extant or proposed vendor support for this capability. We typically ask for examples of existing support in at least two vendors native implementations, or seek some evidence that two or more vendors could and would support it.
  • Some of the language is ambiguous over whether these are focused on the layer that implements the AFT (whether hardware or software) or the network instance RIB. It ought to be the AFT layer; please consider tightening the language to make this clear everywhere.

--> This is indeed focused on error notifications from the AFT layer and below. We can fix up the language to emphasize that point. Do you have examples handy where the language is confusing ?

I think in general this PR uses varying terminology in a few places for the same general thing; "programming" is used throughout and is taken to mean the act of programming and the programming that has already completed. It's often used without the context of what was being programmed or what it was programmed into. "The programming status of IP routes within the network-instance" is what led me to ponder this, because this expression if read only at-a-glance directs me to believe it was the network instance that was programmed, and it's only later that it mentions the AFT in some of the sub-containers. I admit that just a moments thought reveals the context and perhaps I am being nit-picky, but I think clarity goes a long way.

Updated the language/description to be more clear & precise to provide complete context. Please take a look at latest diff & let us know if something doesn't make sense.

akalluru1 avatar Nov 15 '22 04:11 akalluru1

@bstoll @chrisy sorry for the delay in addressing your comments. Can you please take a look at the latest changes?

akalluru1 avatar Nov 15 '22 04:11 akalluru1

@bstoll @chrisy sorry for the delay in addressing your comments. Can you please take a look at the latest changes?

I am comfortable with the documentation updates.

chrisy avatar Nov 21 '22 15:11 chrisy

I also like the latest proposed changes.

bstoll avatar Nov 28 '22 18:11 bstoll

I also like the latest proposed changes.

would you be able to approve this PR? what are the next steps to merge it up? @bstoll @dplore

akalluru1 avatar Nov 30 '22 00:11 akalluru1

/gcbrun

dplore avatar Dec 01 '22 01:12 dplore

Last call for comments. This will be merged on December 14, 2022.

dplore avatar Dec 01 '22 01:12 dplore

@earies @sulrich @aaronmillisor @aredmon8551 for visibility

dplore avatar Dec 07 '22 02:12 dplore

/gcbrun

dplore avatar Dec 14 '22 22:12 dplore