Route programming state module

Open dplore opened this issue 3 years ago • 9 comments

Initial proposal to model the hardware programming state of routes in a network instance.

Note that the model only defines leaves for route programming errors.

dplore avatar Dec 24 '21 00:12 dplore

Compatibility Report for commit 58a5688bb7abc36af243cd4d38f7e717453e68f8: ⛔ yanglint@SO 1.10.17

OpenConfigBot avatar Dec 24 '21 00:12 OpenConfigBot

A few initial comments/observations:

  • This module is specifically named with a -state suffix which, while accurate, does not follow the design pattern of other modules (even though some others consist mainly of r/o nodes). If this is going to be a module all to itself, it might be best to generalize the nomenclature to future-proof it
  • Should this rather exist as attributes to prefixes in the AFT model vs. its own separate constructs dedicated to route programming? Sitting within the AFT domain leverages existing structures; however, targeting from a client requires a filter on a non-key node in what could be an extremely large list (i.e. a more expensive operation)
  • In chassis/multi-component based systems - you could have hw programming errors only occur on 1 or a subset of components. Per the current model, this would have to be abstracted in a way that does not provide visibility into precisely where the source of the error occurred.

earies avatar Jan 07 '22 16:01 earies
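
As a rough illustration of the trade-off raised in the second point above, the sketch below models both layouts as plain Python dictionaries. Every name in it (`programming_error`, `route_programming_errors`, the sample prefixes and the error reason) is a hypothetical placeholder and is not taken from the proposed module or from the AFT model.

```python
# Hypothetical data only: neither layout mirrors real OpenConfig paths.

# Layout A: error state stored as a non-key attribute on every AFT entry.
aft_entries = {
    "10.0.0.0/24": {"next_hop_group": 1, "programming_error": False},
    "10.0.1.0/24": {"next_hop_group": 2, "programming_error": True},
    "10.0.2.0/24": {"next_hop_group": 1, "programming_error": False},
}

# Layout B: a dedicated subtree that only lists routes that failed to program.
route_programming_errors = {
    "10.0.1.0/24": {"reason": "hardware table full"},  # invented reason string
}


def failed_prefixes_from_aft(entries):
    """Scan every AFT entry and filter on the non-key error attribute."""
    return [prefix for prefix, attrs in entries.items() if attrs["programming_error"]]


def failed_prefixes_from_subtree(errors):
    """Read the dedicated error subtree directly; no filtering is needed."""
    return list(errors)


print(failed_prefixes_from_aft(aft_entries))
print(failed_prefixes_from_subtree(route_programming_errors))
```

In this toy model the work for layout A grows with the size of the whole table, while the work for layout B grows only with the number of failed routes.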

A few initial comments/observations:

  • This module is specifically named with a -state suffix which, while accurate, does not follow the design pattern of other modules (even though some others consist mainly of r/o nodes). If this is going to be a module all to itself, it might be best to generalize the nomenclature to future-proof it
  • Should this rather exist as attributes to prefixes in the AFT model vs. its own separate constructs dedicated to route programming? Sitting within the AFT domain leverages existing structures; however, targeting from a client requires a filter on a non-key node in what could be an extremely large list (i.e. a more expensive operation)

Having these in a separate tree seems better. It separates the error reporting state from the AFT per se. As you said, having it separate also makes it easier for a client to subscribe to just this error state without subscribing to the entire AFT (or at least the routes collection within it).

  • In chassis/multi-component based systems - you could have hw programming errors only occur on 1 or a subset of components. Per the current model, this would have to be abstracted in a way that does not provide visibility into precisely where the source of the error occurred.

mayukh288 avatar Jan 29 '22 01:01 mayukh288

Having these in a separate tree seems better. It separates the error reporting state from the AFT per se. As you said, having it separate also makes it easier for a client to subscribe to just this error state without subscribing to the entire AFT (or at least the routes collection within it).

As we start detaching data from existing trees into new subtrees for specific purposes and organization, I would propose some global classification structure; otherwise we are going to end up with many domains coming in with no proper anchor points, which opens the door to divergent design patterns.

So far - we have no configuration related to this proposal but we don't want to limit such and if we concentrate on the purpose of this domain - it is currently to provide a view/reporting into "anomaly" or "error" state for route programming.

It then might make sense to consider:

  • route-programming to be a r/w container with sub-containers of config/state, where we are only concentrating on state today (config could very well be the future enablement/disablement and other characteristics of route-programming reporting in general)
  • Possibly nest this underneath a more descriptive domain such as anomalies, errors, etc. (either as a parent or child to route-programming). If at the parent level, you have a consistent path to target errors or anomalies (much like log messages) and then target domains within it; the opposite means you target specific domains for their related error/anomaly data.

earies avatar Jan 30 '22 19:01 earies

Having these in a separate tree seems better. It separates the error reporting state from the AFT per se. As you said, having it separate also makes it easier for a client to subscribe to just this error state without subscribing to the entire AFT (or at least the routes collection within it).

@mayukh288 is there a specific route programming method being used that we want to report errors on? p4rt? gRIBI? Something else?

dplore avatar Feb 01 '22 18:02 dplore

Having these in a separate tree seems better. It separates the error reporting state from the AFT per se. As you said, having it separate also makes it easier for a client to subscribe to just this error state without subscribing to the entire AFT (or at least the routes collection within it).

As we start detaching data from existing trees into new subtrees for specific purposes and organization, I would propose some global classification structure; otherwise we are going to end up with many domains coming in with no proper anchor points, which opens the door to divergent design patterns.

So far - we have no configuration related to this proposal but we don't want to limit such and if we concentrate on the purpose of this domain - it is currently to provide a view/reporting into "anomaly" or "error" state for route programming.

It then might make sense to consider:

  • route-programming to be a r/w container with sub-containers of config/state, where we are only concentrating on state today (config could very well be the future enablement/disablement and other characteristics of route-programming reporting in general)
  • Possibly nest this underneath a more descriptive domain such as anomalies, errors, etc. (either as a parent or child to route-programming). If at the parent level, you have a consistent path to target errors or anomalies (much like log messages) and then target domains within it; the opposite means you target specific domains for their related error/anomaly data.

Giving this some high-level structure makes sense. I would think the hierarchy Errors --> Route-Programming would be better than the other way round.

mayukh288 avatar Feb 02 '22 20:02 mayukh288
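
To make the two orderings discussed above concrete, here is a minimal sketch that builds path strings for each and counts how many top-level containers a client would need to target to see all errors. The container names (`errors`, `acl-programming`) and the second domain are assumptions made purely for illustration; only `route-programming` comes from the proposal.

```python
# All container names below are hypothetical placeholders for illustration.
DOMAINS = ["route-programming", "acl-programming"]


def errors_as_parent(domains):
    """Errors --> domain: every error path shares one common prefix."""
    return [f"/errors/{d}" for d in domains]


def errors_as_child(domains):
    """Domain --> errors: each domain carries its own errors container."""
    return [f"/{d}/errors" for d in domains]


def subscription_roots(paths):
    """Return the distinct top-level containers needed to cover all paths."""
    return sorted({p.split("/")[1] for p in paths})


print(subscription_roots(errors_as_parent(DOMAINS)))
print(subscription_roots(errors_as_child(DOMAINS)))
```

With errors as the parent, a single root covers every domain; with errors as the child, the client targets one root per domain.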

Having these in a separate tree seems better. It separates the error reporting state from the AFT per se. As you said, having it separate also makes it easier for a client to subscribe to just this error state without subscribing to the entire AFT (or at least the routes collection within it).

@mayukh288 is there a specific route programming method being used that we want to report errors on? p4rt? gRIBI? Something else?

If I understood the question, you were asking whether it makes sense to further sub-divide route-programming into various sub-containers (one for each specific method of programming?). I would think not: regardless of the various mechanisms used to program the routes, they all end up in one logical FIB, so having just one container to report all route-programming errors should be fine.

mayukh288 avatar Feb 02 '22 20:02 mayukh288

@mayukh288 is there a specific route programming method being used that we want to report errors on? p4rt? gRIBI? Something else?

If I understood the question, you were asking whether it makes sense to further sub-divide route-programming into various sub-containers (one for each specific method of programming?). I would think not: regardless of the various mechanisms used to program the routes, they all end up in one logical FIB, so having just one container to report all route-programming errors should be fine.

Thanks for this explanation. OpenConfig is very protocol and component oriented. To put this request into that context, I think we can state "We want to report telemetry for a single logical FIB (component)".

Today we have AFT representing a logical FIB entity. To make subscribing to these errors efficient, perhaps we can come up with a way for a client to subscribe to just the errors from a given AFT instance.

Can we take advantage of a pattern similar to the one used for the integrated circuit and associated pipeline error counters?

dplore avatar Feb 04 '22 00:02 dplore

Thanks for this explanation. OpenConfig is very protocol and component oriented. To put this request into that context, I think we can state "We want to report telemetry for a single logical FIB (component)".

Today we have AFT representing a logical FIB entity. To make subscribing to these errors efficient, perhaps we can come up with a way for a client to subscribe to just the errors from a given AFT instance.

just to be clear on this discussion and the conclusions here.

the current state of the PR reflects the above conclusion and an operator can subscribe to a specific network-instance in order to constrain the reporting. so i think things are square here.

albeit w/o some of the additional hierarchy that's been discussed in some of the review.

Can we take advantage of a pattern similar to the one used for the integrated circuit and associated pipeline error counters?

wrt the above ... is there still a desire to extend this reporting to the component (ASIC) level? this seems like a significant addition and a deviation from the current model.

sulrich avatar Mar 28 '22 21:03 sulrich

Can there be programming errors for other types of routes under the afts container, e.g. MPLS/labeled routes or ethernet entries? If so, would it make sense to place these failed routes, drop routes, etc. under the respective aft containers, i.e. ipv4-unicast, ipv6-unicast, mpls, etc.?

akalluru1 avatar Sep 02 '22 20:09 akalluru1

Can there be programming errors for other types of routes under the afts container, e.g. MPLS/labeled routes or ethernet entries? If so, would it make sense to place these failed routes, drop routes, etc. under the respective aft containers, i.e. ipv4-unicast, ipv6-unicast, mpls, etc.?

--> Having a separate subtree for the error reporting is better in my opinion. It allows clients to subscribe to just this subtree if they are interested. If required, we can also create containers for the mpls routes under this subtree.

mayukh288 avatar Sep 19 '22 10:09 mayukh288

Superseded by #725

dplore avatar Oct 18 '22 18:10 dplore