public icon indicating copy to clipboard operation
public copied to clipboard

Should AFT next hop weight have a range restriction?

Open dplore opened this issue 1 year ago • 5 comments

https://github.com/openconfig/public/blob/a2c7a09e619c56782b3f4495a065fd7c34ef1175/release/models/aft/openconfig-aft-common.yang#L713-L721

It seems a zero value should not be allowed. Any comments?

@rszarecki

dplore avatar May 06 '24 21:05 dplore

There are few issues with 0 value:

  1. if all NH has 0-value, then formulas to calculate per-NH share need error handling (divide by 0). I think it is very doable, but not sure if already implemented.
  2. it is redundant to forwarding-viable and all other techniques of drain. At least some of implementation on marked do not allows for 0-value
  3. The 0-value may be interpreted as backup on certain hardware.

We, Google decided recently that we would NOT allow for link-bandwidth = 0 in our designs. In order to avoid mitivendor interop issues.

I also think uint64, hance range 1-2^64, is wrong. It way too large for any practical implementation. uint8 will be probably sufficient. With 2 nh it alows for 1:255 split - 0.4% fidelity. With 4 nh it allows for 1:255:255:255 - 0.13% fidelity

Practically, the SUM(all weights) is scaled down to O(1000) or less buckets in ASIC. with uint64, if 2 hops get 1:2^64 ratio, how it shall be scaled down to hardware 1:1000 (18e12 times too much ) or 0:1000 (infinity times too less) or 1sth NH shall be not programmed in HW at all?


Rafal Szarecki

On Mon, May 6, 2024 at 2:38 PM Darren Loher @.***> wrote:

https://github.com/openconfig/public/blob/a2c7a09e619c56782b3f4495a065fd7c34ef1175/release/models/aft/openconfig-aft-common.yang#L713-L721

It seems a zero value should not be allowed. Any comments?

@rszarecki https://github.com/rszarecki

— Reply to this email directly, view it on GitHub https://github.com/openconfig/public/issues/1106, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALDSOVPRGT3ZDL3ZUBB7METZA7Z53AVCNFSM6AAAAABHJZQZQ6VHI2DSMVQWIX3LMV43ASLTON2WKOZSGI4DCOBSGU2TANQ . You are receiving this because you were mentioned.Message ID: @.***>

rszarecki avatar May 07 '24 16:05 rszarecki

Or make it uint16 and put upper limit of 1000? so worst case fidelity become 0.1%.

Rafal Szarecki

On Tue, May 7, 2024 at 9:27 AM Rafal Szarecki @.***> wrote:

There are few issues with 0 value:

  1. if all NH has 0-value, then formulas to calculate per-NH share need error handling (divide by 0). I think it is very doable, but not sure if already implemented.
  2. it is redundant to forwarding-viable and all other techniques of drain. At least some of implementation on marked do not allows for 0-value
  3. The 0-value may be interpreted as backup on certain hardware.

We, Google decided recently that we would NOT allow for link-bandwidth = 0 in our designs. In order to avoid mitivendor interop issues.

I also think uint64, hance range 1-2^64, is wrong. It way too large for any practical implementation. uint8 will be probably sufficient. With 2 nh it alows for 1:255 split - 0.4% fidelity. With 4 nh it allows for 1:255:255:255 - 0.13% fidelity

Practically, the SUM(all weights) is scaled down to O(1000) or less buckets in ASIC. with uint64, if 2 hops get 1:2^64 ratio, how it shall be scaled down to hardware 1:1000 (18e12 times too much ) or 0:1000 (infinity times too less) or 1sth NH shall be not programmed in HW at all?


Rafal Szarecki

On Mon, May 6, 2024 at 2:38 PM Darren Loher @.***> wrote:

https://github.com/openconfig/public/blob/a2c7a09e619c56782b3f4495a065fd7c34ef1175/release/models/aft/openconfig-aft-common.yang#L713-L721

It seems a zero value should not be allowed. Any comments?

@rszarecki https://github.com/rszarecki

— Reply to this email directly, view it on GitHub https://github.com/openconfig/public/issues/1106, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALDSOVPRGT3ZDL3ZUBB7METZA7Z53AVCNFSM6AAAAABHJZQZQ6VHI2DSMVQWIX3LMV43ASLTON2WKOZSGI4DCOBSGU2TANQ . You are receiving this because you were mentioned.Message ID: @.***>

rszarecki avatar May 07 '24 16:05 rszarecki

So we definitely should not allow a '0' value here because it is ambiguous how that should be handled.

I do agree deciding on an upper range limit that is less than 2^64 is probably a good idea for the reasons you point out. It's a bit arbitrary to pick the exact value of this limit and in practice, implementations will have to convert/scale the limit to the hardware's capability, which will reduce the precision (or fidelity as you mention) of the traffic distribution.

uint16 with range 1..1000 sounds good to me. It would be good to hear comments from others.

dplore avatar May 08 '24 01:05 dplore

Please do not change the datatype here. It's a backwards incompatible change with far reaching consequences across different APIs. I strongly oppose a type change. The ratio can be calculated even with uint64.

w.r.t value = 0, a clarification for this seems reasonable -- but please validate existing gRIBI implementation behaviours before making this change.

robshakir avatar May 08 '24 01:05 robshakir

Ack on reason to keep uint64.

Still 1:2^64 ratio is ….

Perhaps we shall add description that range 1-1000 (or 2k or something like this) must be supported. Higher values are up to server implementation. If client send value too big for given server - return an error.

W.D.Y.T?


Rafal Szarecki

On Tue, May 7, 2024 at 18:25 Rob Shakir @.***> wrote:

Please do not change the datatype here. It's a backwards incompatible change with far reaching consequences across different APIs. I strongly oppose a type change. The ratio can be calculated even with uint64.

w.r.t value = 0, a clarification for this seems reasonable -- but please validate existing gRIBI implementation behaviours before making this change.

— Reply to this email directly, view it on GitHub https://github.com/openconfig/public/issues/1106#issuecomment-2099562089, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALDSOVNJ33S3K5JM3XRIBDLZBF5JTAVCNFSM6AAAAABHJZQZQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJZGU3DEMBYHE . You are receiving this because you were mentioned.Message ID: @.***>

rszarecki avatar May 08 '24 01:05 rszarecki

This issue is stale because it has been open 180 days with no activity. If you wish to keep this issue active, please remove the stale label or add a comment, otherwise will be closed in 14 days.

github-actions[bot] avatar Nov 04 '24 02:11 github-actions[bot]