antrea icon indicating copy to clipboard operation
antrea copied to clipboard

Support 32bit AS numbers

Open ShutterQuick opened this issue 8 months ago • 7 comments

The 16bit ASN limitation can be problematic when setting up Antrea in environments already heavily reliant on 32bit AS numbers.

EDIT: List of changes no longer correct. See the rest of the conversation.

This MR makes the following changes:

  • Change ASN fields from int32 to uint32. As negative numbers are nonsensical this should be fine as far as I can figure, without breaking anything.
  • In #6914 it is mentioned 32bit ASNs will have to wait until a new apiVersion. I don't know how the roadmap for this looks, but in case it's a while away I added a way to re-map 16bit numbers to 32bit numbers using annotations. While obviously a little nasty, it works just fine and could potentially be included as an undocumented or explicitly unsupported feature until a new schema version arrives in the future.

Example of use:

apiVersion: crd.antrea.io/v1alpha1
kind: BGPPolicy
metadata:
  name: test-policy
  annotations:
    unsupported.bgp.antrea.io/local-asn-override: '4200100011'
    unsupported.bgp.antrea.io/peer-asn-overrides: '{ "10.0.0.1-65535": 4200100001 }'
spec:
  nodeSelector: {}
  localASN: 65535
  listenPort: 179
  advertisements:
    service:
      ipTypes: [ LoadBalancerIP ]
  bgpPeers:
    - address: 10.0.0.1
      asn: 65535
      port: 179
> antctl  get bgppolicy
NAME            ROUTER-ID     LOCAL-ASN  LISTEN-PORT
test-policy 10.0.0.1 4200100011 179

> antctl  get bgppeers
PEER              ASN        STATE
10.0.0.1:179 4200100001 Established

ShutterQuick avatar Apr 09 '25 13:04 ShutterQuick

Yes, @tnqn is correct. We had this discussion in https://github.com/antrea-io/antrea/pull/6914#discussion_r1910989608 and decided that a new API version would be needed. It's cleaner to define a v1alpha2 version of the API than to rely on annotations (for which the deprecation process would not be clear). I assume we will need a conversion webhook. We already have an example in the code base (for the IPPool CRD) that you can follow: see pkg/controller/ipam/convert.go. When converting down from v1alpha2 to v1alpha1, we can return an error status if the ASN doesn't fit in an int32.

antoninbas avatar Apr 09 '25 20:04 antoninbas

If supporting 32-bit AS numbers is an immediate requirement, it would be preferable to introduce a new API version to handle it properly. And introducing a new API version with an updated field type should be a straightforward change.

Definitely more sensible, yeah :)

I assume we will need a conversion webhook. We already have an example in the code base (for the IPPool CRD) that you can follow: see pkg/controller/ipam/convert.go. When converting down from v1alpha2 to v1alpha1, we can return an error status if the ASN doesn't fit in an int32.

Thank you for the pointers! I've done it this way and updated the merge request.

I ran into an issue with returning an error from the convert hook, though. Looks like Kubernetes really, really doesn't like it when a conversion hook returns an error. At least in my testing, it insisted on converting to all versions with serve enabled on create and update. It entered a very aggressive retry-fail-again loop trying to get it done, to no avail. So it looks to me like returning error from the conversion hook is only sensible for transient errors.

For now I instead made it convert 32-bit ASNs to a 32-bit number when going from v1a2 to v1a1. At least this preserves the information and doesn't violate the datatype. The range restriction on the other hand ... Definitely not ideal, but I'm really not sure what would be a good solution here.

Maybe a feature flag to disable BGPPolicy v1alpha1? Could then have an admission hook to

  • Refuse admitting 32-bit ASNs when v1alpha1 bgppolicies are still in use
  • Patch serve to false on v1alpha1 when admitting the first policy with 32-bit ASNs. Urgh. There has to be a more sensible way ...

If any of you have any good ideas for how it should work please let me know I guess.

@tnqn @antoninbas

ShutterQuick avatar Apr 11 '25 14:04 ShutterQuick

Can one of the admins verify this patch?

antrea-bot avatar Apr 18 '25 13:04 antrea-bot

I did some tests and found kube-apiserver does call the conversion webhook regardless of which version I specify in the request and which version is the storage version. If failing a conversion request causes issues to client, could we convert the value that cannot be accommodated by int32 to an invalid value like 0, @antoninbas @ShutterQuick? In this way at least it's guaranteed any requests that previously worked continue to work. If users expect to use int64, they should switch to v1alpha2.

tnqn avatar Jun 25 '25 11:06 tnqn

@tnqn Thank you for having a look at it!

What you describe sounds reasonable to me. I can't think of anything better at least. I've updated the PR and rebased it on main. The converter now replaces ASNs it can't convert with 0.

ShutterQuick avatar Jun 26 '25 13:06 ShutterQuick

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days. You can add a label "lifecycle/frozen" to skip stale checking.

github-actions[bot] avatar Sep 26 '25 00:09 github-actions[bot]

@tnqn I've rebased on main and made the changes you requested.

ShutterQuick avatar Nov 05 '25 10:11 ShutterQuick