Support 32bit AS numbers
The 16bit ASN limitation can be problematic when setting up Antrea in environments already heavily reliant on 32bit AS numbers.
EDIT: List of changes no longer correct. See the rest of the conversation.
This MR makes the following changes:
- Change ASN fields from int32 to uint32. As negative numbers are nonsensical this should be fine as far as I can figure, without breaking anything.
- In #6914 it is mentioned 32bit ASNs will have to wait until a new apiVersion. I don't know how the roadmap for this looks, but in case it's a while away I added a way to re-map 16bit numbers to 32bit numbers using annotations. While obviously a little nasty, it works just fine and could potentially be included as an undocumented or explicitly unsupported feature until a new schema version arrives in the future.
Example of use:
apiVersion: crd.antrea.io/v1alpha1
kind: BGPPolicy
metadata:
name: test-policy
annotations:
unsupported.bgp.antrea.io/local-asn-override: '4200100011'
unsupported.bgp.antrea.io/peer-asn-overrides: '{ "10.0.0.1-65535": 4200100001 }'
spec:
nodeSelector: {}
localASN: 65535
listenPort: 179
advertisements:
service:
ipTypes: [ LoadBalancerIP ]
bgpPeers:
- address: 10.0.0.1
asn: 65535
port: 179
> antctl get bgppolicy
NAME ROUTER-ID LOCAL-ASN LISTEN-PORT
test-policy 10.0.0.1 4200100011 179
> antctl get bgppeers
PEER ASN STATE
10.0.0.1:179 4200100001 Established
Yes, @tnqn is correct. We had this discussion in https://github.com/antrea-io/antrea/pull/6914#discussion_r1910989608 and decided that a new API version would be needed. It's cleaner to define a v1alpha2 version of the API than to rely on annotations (for which the deprecation process would not be clear). I assume we will need a conversion webhook. We already have an example in the code base (for the IPPool CRD) that you can follow: see pkg/controller/ipam/convert.go. When converting down from v1alpha2 to v1alpha1, we can return an error status if the ASN doesn't fit in an int32.
If supporting 32-bit AS numbers is an immediate requirement, it would be preferable to introduce a new API version to handle it properly. And introducing a new API version with an updated field type should be a straightforward change.
Definitely more sensible, yeah :)
I assume we will need a conversion webhook. We already have an example in the code base (for the IPPool CRD) that you can follow: see
pkg/controller/ipam/convert.go. When converting down fromv1alpha2tov1alpha1, we can return an error status if the ASN doesn't fit in anint32.
Thank you for the pointers! I've done it this way and updated the merge request.
I ran into an issue with returning an error from the convert hook, though. Looks like Kubernetes really, really doesn't like it when a conversion hook returns an error. At least in my testing, it insisted on converting to all versions with serve enabled on create and update. It entered a very aggressive retry-fail-again loop trying to get it done, to no avail. So it looks to me like returning error from the conversion hook is only sensible for transient errors.
For now I instead made it convert 32-bit ASNs to a 32-bit number when going from v1a2 to v1a1. At least this preserves the information and doesn't violate the datatype. The range restriction on the other hand ... Definitely not ideal, but I'm really not sure what would be a good solution here.
Maybe a feature flag to disable BGPPolicy v1alpha1? Could then have an admission hook to
- Refuse admitting 32-bit ASNs when v1alpha1 bgppolicies are still in use
- Patch serve to false on v1alpha1 when admitting the first policy with 32-bit ASNs. Urgh. There has to be a more sensible way ...
If any of you have any good ideas for how it should work please let me know I guess.
@tnqn @antoninbas
Can one of the admins verify this patch?
I did some tests and found kube-apiserver does call the conversion webhook regardless of which version I specify in the request and which version is the storage version. If failing a conversion request causes issues to client, could we convert the value that cannot be accommodated by int32 to an invalid value like 0, @antoninbas @ShutterQuick? In this way at least it's guaranteed any requests that previously worked continue to work. If users expect to use int64, they should switch to v1alpha2.
@tnqn Thank you for having a look at it!
What you describe sounds reasonable to me. I can't think of anything better at least. I've updated the PR and rebased it on main. The converter now replaces ASNs it can't convert with 0.
This PR is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days. You can add a label "lifecycle/frozen" to skip stale checking.
@tnqn I've rebased on main and made the changes you requested.