
Discuss - APISIX Ingress Controller GA

Open juzhiyuan opened this issue 4 months ago • 12 comments

More and more open-source users on the ASF Slack and APISIX GitHub channels have been asking when the GA version will be released. As one of the maintainers, I have discussed this with some of the other maintainers (also cc @bzp2010 @nic-6443 @moonming).

The current plan is as follows:

  1. Release RC4 before September 1. This version will include some performance optimizations.

  2. Release RC5 before October 1.

  3. Release the GA version in mid-November. Any reported bugs or performance issues will be addressed prior to this release.

Note: This is not the final timeline. If no new issues are reported in the weeks following the release of RC5, the GA version may be released earlier.

We encourage open-source and enterprise users to adopt the APISIX Ingress Controller 2.x version and provide feedback via GitHub Issues or online meetings (leave a message here). This will greatly contribute to the project's development.

I'll post updates on the latest releases and progress through this issue until GA.

juzhiyuan avatar Aug 18 '25 10:08 juzhiyuan

Hello, is this plan still valid? There is no RC5 yet.

ljiljanakozic avatar Oct 09 '25 11:10 ljiljanakozic

Hi @ljiljanakozic, there are some pending tasks that need to be done before RC5.

juzhiyuan avatar Oct 09 '25 14:10 juzhiyuan

Hello,

do you plan to include Consumer Group support in the GA version, as requested in this issue? It is a real limitation: Consumer Groups cannot be configured via the API because all items linked to a Consumer Group are deleted by the ingress controller, so the only way to use them today is to remove the ingress controller.

ggarson avatar Oct 28 '25 11:10 ggarson

Hi, RC5 was originally scheduled for release in early October, but it's already late October and it still hasn't been released. Will this delay affect the planned GA timeline? We're currently evaluating when we can deploy APISIX 2.0 into production.

regend avatar Oct 29 '25 03:10 regend

Update: RC5 was released on Oct 20. Refer to https://github.com/apache/apisix-ingress-controller/commits/v2.0.0/

juzhiyuan avatar Oct 29 '25 13:10 juzhiyuan

Update: We're working on two things:

  1. Waiting for more feedback from users.
  2. Adding the necessary test cases before GA.

We plan to complete the GA release in the first or second week of December.

juzhiyuan avatar Nov 11 '25 01:11 juzhiyuan

Hi @juzhiyuan,

We are evaluating the API-driven standalone mode for use with the APISIX Ingress Controller in our Kubernetes environment. The goal is to simplify our architecture by removing the dependency on etcd.

However, based on the current documentation and our own research, we believe we have identified several limitations that make it difficult to adopt this mode outside of testing. We would like to ask about the roadmap to General Availability (GA) for this feature, and whether the path to GA includes fixes for the following issues:

Key Concerns:

  • Full Configuration Push on Updates: The current implementation requires the Ingress Controller to push the entire configuration for every single change. This creates a significant scalability bottleneck as our route count grows. Will the GA version support incremental updates (e.g., only updating the changed route/upstream) to improve performance?

  • No Persistence & Recovery Downtime: Since APISIX starts with an empty configuration in this mode, any pod restart results in downtime until the Ingress Controller can re-sync. Are there plans to introduce a persistence or caching mechanism (like writing to a local file) that would allow APISIX to recover instantly on its own?

  • Lack of Dashboard Support: The APISIX Dashboard does not work without etcd, which removes a critical tool for observability and troubleshooting. Will there be an alternative way to visualize the running configuration in standalone mode?

  • Will achieving full feature parity with the etcd mode be a requirement for GA?

Thank you for your excellent work on this project!

BR, Johannes

johannes-engler-mw avatar Nov 25 '25 13:11 johannes-engler-mw

Hi @johannes-engler-mw,

Full Configuration Push on Updates

I'll leave it to @bzp2010 for better explanation.

No Persistence & Recovery Downtime

In a production environment, we should configure the /healthz and /liveness probes correctly so that traffic is only handled after the new Pods are in the Ready state.
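For illustration, here is a minimal sketch of such probe settings, written as a TypeScript object whose fields mirror the Kubernetes container probe spec; the endpoint path, port, and timings are placeholders, not recommended values for APISIX:

```typescript
// Illustrative only: the fields mirror the Kubernetes container probe spec,
// but the endpoint path, port, and timings are placeholders. Use whatever
// health endpoints your APISIX deployment actually exposes.
interface HttpProbe {
  httpGet: { path: string; port: number };
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

const apisixProbes: { readinessProbe: HttpProbe; livenessProbe: HttpProbe } = {
  // Readiness gates traffic: the Pod only receives requests once it reports
  // Ready, ideally only after the ingress controller has re-synced the config.
  readinessProbe: {
    httpGet: { path: "/healthz", port: 9080 }, // placeholder path and port
    initialDelaySeconds: 5,
    periodSeconds: 5,
    failureThreshold: 3,
  },
  // Liveness restarts the container only if the process itself is unhealthy.
  livenessProbe: {
    httpGet: { path: "/healthz", port: 9080 }, // placeholder path and port
    initialDelaySeconds: 10,
    periodSeconds: 10,
    failureThreshold: 3,
  },
};

console.log(JSON.stringify(apisixProbes, null, 2));
```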

Lack of Dashboard Support: The APISIX Dashboard does not work without etcd, which removes a critical tool for observability and troubleshooting.

Starting with apache/apisix 3.13, the APISIX Dashboard[1] is built as pure HTML/CSS/JS files and included in the apache/apisix Docker image, and you can access it via apisix:9180/ui.

It only fetches configurations through the APISIX Admin API[2]. The APISIX Dashboard is just a wrapper of the Admin API. I don't see how it relates to Observability because, according to the Dashboard Scopes[3], we don't plan to add such functionalities.

We recommend that users connect to external observability components using APISIX's built-in observability plugins[4] for a more professional experience.

[1] https://github.com/apache/apisix-dashboard [2] https://docs.api7.ai/apisix/reference/admin-api [3] https://github.com/apache/apisix-dashboard/issues/2981 [4] https://docs.api7.ai/hub#observability

juzhiyuan avatar Nov 26 '25 01:11 juzhiyuan

Thanks @juzhiyuan for the fast response!

Is there a guide on how to deploy APISIX on Kubernetes in a production environment, e.g. how many replicas, etc.?

Regarding the dashboard: maybe "observability" was the wrong wording. I meant that it's helpful for getting a quick overview of the configured resources.

Will the dashboard work in the API-Driven Standalone Mode and can it be made available in the cluster?

Waiting for @bzp2010's answer on the first topic.

BR, Johannes

johannes-engler-mw avatar Nov 26 '25 11:11 johannes-engler-mw

Hi @johannes-engler-mw, apologies for the delay. This topic is so extensive that I had to devote considerably more time to it, and even then I cannot guarantee that I have made every point perfectly clear.

I should like to explain the issue you mentioned, "Full Configuration Push on Updates".

First, the conclusion: the full configuration is pushed, but it is not implemented in the way you might expect.

If you have delved into the AIC 2.0 codebase, you will be aware that we employ ADC as the core adapter to support different modes (etcd or stateless), with these modes corresponding respectively to the apisix and apisix-standalone backends [1] within ADC.

Given your interest in AIC's stateless mode, we shall focus primarily on the apisix-standalone backend. On this basis, we will see what optimisations we have made in terms of synchronisation efficiency and APISIX data plane efficiency respectively.

We have implemented numerous optimisation patches to enhance synchronisation speed, including but not limited to: ADC remote configuration caching (exclusively for apisix-standalone), reduction in the frequency of differential checks, and ADC server mode.

  1. ADC remote configuration caching: I have implemented caching within the ADC to consistently minimise network round trips required to fetch configurations from the APISIX Admin API, thereby helping to reduce synchronisation latency.

  2. Reduction in the frequency of differential checks:

In the standard ADC CLI mode, users typically push a series of changes to a single endpoint. This works because APISIX instances that use etcd share a centralised storage system: regardless of which APISIX instance's Admin API we write data to, it is reliably propagated to all data planes. This approach is comparatively efficient.

APISIX Standalone, however, is a special case. There is no centralised storage, and configurations must be sent to the Admin API endpoint of each APISIX instance. In CLI mode, we would have to execute the full "pull, differential check, push" synchronisation process for every instance, and its cost grows linearly with the number of APISIX instances. Our team soon realised this was unsustainable: the unpredictable number of APISIX instances and the CPU-intensive differential checks risked slowing down the entire system, which was unacceptable.

We therefore implemented an optimisation: regardless of the number of APISIX instances, the differences are calculated only once per synchronisation, and the patched latest configuration is then pushed to each APISIX instance. The time consumed by CPU-intensive tasks no longer grows linearly.

As I mentioned earlier, I have created a caching layer. This optimisation will work closely with the cache to ensure its effectiveness. It is a systematic optimisation involving every part of the AIC, ADC, and APISIX data plane. I shall endeavour to explain it briefly.

When AIC performs its initial synchronisation, the cache does not yet exist. ADC will attempt to establish the cache by fetching the configuration from an APISIX Admin API. The key modification introduced in APISIX 3.14 [2] is significant here: the APISIX Admin API returns a timestamp indicating when it last received a configuration update, and ADC retrieves the latest configuration from the instance with the most recent update (this is to minimise performance fluctuations in the data plane, which I shall explain shortly).

For example, if you have several APISIX instances that have been running for some time, the ADC will endeavour to obtain the latest configuration from them. Should there be newly started APISIX instances among them, these will only be considered as backups; configurations already applied in the APISIX instances with the latest configuration will take precedence. Conversely, if all your APISIX instances are brand new, the cache will not be established from this point, and the ADC will perform a full rebuild.

Assuming ADC has successfully established the cache from the latest configuration, it will then perform a differential check between the latest gateway configuration that AIC derives from the Kubernetes cluster and the cached configuration. This generates a set of differences, after which ADC sequentially applies the differential patches to the cached configuration. The resulting updated configuration is then pushed to the relevant APISIX instances and replaces the previous cached version. Any subsequent changes are always compared against the cache, eliminating the need to fetch the configuration from APISIX again.
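To make the caching and diff-once flow above concrete, here is a heavily simplified TypeScript sketch; the types and helper functions are illustrative stand-ins rather than ADC's real code, and the diff is a shallow per-key comparison purely for demonstration:

```typescript
// Heavily simplified sketch of the flow described above. The types and the
// helper functions here are illustrative stand-ins, not ADC's real code.
type Config = Record<string, unknown>;
type Patch = { key: string; value: unknown };

let cache: Config | null = null; // ADC's in-memory copy of the applied config

// Shallow per-key diff between the cached and the desired snapshot.
function diff(prev: Config, next: Config): Patch[] {
  const patches: Patch[] = [];
  const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
  for (const key of keys) {
    if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
      patches.push({ key, value: next[key] });
    }
  }
  return patches;
}

// Apply the patches on top of the cached snapshot to produce the new config.
function applyPatches(prev: Config, patches: Patch[]): Config {
  const out: Config = { ...prev };
  for (const p of patches) {
    if (p.value === undefined) delete out[p.key];
    else out[p.key] = p.value;
  }
  return out;
}

// Stand-ins for the real network calls: seeding the cache from the instance
// with the newest update timestamp, and pushing one snapshot to one instance.
async function fetchLatestFromInstances(_instances: string[]): Promise<Config> {
  return {}; // e.g. all instances are brand new, so start from scratch
}
async function pushTo(instance: string, config: Config): Promise<void> {
  console.log(`pushing full snapshot to ${instance}:`, Object.keys(config));
}

export async function sync(desired: Config, instances: string[]): Promise<void> {
  // Establish the cache once; afterwards every sync compares against it
  // instead of re-fetching the running configuration from APISIX.
  const base: Config = cache ?? (await fetchLatestFromInstances(instances));

  // The CPU-intensive differential check runs exactly once per sync,
  // regardless of how many APISIX instances there are.
  const patches = diff(base, desired);
  if (patches.length === 0) return;

  const next = applyPatches(base, patches);

  // The same patched snapshot is pushed to every instance; there is no
  // per-instance "pull / diff / push" cycle.
  await Promise.all(instances.map((addr) => pushTo(addr, next)));

  cache = next; // the new baseline for the next comparison
}
```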

  3. ADC server mode:

ADC is developed in TypeScript and runs on Node.js, so it cannot directly interoperate (via code references) with the Go code used to develop AIC; we must employ alternative methods for interaction. Initially, ADC was invoked in CLI mode: each time AIC synchronised, a new process would launch, execute the synchronisation, and then exit. This was the implementation approach in the earlier RC versions.

We observed high CPU utilisation. Upon examining performance consumption (via flame graphs), we discovered that each time Node.js launched to execute ADC logic, it had to re-execute the code loading, parsing, and execution process – a CPU-intensive task. Furthermore, we recognised that to leverage the V8 VM's highly optimised JIT capabilities, we needed to keep ADC running for extended periods whenever possible.

We therefore decided to build a server mode for ADC. It listens on a Unix socket, and AIC invokes the API exposed on it. This ensures code parsing occurs only once, during ADC server startup, with subsequent operations benefiting from the V8 JIT's ongoing compilation optimisation. It is specifically designed for AIC and will only be used in conjunction with AIC. This is the purpose of the adc-server container you now see in RC5.
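As a rough illustration of the idea (not the actual adc-server implementation), a Node.js process can stay resident and serve requests over a Unix domain socket like this; the socket path and the /sync endpoint are invented for the example:

```typescript
// Minimal sketch of the "server mode" idea: the process starts once, keeps
// its parsed and JIT-optimised code in memory, and serves requests over a
// Unix domain socket. The socket path and the endpoint are hypothetical.
import * as http from "node:http";
import * as fs from "node:fs";

const SOCKET_PATH = "/tmp/adc-example.sock"; // placeholder, not the real path

const server = http.createServer((req, res) => {
  if (req.method === "PUT" && req.url === "/sync") {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      // In the real system the diff/push logic would run here; this sketch
      // simply acknowledges the request.
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ receivedBytes: body.length }));
    });
    return;
  }
  res.writeHead(404);
  res.end();
});

// Remove a stale socket file from a previous run, then listen on the socket.
if (fs.existsSync(SOCKET_PATH)) fs.unlinkSync(SOCKET_PATH);
server.listen(SOCKET_PATH, () => {
  console.log(`ADC-style server listening on ${SOCKET_PATH}`);
});
```

A client in the same Pod could then call it with, for example, curl --unix-socket /tmp/adc-example.sock -X PUT http://localhost/sync.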

The major improvements to the ADC have already been discussed above (though I have omitted some of the finer details). Next, I shall address the enhancements made to the APISIX data plane.

  1. How does APISIX load configurations pushed by ADC?

When a configuration generated by ADC is sent to APISIX, it undergoes fundamental checks, such as verifying whether its format is valid JSON and whether any configuration details are undergoing version rollbacks. Upon passing these checks, APISIX replaces its currently running configuration entirely, applying the changes to each Nginx worker. Subsequently, it rebuilds the routing tree and the LRU cache as required. Thereafter, the updated configuration takes effect at the APISIX data plane.

  2. What optimisations have we implemented?

I mentioned routing trees and LRU cache rebuilding earlier; these are virtually the most crucial parts of the optimisation.

APISIX uses a routing system called a radix tree to achieve its efficient URI-based routing. As the name suggests, it is a tree-based data structure. When routes within APISIX change or other configurations that may affect routing are altered, the routing tree is rebuilt. This process is a CPU-intensive task and consumes a significant amount of CPU time.

Basically, this relates to alterations in routes and services. In detail, APISIX employs a dual versioning system to track configuration changes: an overall resource-type-level version and individual resource-level versions (e.g., for routes). In etcd mode, this utilises etcd's modifiedIndex as the version identifier, but standalone mode lacks this capability.

Thus, in the traditional file-based standalone mode, updating the configuration causes all version numbers to vanish. This invalidates the route tree and all LRU caches, potentially causing transient performance fluctuations. Given that configurations in this mode are essentially static, this issue is not significant and appears acceptable.

This coarse-grained approach is unsuitable for API-driven standalone configurations, as configuration changes within Kubernetes can occur with considerable frequency. For instance, scaling adjustments to a Deployment may alter the Service endpoints, necessitating immediate updates to APISIX's upstream configuration. Such modifications are commonplace in Kubernetes environments, rendering the complete invalidation of the entire cache structure with each change an untenable burden.

Therefore, the ADC uses its caching mechanism to maintain a timestamp-based version numbering system. When a resource undergoes modification, the ADC updates only the configuration version numbers within its cache that pertain to the altered resource. APISIX has likewise been updated to accommodate this pattern, whereby only caches associated with resources within the modified version number range are invalidated. Should the change involve no alterations whatsoever to the routing tree, the routing tree remains entirely intact and is not rebuilt.
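The following toy sketch illustrates that versioning idea; the names and structure are hypothetical and do not mirror APISIX's or ADC's actual data structures:

```typescript
// Toy model of timestamp-based, per-resource versioning: only the caches
// tied to resources whose version actually changed need to be invalidated,
// and if no route or service changed, the routing tree is left untouched.
// All names here are hypothetical.
interface Versioned<T> {
  value: T;
  confVersion: number; // timestamp-style version, bumped only on change
}

class ResourceStore<T> {
  private items = new Map<string, Versioned<T>>();

  upsert(id: string, value: T): void {
    const prev = this.items.get(id);
    // Bump the version only for a resource that actually changed.
    if (!prev || JSON.stringify(prev.value) !== JSON.stringify(value)) {
      this.items.set(id, { value, confVersion: Date.now() });
    }
  }

  // Ids whose version is newer than the consumer's last-seen version, i.e.
  // the only entries whose derived caches need to be rebuilt.
  changedSince(version: number): string[] {
    return [...this.items.entries()]
      .filter(([, v]) => v.confVersion > version)
      .map(([id]) => id);
  }
}

// Example: a sync that only touches an upstream leaves routes unchanged, so
// changedSince() on the route store reports nothing and no tree rebuild runs.
const routes = new ResourceStore<{ uri: string }>();
routes.upsert("r1", { uri: "/api/*" });
const lastSeen = Date.now();
routes.upsert("r1", { uri: "/api/*" }); // identical value: version not bumped
console.log(routes.changedSince(lastSeen)); // []
```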

Additionally, ADC traditionally always inlined the upstream within the service, so each upstream update triggered a service update and caused the service version number to change. Due to certain internal mechanisms in APISIX, when the service changes, the routing tree may also be rebuilt. We have optimised this behaviour: during synchronisation, ADC separates the upstream from the service and treats it as a distinct upstream resource, and the service references this upstream via its upstream ID. As a result, changes to upstream nodes (Pod IPs) no longer necessitate a rebuild of the routing tree.
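For example (ids, hosts, and URIs invented for illustration), the synced objects roughly take this shape: the service points at the upstream by id, so a Pod IP change only touches the upstream object and leaves the routes and services, and therefore the routing tree, alone.

```typescript
// Illustrative shapes only; ids, hosts, and URIs are made up. The fields
// follow the APISIX resource model: the service references the upstream via
// upstream_id instead of inlining it.
const upstream = {
  id: "example-upstream",
  type: "roundrobin",
  nodes: {
    "10.0.0.11:8080": 1, // Pod IPs: the part that changes most often
    "10.0.0.12:8080": 1,
  },
};

const service = {
  id: "example-service",
  upstream_id: upstream.id, // reference, not an inlined upstream block
};

const route = {
  id: "example-route",
  uri: "/api/*",
  service_id: service.id, // the route itself is untouched by endpoint churn
};

console.log({ route, service, upstream });
```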

The above covers the significant improvements we have implemented; judging by this, your concerns appear to be unfounded. Our team is also conducting performance testing to ensure acceptable performance in common scenarios. Should we encounter any unexpected performance degradation, we will investigate and rectify it. This testing is scheduled to conclude prior to the 2.0 General Availability release; indeed, we are currently undertaking this work. Should you encounter specific performance-related issues, please provide a reproducible scenario and your configuration details. We will examine them and, where an issue exists, resolve it.


[1] ADC adopts a concept derived from LLVM: it is divided into frontend, intermediate representation, and backend components. Users input specific YAML or JSON configurations into ADC, and after processing, the backend initiates a batch of API calls to push the configuration to the various backend systems (APISIX, APISIX API-driven standalone). This allows us to achieve modularisation and layering within the project.

[2] It is advisable to utilise the latest APISIX alongside the most recent AIC version to benefit from all new optimisations. APISIX 3.13 does not return timestamps, meaning caching cannot be established, and an initial full rebuild will be unavoidable.

bzp2010 avatar Nov 28 '25 02:11 bzp2010

Thanks @bzp2010 for the detailed explanation, that sounds great.

Waiting for the release now ;).

BR, Johannes

johannes-engler-mw avatar Dec 03 '25 14:12 johannes-engler-mw

fyi: https://lists.apache.org/thread/rd8bjlvfzqsfzhh9j5v27x132lk8jd85

juzhiyuan avatar Dec 10 '25 20:12 juzhiyuan