Streaming Encoding for LIST Responses
Enhancement Description
- One-line enhancement description (can be used as a release note): Streaming Response Encoding
- Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/5116-streaming-response-encoding/README.md
- Previous discussion: https://github.com/kubernetes/kubernetes/issues/129304, https://github.com/kubernetes/kubernetes/pull/129334
- Primary contact (assignee): serathius@
- Responsible SIGs: api-machinery
- Enhancement target (which target equals to which milestone):
  - Beta release target (x.y): 1.33
  - Stable release target (x.y): 1.34
Milestones:

Beta
- [x] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/5119
- [x] Code (k/k) update PR(s):
  - [x] https://github.com/kubernetes/kubernetes/issues/130168
  - [x] https://github.com/kubernetes/kubernetes/issues/130264
  - [x] Implement JSON and Proto list benchmarks
  - [x] https://github.com/kubernetes/kubernetes/issues/130216
  - [x] https://github.com/kubernetes/kubernetes/pull/130220
  - [x] https://github.com/kubernetes/kubernetes/issues/130395
  - [x] https://github.com/kubernetes/kubernetes/pull/129334
  - [x] https://github.com/kubernetes/kubernetes/pull/129407
- [x] Docs (k/website) update(s): No doc update planned, as the feature is not user facing.

Stable
- [x] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/5324
- [ ] Code (k/k) update PR(s):
  - https://github.com/kubernetes/kubernetes/issues/130869
  - https://github.com/kubernetes/kubernetes/issues/131885
  - [optional] https://github.com/kubernetes/kubernetes/issues/130169
- [x] Docs (k/website) update(s): None planned.
/sig api-machinery
I'm glad to see this proposal. We have implemented similar capabilities in our internal repo and are preparing to upstream this work. We have submitted a CFP for the upcoming KubeCon China conference.
In our implementation, we use sync.Pool to manage memory allocation efficiently and to cache the serialized result of each item. When the buffer reaches a certain size, we flush it, parallelizing serialization and writing out over HTTP/2.
Additionally, we added support for gzip compression, which is only enabled when the first batch of cached data reaches 128 * 1024 bytes (128 KiB).
For JSON serialization, we implemented a custom StreamMarshal method for UnstructuredList.
For protobuf, we generate code through a generator to preserve backward protobuf marshalling compatibility.
```go
type StreamMarshaller interface {
	// StreamSize returns the total object size and the per-item size slice.
	StreamSize() (uint64, []int)
	StreamMarshal(w stream.Writer, itemSize []int) error
}
```
We have also conducted extensive testing with large datasets and obtained comparative results. @yulongfang, can you share some benchmark results?
Thanks @chenk008 for the introduction. We run many large-scale clusters in Alibaba Cloud. When the controllers of these clusters restart, they issue full LIST requests to the apiserver, which impacts cluster stability. We have had to run the apiserver on larger machines, resulting in wasted resources.
In this context, we applied the approach described above and achieved the following results.
List JSON stress test scenario:
- apiserver version: 1.30
- apiserver specification: 32 cores, 128 GB
- apiserver replicas: 1
- existing resources: 10,000 CRs of ~100 KB each
- stress test scenario: increase pressure along a QPS gradient of 0.1 / 0.5

List JSON stress test results:

QPS 0.05
- before optimization: CPU 35.7 cores, memory 89 GB
- with streaming JSON: CPU 6.22 cores, memory 60 GB

QPS 0.1
- before optimization: CPU 11 cores, memory 146 GB
- with streaming JSON: CPU 7.45 cores, memory 97 GB
List protobuf stress test scenario:
- apiserver version: 1.30
- apiserver specification: 32 cores, 128 GB
- apiserver replicas: 1
- existing resources: 10,000 ConfigMaps of ~100 KB each
- stress test scenario: increase pressure along a QPS gradient of 0.1 / 0.5

List protobuf stress test results:

QPS 0.05
- before optimization: CPU 16.8 cores, memory 54.3 GB
- with streaming: CPU 16.8 cores, memory 16.1 GB

QPS 0.1
- before optimization: CPU 42 cores, memory 122 GB
- with streaming: CPU 42 cores, memory 18 GB
FYI: Technical details are usually discussed in KEP PRs or elsewhere, with the KEP issue serving as a place to link back to the work.
@chenk008 @yulongfang you might consider reviewing https://github.com/kubernetes/enhancements/pull/5119
Hey @chenk008 @yulongfang please see the previous discussion in https://github.com/kubernetes/kubernetes/issues/129304 and https://github.com/kubernetes/kubernetes/pull/129334. We also have already done a performance analysis of our changes in https://github.com/kubernetes/kubernetes/issues/129304#issuecomment-2565219528.
We also added an automatic benchmark of LIST requests. You can see the results in https://perf-dash.k8s.io/#/?jobname=benchmark%20list&metriccategoryname=E2E&metricname=Resources&Resource=memory&PodName=kube-apiserver-benchmark-list-master%2Fkube-apiserver
We currently run it with the JSON + ConfigMap + RV="" configuration, and hope to expand it to include Proto, Pods, CustomResources, and other types of LIST requests. It would be awesome if you could contribute.
/milestone v1.33
@jpbetz @dipesh-rawat this is targeting v1.33 and the KEP was merged. Should the lead-opted-in and tracked labels be added, and this be tracked by the release team?
@serathius @pacoxu Unfortunately, the enhancement freeze deadline has passed, and this KEP issue was not lead-opted-in, so it wasn’t added to the tracking board for the v1.33 release. Post-freeze, we've disabled the automated sync job for KEP issues to the tracking board.
To move forward, we’ll need a short exception request filed so the team can add the lead-opted-in label and manually include this in the tracking board.
If you still wish to progress this enhancement in v1.33, please file an exception request as soon as possible, within three days. If you have any questions, you can reach out in the #release-enhancements channel on Slack and we'll be happy to help. Thanks!
(cc v1.33 Release Lead @npolshakova)
Oops, @jpbetz is OOO. @deads2k can you take a look?
Sent https://groups.google.com/g/kubernetes-sig-release/c/fDI9FdlClnA
@serathius Since the release team has APPROVED the exception request here. This will be considered to be added to the milestone for v1.33 release.
Hello @serathius 👋, v1.33 Enhancements team here.
This enhancement is targeting stage beta for v1.33 (correct me, if otherwise)
/stage beta
Here's where this enhancement currently stands:
- [x] KEP readme using the latest template has been merged into the k/enhancements repo.
- [x] KEP status is marked as `implementable` for `latest-milestone: v1.33`. KEPs targeting `stable` will need to be marked as `implemented` after code PRs are merged and the feature gates are removed.
- [x] KEP readme has up-to-date graduation criteria
- [x] KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here). If your production readiness review is not completed yet, please make sure to fill the production readiness questionnaire in your KEP by the PRR Freeze deadline on Thursday 6th February 2025 so that the PRR team has enough time to review your KEP.
With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀
Could we please link the KEP README in the issue description?
- Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/5116-streaming-response-encoding/README.md
The status of this enhancement is marked as Tracked for enhancements freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!
/label tracked/yes
I've manually added this KEP to the tracking board and marked it as tracked for enhancements freeze🚀
Could one of the sig leads add the lead-opted-in label? @deads2k, would you be able to help with this or point me to someone who can? Thanks!
> Could one of the sig leads add the `lead-opted-in` label?
@serathius Would you be able to assist with the above request? It would be great to get the label added as work is being done in this v1.33 release.
I'm not a SIG api-machinery lead, so I don't think I should use it. I can ask nicely on Slack.
/label lead-opted-in /milestone v1.33
Hey again @serathius 👋, v1.33 Enhancements team here,
Just checking in as we approach Code Freeze at 02:00 UTC Friday 21st March 2025 / 19:00 PDT Thursday 20th March 2025.
Here's where this enhancement currently stands:
- [x] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
- [ ] All PRs are ready to be merged (they have `approved` and `lgtm` labels applied) by the code freeze deadline. This includes tests.
For this enhancement, it looks like the following PRs need to be merged before code freeze (and we need to update the Issue description to include all the related PRs of this KEP):
- ~https://github.com/kubernetes/kubernetes/issues/130169~
- ~Implement LIST benchmarks for different API kinds (Configmap, Pod, CustomResource)~
- https://github.com/kubernetes/kubernetes/issues/130216
- ~https://github.com/kubernetes/kubernetes/issues/130395~
- ~Implement JSON streaming encoder~
- ~Implement Proto streaming encoder~
- ~Reduce memory allocated for control plane in benchmarks~
If you anticipate missing code freeze, you can file an exception request in advance.
Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.
The status of this enhancement is marked as At risk for code freeze.
As always, we are here to help if any questions come up. Thanks!
Hi @serathius 👋 -- this is Aakanksha (@aakankshabhende ) from the 1.33 Communications Team!
For the 1.33 release, we are currently in the process of collecting and curating a list of potential feature blogs, and we'd love for you to consider writing one for your enhancement!
As you may be aware, feature blogs are a great way to communicate to users about features which fall into (but not limited to) the following categories:
- This introduces some breaking change(s)
- This has significant impacts and/or implications to users
- ...Or this is a long-awaited feature, which would go a long way to cover the journey more in detail 🎉
To opt in to write a feature blog, could you please let us know and open a "Feature Blog placeholder PR" (which can be only a skeleton at first) against the website repository by Wednesday, 5th March, 2025? For more information about writing a blog, please find the blog contribution guidelines 📚
[!Tip] Some timeline to keep in mind:
- 02:00 UTC Wednesday, 5th March, 2025: Feature blog PR freeze
- Monday, 7th April, 2025: Feature blogs ready for review
- You can find more in the release document
[!Note] In your placeholder PR, use `XX` characters for the blog `date` in the front matter and file name. We will work with you on updating the PR with the publication date once we have a final number of feature blogs for this release.
@aakankshabhende, done https://github.com/kubernetes/website/pull/49985
Hi @serathius 👋, v1.33 Enhancements team here,
Just a quick friendly reminder as we approach the code freeze later this week, at 02:00 UTC Friday 21st March 2025 / 19:00 PDT Thursday 20th March 2025.
The current status of this enhancement is marked as At risk for code freeze. There are a few requirements mentioned in the comment https://github.com/kubernetes/enhancements/issues/5116#issuecomment-2691347050 that still need to be completed.
If you anticipate missing code freeze, you can file an exception request in advance. Thank you!
We are just missing https://github.com/kubernetes/kubernetes/issues/130216, however I don't think I will have time to implement it due to KEP-4988.
I would propose to move it to GA requirement. @liggitt do you think this is acceptable?
> We are just missing kubernetes/kubernetes#130216, however I don't think I will have time to implement it due to KEP-4988.
> I would propose to move it to GA requirement. @liggitt do you think this is acceptable?
Without that check in place, every new API that gets added risks being unable to make use of streaming encoding... I think it's important to get in place early
I see that more as future-proofing against API changes. I don't expect we will add an API in this release that will not work with streaming encoding, but if that happens we can still validate it manually. Even if we miss some resource, we should be fine, as the most important resources are covered.
https://github.com/liggitt/kubernetes/commits/streaming-list-lint/ has the linting I'd expect... the kube-openapi commit goes to https://github.com/kubernetes/kube-openapi/, then bump that dependency and update the exceptions for the one missing item in k/k
It's hard to believe how awesome you are @liggitt! Knowing how busy you are, you still prepared a draft.
Sent https://github.com/kubernetes/kube-openapi/pull/531
And we are done!
One follow-up is to write the blog post in https://github.com/kubernetes/website/pull/49985, but that can be done post freeze. I asked @fuweid to collaborate.
@serathius Thanks for the update and for confirming that all required changes are merged (here). We can now mark this as tracked for code freeze. Please let us know if anything changes before the freeze or if there are other PRs in k/k we should track for this KEP, to keep the status accurate.
This enhancement is now marked as tracked for code freeze for the v1.33 Code Freeze!
There is an ongoing discussion about whether this KEP should supersede KEP-3157, as on the server side it achieves better results without the need for a separate API. There are still some other considerations; the discussion is at: https://kubernetes.slack.com/archives/C0EG7JC6T/p1743152351539269?thread_ts=1741283769.908819&cid=C0EG7JC6T
Performance comparison after increasing number of informers from 6 to 16 in https://github.com/kubernetes/perf-tests/pull/3242
The watchlist memory usage increased by 30%, from 2GB to 2.6GB
https://perf-dash.k8s.io/#/?jobname=watch-list-on&metriccategoryname=E2E&metricname=L[…]PodName=kube-apiserver-bootstrap-e2e-master%2Fkube-apiserver
On the other hand streaming memory increased by 6%, from 1.74GB to 1.85GB
https://perf-dash.k8s.io/#/?jobname=watch-list-off&metriccategoryname=E2E&metricname=[…]PodName=kube-apiserver-bootstrap-e2e-master%2Fkube-apiserver
Of course, this is not an apples-to-apples comparison. Looking at request counts, WatchList makes 3 times more requests while using 50% less CPU. I expect there might be differences in informer break/restart logic and gzip enablement.
Regarding memory usage, I believe there’s room for enhancement in the GetList logic within the cacher:
https://github.com/kubernetes/kubernetes/blob/195803cde570ad1025a78e36cdbef76bddbc4c33/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L771-L774
```go
// Resize the slice appropriately, since we already know that size of result set
listVal.Set(reflect.MakeSlice(listVal.Type(), len(selectedObjects), len(selectedObjects)))
```
The slice is backed by a struct type, which leads to large contiguous memory allocations, especially when each item is large. For instance, a Pod struct is about 1 KiB, while a ConfigMap is around 228 bytes. If 10,000 items are returned, this results in allocations of approximately 9.7 MiB and 2.17 MiB, respectively. The more items listed, the more memory gets consumed.
Ideally, we could delegate the MakeSlice and item conversion to the encoder. In the case of a streaming encoder, we could encode items one by one, avoiding the need to materialize the entire list in memory. This would reduce large memory allocations. That said, it would be a significant refactor.
In practice, if the watch cache pagination is stable and client-go avoids triggering a full list due to excessive compactions, clients will typically paginate, making full list requests rare. However, if listing all items is unavoidable, reducing the memory footprint of GetList would still be beneficial.