Feature-Request: add memory utilization to activator-hpa
Context
Currently we set the activator-hpa based only on a CPU target. As the activator potentially buffers a lot of requests, the configured memory can become insufficient, causing the activator to be OOM-killed:
```
kubectl describe pod -n knative-serving activator-774d4ff4b8-4l5vp | grep Reason
Reason: OOMKilled
```
Obviously, the configured resources should be adapted to the expected workload (small vs. big cluster, number of KServices, traffic payload sizes, etc.), but we could also add a memory target to the activator-hpa to automatically scale out more activator instances when memory pressure increases.
Describe the feature
- Add a memory target to activator-hpa (see the sketch below)
- Add some docs around setting resource limits based on workload sizes
- Add some docs around potential issues (like this) and recommendations (like setting target-burst-capacity (TBC) to 0).
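For illustration, here is a minimal sketch of what such a combined CPU/memory HPA could look like on the autoscaling/v2 API. The utilization values and replica bounds are placeholders, not recommendations:

```yaml
# Sketch only: activator HPA scaling on both CPU and memory utilization.
# All numbers below are placeholders and need tuning per workload.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: activator
  namespace: knative-serving
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: activator
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 100
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```

With multiple metrics, the HPA computes a desired replica count per metric and takes the largest, so the memory target only kicks in when memory is the bottleneck.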
Related: https://github.com/knative/serving/issues/13583
/area autoscale
@dprotaso, @psschwei, @evankanderson I'd like your opinion on the feature above. WDYT?
We migrated to the autoscaling/v2 APIs last year, so we should be able to scale on multiple metrics now... :+1:
I think that makes sense, though there are also fixed overheads (informer cache) which grow as cluster sizes grow. I'm not quite sure what to do about that, but it affects all our controllers equally.
/triage accepted
I'm in favour, but I'd like us to do some preliminary work of evaluating GOMEMLIMIT (introduced in Go 1.19) and revisiting our use of GOGC (see the sketch after the links below):
- https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications
- https://pkg.go.dev/runtime#hdr-Environment_Variables
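As a starting point for that evaluation, here is a rough sketch of how GOMEMLIMIT could be wired to the container's memory limit via the downward API. Only the relevant part of the activator Deployment is shown, and the limit value is a placeholder:

```yaml
# Sketch only: derive GOMEMLIMIT from the container memory limit so the Go
# GC becomes more aggressive before the kubelet OOM-kills the pod.
# resourceFieldRef resolves to plain bytes, which GOMEMLIMIT accepts directly.
# In practice you would likely set GOMEMLIMIT somewhat below the limit to
# leave headroom for non-heap memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: activator
  namespace: knative-serving
spec:
  template:
    spec:
      containers:
        - name: activator
          resources:
            limits:
              memory: 600Mi
          env:
            - name: GOMEMLIMIT
              valueFrom:
                resourceFieldRef:
                  containerName: activator
                  resource: limits.memory
```

Note that GOMEMLIMIT is a soft limit: if the live heap genuinely exceeds it (e.g. because of buffered request bodies), the activator will still be OOM-killed, which is why a memory-based HPA target remains useful.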
Secondly, I know we have an e2e test where we roll a set of activator pods, but it's very conservative. I don't think it reflects how the HPA would scale things (i.e. scaling the replicas up/down by more than 1). It would be interesting to see whether we drop requests as the number of activators expands and contracts.
@evankanderson

> I think that makes sense, though there are also fixed overheads (informer cache) which grow as cluster sizes grow. I'm not quite sure what to do about that, but it affects all our controllers equally.
It seems to me there are a couple of things we could do about that:
a) restrict to resources we actually use via a label
b) fetch metadata only for stuff that does not require the resource spec
See here and here for the related issue in the cert-manager project. Of course, we manage a lot more resources.
> a) restrict to resources we actually use via a label

@skonto are you thinking about something like what Istio does for this?
If you look at the sharding design doc, you'll note that some controllers want to maintain a read cache across all the reconciled resources in the cluster. I think activator may have been one of the cases where you want to track even non-reconciled revisions so that you can handle a request from the routing layer which doesn't follow the sharding scheme.
Going back to the initial choice of having just a CPU HPA: it seemed back then that the activator was not memory intensive. What has changed? :thinking: cc @ReToCode
In current performance tests I see a considerable increase in memory, so my take is that it depends on the app, as we use autoscale-go instead of helloworld-go (and requests stay around long enough).
Yeah, I think that is the general pattern. Activator uses CPU to copy between the two connections. If there is some latency between Activator and the backend and/or the payload size is larger, this directly increases Activator's memory footprint.
I think we could take a look at a pprof profile as well, just to make sure we have no additional issue. But in the end we probably want a memory HPA as well, as using Activator as a request buffer is a valid and realistic use case.
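For the pprof part, profiling can be enabled via the config-observability ConfigMap; the key name and port below follow the knative.dev/pkg defaults, so treat them as assumptions to verify against the installed release:

```yaml
# Sketch only: turn on the Go profiling endpoint for Knative Serving
# components (served on port 8008 per knative.dev/pkg defaults).
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: knative-serving
data:
  profiling.enable: "true"
```

A heap profile can then be pulled with `go tool pprof` against a port-forwarded activator pod and compared against the request-buffering theory above.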