Feature-Request: add memory utilization to activator-hpa
Context
Currently we set the activator-hpa based only on a CPU target. As the activator potentially buffers a lot of requests, the configured memory can become insufficient, causing the activator to be OOM-killed:
```
kubectl describe pod -n knative-serving activator-774d4ff4b8-4l5vp | grep Reason
Reason: OOMKilled
```
Obviously, the configured resources should be adapted to the expected workload (small vs. big cluster, number of KServices, traffic payload sizes, etc.), but we could also add a memory target to the activator-hpa to automatically scale out more activator instances when memory pressure increases.
Describe the feature
- Add a memory target to activator-hpa (see the sketch below)
- Add some docs around setting resource limits based on workload sizes
- Add some docs around potential issues (like this) and recommendations (like setting target-burst-capacity (TBC) to 0).
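For illustration, here is a minimal sketch of what such a combined CPU/memory HPA could look like on the autoscaling/v2 API. The utilization values and replica bounds are placeholders, not recommendations:

```yaml
# Sketch only: activator HPA scaling on both CPU and memory utilization.
# All numbers below are placeholders and need tuning per workload.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: activator
  namespace: knative-serving
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: activator
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 100
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```

With multiple metrics, the HPA computes a desired replica count per metric and takes the largest, so the memory target only kicks in when memory is the bottleneck.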
Related: https://github.com/knative/serving/issues/13583
/area autoscale
@dprotaso, @psschwei, @evankanderson I'd like your opinion on the feature above. WDYT?
We migrated to the autoscaling/v2 APIs last year, so we should be able to scale on multiple metrics now... :+1:
I think that makes sense, though there are also fixed overheads (informer cache) which grow as cluster sizes grow. I'm not quite sure what to do about that, but it affects all our controllers equally.
/triage accepted
I'm in favour, but I'd like us to do some preliminary work of evaluating GOMEMLIMIT (introduced in Go 1.19) and revisiting our use of GOGC (see the sketch after the links below):
- https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications
- https://pkg.go.dev/runtime#hdr-Environment_Variables
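As a starting point for that evaluation, here is a rough sketch of how GOMEMLIMIT could be wired to the container's memory limit via the downward API. Only the relevant part of the activator Deployment is shown, and the limit value is a placeholder:

```yaml
# Sketch only: derive GOMEMLIMIT from the container memory limit so the Go
# GC becomes more aggressive before the kubelet OOM-kills the pod.
# resourceFieldRef resolves to plain bytes, which GOMEMLIMIT accepts directly.
# In practice you would likely set GOMEMLIMIT somewhat below the limit to
# leave headroom for non-heap memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: activator
  namespace: knative-serving
spec:
  template:
    spec:
      containers:
        - name: activator
          resources:
            limits:
              memory: 600Mi
          env:
            - name: GOMEMLIMIT
              valueFrom:
                resourceFieldRef:
                  containerName: activator
                  resource: limits.memory
```

Note that GOMEMLIMIT is a soft limit: if the live heap genuinely exceeds it (e.g. because of buffered request bodies), the activator will still be OOM-killed, which is why a memory-based HPA target remains useful.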
Secondly, I know we have an e2e test where we roll a set of activator pods, but it's very conservative. I don't think it reflects how the HPA would scale things (i.e. scaling the replicas up/down by more than 1). It would be interesting to see whether we drop requests as the number of activators expands and contracts.
@evankanderson

> I think that makes sense, though there are also fixed overheads (informer cache) which grow as cluster sizes grow. I'm not quite sure what to do about that, but it affects all our controllers equally.
It seems to me there are a couple of things we could do about that:
a) restrict to resources we actually use via a label
b) fetch metadata only for stuff that does not require the resource spec
See here and here for the related issue in the cert-manager project. Of course, we manage a lot more resources.
> a) restrict to resources we actually use via a label

@skonto are you thinking about something like what Istio does for this?
If you look at the sharding design doc, you'll note that some controllers want to maintain a read cache across all the reconciled resources in the cluster. I think activator may have been one of the cases where you want to track even non-reconciled revisions so that you can handle a request from the routing layer which doesn't follow the sharding scheme.
Going back to the initial choice of having just a CPU HPA: it seemed back then that the activator was not memory intensive. What has changed? :thinking: cc @ReToCode
In current performance tests I see a considerable increase in memory, so my take is that it depends on the app, as we use autoscale-go instead of helloworld-go (and requests stay around long enough).
Yeah, I think that is the general pattern. Activator uses CPU to copy between the two connections. If there is some latency between Activator and the backend and/or the payload size is larger, this directly increases Activator's memory footprint.
I think we could take a look at a pprof profile as well, just to make sure we have no additional issue. But in the end we probably want a memory HPA as well, as using Activator as a request buffer is a valid and realistic use case.
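For the pprof part, profiling can be enabled via the config-observability ConfigMap; the key name and port below follow the knative.dev/pkg defaults, so treat them as assumptions to verify against the installed release:

```yaml
# Sketch only: turn on the Go profiling endpoint for Knative Serving
# components (served on port 8008 per knative.dev/pkg defaults).
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: knative-serving
data:
  profiling.enable: "true"
```

A heap profile can then be pulled with `go tool pprof` against a port-forwarded activator pod and compared against the request-buffering theory above.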