paasta Provide sidecar information to gunicorn autoscaled services

Problem

When gunicorn autoscaling is enabled, paasta includes the gunicorn-exporter sidecar. gunicorn services can send statsd metrics to the sidecar.

Currently, a gunicorn service has no way of knowing whether a sidecar is present, so it tries to send statsd messages and fails when there is no sidecar to receive them.

Solution

POD_SIDECARS lists which sidecars run alongside the service.
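As a rough illustration of how a service might consume this value, here is a minimal sketch. It assumes POD_SIDECARS is exposed as an environment variable holding a comma-separated list of sidecar names, and that the exporter appears in it as `gunicorn-exporter`; neither the format nor the listed name is confirmed by this PR description. The helper names and the `localhost:8125` address are hypothetical.

```python
import os
from typing import Mapping, Optional, Set


def parse_pod_sidecars(environ: Optional[Mapping[str, str]] = None) -> Set[str]:
    """Return the set of sidecar names advertised through POD_SIDECARS.

    Assumption: the value is a comma-separated list of sidecar names.
    """
    env = os.environ if environ is None else environ
    raw = env.get("POD_SIDECARS", "")
    # Drop surrounding whitespace and ignore empty entries.
    return {name.strip() for name in raw.split(",") if name.strip()}


def statsd_host_for(
    environ: Optional[Mapping[str, str]] = None,
    default_host: str = "localhost:8125",  # hypothetical sidecar address
) -> Optional[str]:
    """Return a statsd address only when the exporter sidecar is listed."""
    if "gunicorn-exporter" in parse_pod_sidecars(environ):
        return default_host
    return None
```

A gunicorn config file could then set `statsd_host = statsd_host_for()`, which leaves statsd reporting disabled when no exporter sidecar is listed (for example, under a local run).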

Notes

Note: to limit the scope of this change, POD_SIDECARS is currently provided only to services that include the gunicorn-exporter sidecar.

Aug 11 '23 20:08 ymilki

Would this be used in order to disable metrics if the sidecar doesn't exist? If so I don't really think we want to go that direction, although I could see how knowing which sidecars are attached would be useful to a service more generally...

@jvperrin Yes, this applies both to local deployments (pgctl, yelp-compose, paasta local-run), where there is no sidecar, and to deployments with or without autoscaling enabled, canary or otherwise.

I intend this to be a short-term solution as we continue to examine possible changes to autoscaling and the metrics pipeline for services.

Aug 14 '23 18:08 ymilki

The difference between uwsgi metrics and gunicorn metrics is pull versus push. uwsgi metrics can be served directly by uwsgi whereas gunicorn must push metrics. In order to push metrics, gunicorn must know where to push them.

Aug 14 '23 18:08 ymilki

The difference between uwsgi metrics and gunicorn metrics is pull versus push. uwsgi metrics can be served directly by uwsgi whereas gunicorn must push metrics. In order to push metrics, gunicorn must know where to push them.

Yeah my thought on that is that we should set up somewhere to push them, since it sounds like we definitely want these metrics (not just for worker utilization but also for details about requests per second and logging, from https://docs.gunicorn.org/en/latest/instrumentation.html).

I'm still leaning towards that somewhere being a centralized statsd_exporter, although I do recognize the danger in that since if it's unavailable or something then that would break autoscaling (but the same is true for the sidecar version as it's running right now, it just wouldn't break as many services all at once). I also don't think we'd have to do any hacky rewriting stuff with the metric name like adding a custom prefix per service that gets converted later to a label or anything, since both gunicorn and the statsd_exporter support dogstatsd-style tags that get converted to Prometheus labels.

Aug 15 '23 02:08 jvperrin
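For reference, the tag-based approach described in the comment above could look roughly like the following gunicorn config sketch. `statsd_host` and `dogstatsd_tags` are existing gunicorn settings; the exporter address, the tag names, and the `build_statsd_settings` helper are hypothetical placeholders, not something proposed in this thread.

```python
from typing import Dict


def build_statsd_settings(service: str, instance: str, exporter_addr: str) -> Dict[str, str]:
    """Build gunicorn statsd settings that identify the service via tags.

    gunicorn expects dogstatsd_tags as a comma-delimited "key:value" list,
    so no per-service metric prefix is needed.
    """
    tags = f"paasta_service:{service},paasta_instance:{instance}"  # hypothetical tag names
    return {"statsd_host": exporter_addr, "dogstatsd_tags": tags}


# Hypothetical values; a real config would derive these from the environment.
_settings = build_statsd_settings(
    "example_service", "main", "statsd-exporter.example.internal:9125"
)

# gunicorn reads recognized setting names from the config module's globals.
statsd_host = _settings["statsd_host"]
dogstatsd_tags = _settings["dogstatsd_tags"]
```

With tags attached this way, a statsd_exporter that understands dogstatsd-style tags can expose them as Prometheus labels without rewriting metric names.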

An alternative workaround would be to keep the status quo with statsd always sending metrics and strongly suggesting autoscaling should be enabled. This would be compatible with moving the statsd-exporter.

Aug 15 '23 18:08 ymilki

An alternative workaround would be to keep the status quo with statsd always sending metrics and strongly suggesting autoscaling should be enabled. This would be compatible with moving the statsd-exporter.

Hmmm, that's true, although I think there will be quite a few cases in which autoscaling isn't enabled or desired. For instance, in dev/stage it's often disabled due to a low number of instances (should we enable it anyways and just have scaling for 1-2 instances?), for canary instances in prod as there's only 1 or a small fixed number, and some use CPU-based autoscaling instead of worker-based autoscaling (that last case in particular probably should be changed).

Aug 15 '23 22:08 jvperrin

@ymilki @jvperrin did we work out an approach here? The gunicorn work looks paused? I'll close for now but please re-open if we need to

Feb 21 '24 15:02 mattmb
