Update statsd-exporter setup
In examing our use of the statsd_exporter in airflow, we found a couple issues. We couldn't easily override the fork of the exporter image, becuase it requires pushing statsd.extraMappings. We also found that the exporter produces huge metric names (over 250 chars long) due to metrics that are not matched in the mappings config.
- Always use the ConfigMap. This allows for easier use of the official upstream Prometheus container image.
- Upgrade the statsd-exporter to the latest release.
- Update exporter config to use the new defaults section.
- Add a "drop unknown mappings" to reduce metrics spam.
Signed-off-by: SuperQ [email protected]
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst) Here are some useful points:
- Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
- In case of a new feature add useful documentation (in docstrings or in
docs/directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it. - Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
- Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
- Be sure to read the Airflow Coding style. Apache Airflow is a community-driven project and together we are making it better 🚀. In case of doubts contact the developers at: Mailing List: [email protected] Slack: https://s.apache.org/airflow-slack
Thinking about this some more, what if we completely eliminated the airflow image fork and embedded mappings config. We could move the current mappings to values.yaml as statsd.defaultMappings. This would make it even easier to override things in the default setup.
We can still keep the statsd.extraMappings for easy extension of the defaults.
If we can completely get rid of the custom image - I am all for it.
Done!
Just loudly thinking - are the metrics exported now batckwards-compatible with the old ones? @jedcunningham @ephraimbuddy you might want to take a look and see if it also works for Astroonomer (I believe it came from Astronomer's Chart /image originally)
@potiuk Yes, there is no change to the actual mapped metrics. It's just moving the config from embedded in the image to always using a ConfigMap.
It will drop unknown statsd metrics. But this is really necessary because without mappings, it can completely hose a Prometheus instance with thousands of randomly named metrics.
Looking at our metadata endpoint, I see over 18,000 metric names coming from airflow right now. That's over half of our total metric name metadata. Also due to the length of the random names, it's 75% of the data by bytes (over 2MiB of json output into Grafana).
I'm happy to contribute additional mappings to make sure people don't miss out on things exposed over statsd.
And while I'm musing about, long-term it would be helpful if the airflow statsd included tagged data, rather than traditional statsd dotted trees only. This would make it far simpler to convert between monitoring systems.
This is cool. I think though rather than tags, we should focus rather quickly on implementing Open Telemetry https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow
It's already been approved, we just need to implement it :)
If you would like to help I think @howardyoo might make a note that you could test/help in that effort?
Eh, I would prefer to just implement Prometheus/OpenMetrics native metrics. ;-)
I'm not really a fan of OpenTelemetry.
Eh, I would prefer to just implement Prometheus/OpenMetrics native metrics. ;-)
I'm not really a fan of OpenTelemetry.
Why ? Srsly - we have not yet implemented it but any insight from someone whe works on Prometheus might be valuable. BTW. I think you might know Kamil Trzcinski, who is a dear friend of mine :)
So far as I've seen, OTel is great for tracing. But everything else being tacked on is turning it into a bloated kitchen sink of a project. Trying to solve too many problems. And a lot of it is design by committee, trying to make every vendor happy. This makes no vendors happy.
Yes, Kamil is good people.
Thanks for the insights @SuperQ . I think might be good input for @howardyoo :)
BTW. Some of our helm tests need fixing :)
Yes, thanks for the feedback. I believe in general, metrics side of OpenTelemetry has gotten much better (I mean metrics are kind of simple by nature) - so still think the specification has a good potential. We'll just have to see how it goes!
Howard
On Thu, Jul 21, 2022 at 11:05 AM Jarek Potiuk @.***> wrote:
Thanks for the insights @SuperQ https://github.com/SuperQ . I think might be good input for @howardyoo https://github.com/howardyoo :)
— Reply to this email directly, view it on GitHub https://github.com/apache/airflow/pull/25219#issuecomment-1191670866, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHZNLLVNOFUQ5R4HSAFBXN3VVFYLLANCNFSM54HZEBTA . You are receiving this because you were mentioned.Message ID: @.***>
Some tests are failing :(
Yea, looks like I added a bug in the configmap. I'll fix it up soon.
Added more test fixes.
Ok, I think I fixed the last of the test issues. I didn't realize the values.yaml schema valdation was that complicated.
Ok, I think I fixed the last of the test issues. I didn't realize the values.yaml schema valdation was that complicated.
Or rather has so many safeguards :)
Oops, fixed one small indenting mistake in the configmap.
Fixed a dupe issue.
Ahh, you're right, I misremembered that the statsd_exporter expects a list of float64, not strings.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
!stale, just need to find time to get back to this.
I had the same problem When I try to modify some default mapping configuration, if just add a new mapping item, using extraMapping can fully needs, but, if add it conflicts with the default mapping item, custom configuration does not take effect.
Because now default configuration always takes higher priority.
mappings.yml: |-
{{ .Files.Get "dockerfiles/statsd-exporter/mappings.yml" | indent 4 }}
{{ toYaml .Values.statsd.extraMappings | indent 6 }}
{{- end }}
There are only two options for it.
- fork and change the helm chart
- custom build my statsd expoorter image with my mappings file. both bad practice.
I believe this PR can reslove this probelm. But, By reading reviews comment, I realized because this want to add some new metrics mapping, reviewers worry will break backwards compatibility. So, it needs to be more carefully justified.
I'm trying to split it into two steps. First, only provided a optional choose: overideMapping, no change an default behavior, it should be easier and faster to merge. Then, we continue to carefully add metrics in this PR.
So, I create a new PR #26598 .
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.