Add tracing telemetry to FusionAuth

Open mooreds opened this issue 3 years ago • 12 comments

Problem

I want to use tools like DataDog and Honeycomb to monitor my system, of which FusionAuth is a part.

Solution

Add OpenTelemetry tracing for FusionAuth operations. How to integrate OpenTelemetry and capture the default metrics is already documented (https://github.com/FusionAuth/fusionauth-site/pull/1293), but we should also capture spans around user logins.

Alternatives/workarounds

A clear and concise description of any alternative solutions or workarounds you've considered.

Additional context

If you have specific additional spans you'd like tracked, please comment.

https://github.com/open-telemetry/opentelemetry-java-instrumentation

https://opentelemetry.lightstep.com/spans/

https://geekflare.com/opentelemetry-introduction/

https://jeremymorrell.dev/blog/minimal-js-tracing/

Related

  • https://github.com/FusionAuth/fusionauth-issues/issues/2741

Community guidelines

All issues filed in this repository must abide by the FusionAuth community guidelines.

How to vote

Please give us a thumbs up or thumbs down as a reaction to help us prioritize this feature. Feel free to comment if you have a particular need or comment on how this feature should work.

mooreds avatar Apr 02 '22 14:04 mooreds

@mooreds The documentation doesn't mention how we'd do it for hosted instances. Can you give any pointers on how to do it for DataDog for a hosted instance?

theogravity avatar Dec 06 '22 20:12 theogravity

The solution that has worked for others is to use the Prometheus endpoint:

https://fusionauth.io/docs/v1/tech/tutorials/prometheus

Then ingest those metrics into Datadog.

I haven't done it myself, but this looks helpful: https://www.datadoghq.com/blog/monitor-prometheus-metrics/

That will give you low-level metrics such as JVM memory usage. If you want business-level metrics (such as the number of failed logins), you'll want to use webhooks and ingest those events into Datadog. I don't have any examples of that to share.
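
To make the webhook idea a bit more concrete, here is a rough, untested-in-production sketch of how a small webhook receiver could turn a login event into a counter for a local Datadog agent. It assumes the webhook body wraps the event in an `event` object with `type` and `applicationId` fields, that the event types are `user.login.failed` and `user.login.success`, and that the agent accepts DogStatsD datagrams over UDP on port 8125. The metric names and the `EVENT_METRICS` mapping are made up for illustration.

```python
import json
import socket
from typing import Optional

# Hypothetical mapping from webhook event types to metric names.
# The metric names are invented for this sketch.
EVENT_METRICS = {
    "user.login.failed": "fusionauth.login.failed",
    "user.login.success": "fusionauth.login.success",
}


def event_to_dogstatsd(body: str) -> Optional[str]:
    """Convert a webhook JSON body into a DogStatsD counter datagram.

    Returns None for event types we do not track.
    Assumes the payload looks like {"event": {"type": ..., "applicationId": ...}}.
    """
    event = json.loads(body).get("event", {})
    metric = EVENT_METRICS.get(event.get("type"))
    if metric is None:
        return None
    application_id = event.get("applicationId", "unknown")
    # DogStatsD datagram layout: <name>:<value>|<type>|#<tag>:<value>
    return f"{metric}:1|c|#application_id:{application_id}"


def send(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to a local agent (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

A receiver would call `event_to_dogstatsd` on each request body and pass any non-`None` result to `send`. Keeping the conversion separate from the network call makes it easy to check the mapping without an agent running.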

mooreds avatar Dec 06 '22 21:12 mooreds

Thanks, this looks like it could work since we do deploy the datadog agent.

theogravity avatar Dec 06 '22 22:12 theogravity

@bhalsey here's the issue I mentioned.

mooreds avatar Sep 07 '23 17:09 mooreds

FusionAuth has switched to a lighter weight HTTP server backend since the monitor guide was published. java-http does not have out of the box instrumentation from the opentelemetry-javaagent.jar, so we do not get traces of requests made to FusionAuth.
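
For anyone unfamiliar with what the agent would normally do automatically, the following is a minimal sketch of manual span creation. It is plain Python using only the standard library, not the OpenTelemetry API and not FusionAuth code. The `span` helper, the `FINISHED` list, and the `handle_login` handler with its step names are all hypothetical; the point is only to show spans that share a trace ID and record their parent.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# Finished spans are collected here; a real tracer would export them instead.
FINISHED = []

# Tracks the span that is currently open in this execution context.
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def span(name):
    """Open a span, link it to the enclosing span (if any), and time it."""
    parent = _current_span.get()
    record = {
        "name": name,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        # Child spans inherit the trace ID; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "start": time.perf_counter(),
    }
    token = _current_span.set(record)
    try:
        yield record
    finally:
        record["duration"] = time.perf_counter() - record["start"]
        _current_span.reset(token)
        FINISHED.append(record)


def handle_login(login_id):
    """Hypothetical request handler with two illustrative child steps."""
    with span("POST /api/login"):
        with span("lookup_user"):
            time.sleep(0.01)  # stand-in for a database query
        with span("verify_password"):
            time.sleep(0.02)  # stand-in for password hashing
    return True
```

Calling `handle_login("user@example.com")` leaves three records in `FINISHED`: the two child spans first, then the root span, all with the same `trace_id`. Automatic instrumentation does essentially this at the HTTP server boundary, which is the part that is missing for java-http.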

bhalsey avatar Sep 21 '23 17:09 bhalsey

@bhalsey can we handle https://github.com/FusionAuth/fusionauth-issues/issues/2741 as part of this work?

robotdan avatar Jun 15 '24 00:06 robotdan

> FusionAuth has switched to a lighter weight HTTP server backend since the monitor guide was published. java-http does not have out of the box instrumentation from the opentelemetry-javaagent.jar, so we do not get traces of requests made to FusionAuth.

We publish a lot of metrics through the Prometheus endpoint around HTTP requests rates, errors, and timings. Is this what you're looking for, or is there something that we want that is not available via the Prometheus metrics?

robotdan avatar Jun 15 '24 00:06 robotdan

I believe this is a different kind of telemetry, which includes spans and traces information. https://opentelemetry.io/docs/concepts/signals/traces/

mooreds avatar Jun 15 '24 01:06 mooreds

> I believe this is a different kind of telemetry, which includes spans and traces information. https://opentelemetry.io/docs/concepts/signals/traces/

Correct. The image under https://opentelemetry.io/docs/concepts/observability-primer/#distributed-traces helps illustrate the value of traces. They can help identify the bottleneck in systems.
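
As a small illustration of how a trace points at a bottleneck, here is a sketch that computes each span's "self time" (its duration minus the time spent in its direct children) and picks the largest. The trace data, span names, and durations are invented, and the calculation assumes child spans run one after another rather than in parallel and that span names are unique within the trace.

```python
from collections import defaultdict


def self_times(spans):
    """Return {span name: duration minus the total duration of direct children}."""
    child_total = defaultdict(float)
    for s in spans:
        if s["parent_id"] is not None:
            child_total[s["parent_id"]] += s["duration_ms"]
    return {s["name"]: s["duration_ms"] - child_total[s["span_id"]] for s in spans}


def bottleneck(spans):
    """Name of the span with the most self time."""
    times = self_times(spans)
    return max(times, key=times.get)


# Hypothetical trace of one login passing through an app and FusionAuth.
TRACE = [
    {"span_id": "a", "parent_id": None, "name": "web-app: POST /login", "duration_ms": 480.0},
    {"span_id": "b", "parent_id": "a", "name": "fusionauth: POST /api/login", "duration_ms": 430.0},
    {"span_id": "c", "parent_id": "b", "name": "db: select user", "duration_ms": 35.0},
    {"span_id": "d", "parent_id": "b", "name": "password hash check", "duration_ms": 380.0},
]
```

With this made-up data, `bottleneck(TRACE)` returns `"password hash check"`, even though the outer spans have longer total durations. Aggregate metrics such as overall request latency cannot show that breakdown, which is the gap traces fill.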

bhalsey avatar Jun 17 '24 03:06 bhalsey

> @bhalsey can we handle #2741 as part of this work?

#2741 is scoped to usage of FusionAuth to help improve the product. This issue concerns OpenTelemetry tracing to help operators of FusionAuth identify bottlenecks and improve its performance.

bhalsey avatar Jun 17 '24 03:06 bhalsey

OTEL instrumentation support would be greatly appreciated. We need this too, in order to troubleshoot performance.

dvictory avatar Jul 17 '24 05:07 dvictory

@dvictory thanks for the comment! Please make sure to upvote the issue as well.

Comments are great for adding flavor or specific use cases, but we sort by number of upvotes to gauge community feature input.

mooreds avatar Jul 17 '24 11:07 mooreds

Is it possible to add HTTP request metrics to the Prometheus endpoint? For example, p99 response time per path, or maximum response times.

harishreddy-m avatar Feb 08 '25 14:02 harishreddy-m

@harishreddy-m thanks for your comment (sorry for the delayed reply). I think it'd make more sense to create a new issue with the details of your request, so that this issue doesn't get cluttered with too many different types of monitoring requests.

mooreds avatar May 29 '25 17:05 mooreds