opentelemetry-erlang-contrib icon indicating copy to clipboard operation
opentelemetry-erlang-contrib copied to clipboard

Add OpentelemetryPhoenix integration

Open derekkraan opened this issue 3 years ago • 6 comments

Hi all,

We have been running with this configuration for some time now. Thought it would be a worthy addition to the contrib repo.

I made an effort when I was making this initially to make sure we conform to the otel naming conventions.

Looking forward to receiving feedback on this PR.

derekkraan avatar Sep 15 '22 09:09 derekkraan

Is there a path you see to get this added to https://github.com/open-telemetry/opentelemetry-erlang-contrib/tree/main/instrumentation/opentelemetry_phoenix and have it optionally enabled?

bryannaegele avatar Sep 15 '22 16:09 bryannaegele

Yea, why not in OpenTelemetryPhoenix.

And in OtelPhoenix can't it just be the default with an option to disable?

tsloughter avatar Sep 15 '22 17:09 tsloughter

@bryannaegele makes sense, I will rework this to be part of opentelemetry_phoenix

derekkraan avatar Sep 16 '22 12:09 derekkraan

I've updated this PR. There is one small thing :tm:, that is that phoenix doesn't provide a hook for "the user navigated away from the liveview". Until now we were relying on the monitoring solution, but that's (going to be) a separate PR in a separate repo. So for now the result is a "missing parent span".

derekkraan avatar Sep 19 '22 12:09 derekkraan

Something to consider: we published https://hex.pm/packages/opentelemetry_liveview and it's has 30k downloads in about a year. I was planning to move it here but I'm also happy to archive the project and refer to this package.

dvic avatar Sep 19 '22 18:09 dvic

@dvic I would say let's work to get any improvements you've made in your lib into this one, then we have one lib for all of phoenix + liveview.

derekkraan avatar Sep 20 '22 06:09 derekkraan

FYI LV 0.18 just got released and now there are more telemetry events it seems, from the changelog:

Add new LiveView logger with telemetry instrumentation for lifecycle events

dvic avatar Sep 21 '22 05:09 dvic

FYI LV 0.18 just got released and now there are more telemetry events it seems, from the changelog:

Add new LiveView logger with telemetry instrumentation for lifecycle events

No new telemetry events with this release unfortunately, but there is now a built-in console logger powered by the existing events which can be correlated with trace and spans created by this library 🙂

connorlay avatar Sep 21 '22 05:09 connorlay

Ahhh I see

dvic avatar Sep 21 '22 05:09 dvic

I'd rather not have to rely on the monitoring to end the span... Even though it definitely should get added to the SDK, the user shouldn't have to rely on it for regular behaviour and ending a LiveView span is regular behaviour.

Is LiveView not going to add a telemetry event for this?

tsloughter avatar Dec 12 '22 17:12 tsloughter

Oi, based on the LiveView threads from users wanting an unmount callback it sounds like there isn't hope for an event?

tsloughter avatar Dec 12 '22 18:12 tsloughter

So whats the status o this? Is there still not a way around requiring the span being monitored?

Should we merge this in anyway?

tsloughter avatar Jul 08 '23 11:07 tsloughter

There is one way around monitoring the span: don't create the span.

If I remember correctly, there was some discussion somewhere about how having a span open for the liveview itself is not the right way to go :tm:. This is a span that might be open for potentially hours.

Do we have another way to link all spans for a single liveview together, that doesn't require a span?

Other than that, yes, I would like to see this PR get merged sooner rather than later.

derekkraan avatar Jul 10 '23 06:07 derekkraan

Why is the monitor needed for this? If anything it would potentially cause the span to be prematurely closed? Seems worse than having a very long span?

Maybe an option on whether to have a top level span or not? All the spans could be grouped together by pid by the user.

tsloughter avatar Jul 10 '23 10:07 tsloughter

It is necessary for a top level span because liveview isn't emitting an event that corresponds with an "unmount".

If we want to move forward, perhaps recording the pid and dropping the top level span is a good start.

derekkraan avatar Jul 10 '23 10:07 derekkraan

Ok, so the top level span doesn't actually represent the amount of time the liveview is running but how long it is alive before being sweeped?

tsloughter avatar Jul 10 '23 10:07 tsloughter

I'm not sure what you mean by "sweeped". In principle, the top level span will represent the amount of time the liveview's server process is running.

derekkraan avatar Jul 10 '23 10:07 derekkraan

Oh, facepalm, sorry, its early here :). I was thinking of the span sweeper (which already exists and will end spans if they take longer than some amount of time) and not the monitoring way which will end it when the process ends.

tsloughter avatar Jul 10 '23 10:07 tsloughter

So conclusion: remove the parent span and add pid to the attributes?

derekkraan avatar Jul 11 '23 09:07 derekkraan

@derekkraan eh, I was thinking I'd work on getting the monitor merged. Maybe if we can't get that merged this week we go ahead with removing the root and adding the pid?

tsloughter avatar Jul 11 '23 21:07 tsloughter

Sure. I mean I was also wondering if the root span was even a good idea. I still want to kind of have a discussion about that or hear from people who know better.

derekkraan avatar Jul 12 '23 07:07 derekkraan

Ah ok, yea, I'm not the one to know. Maybe we can find others who use it on Slack at least.

tsloughter avatar Jul 12 '23 10:07 tsloughter

It has been an open question for quite some time. We probably want a way to link them because seeing a whole "session" in one go in the o11y tools that provide the timeline view is amazingly useful.

Is it a long running root span, linking span or traces, or putting a "pid" as attribute depends more on the implementation details of otel, what the spec is designed for and what vendors tooling support to offer that view

DianaOlympos avatar Jul 12 '23 10:07 DianaOlympos

One issue with really long-running spans is they could be mistaken for orphaned spans and get caught up in sweepers. Maybe there are workarounds for that but it is an issue.

The fundamental quandary I have is under the hood LiveView is just channels which are websockets which in otel are treated as messaging. In a React SPA or other JS setup, these connections would just be to websockets or AJAX calls. LiveView is just smoke and mirrors over what are standard JS client/server patterns, so I'd argue that looking at how the JS otel community treats this is probably the way to go.

If you look at the socket.io package, they are treating sockets as messaging. opentelemetry-instrumentation-socket.io/dist/src/socket.io.js Something they aren't addressing is correlating the spans (I think, no familiarity with rooms, etc) but the conversations attribute would satisfy linking spans in a session.

seeing a whole "session" in one go in the o11y tools that provide the timeline view is amazingly useful

So I understand the potential value of this from a user behavior view, but is there some other value I'm missing? Is otel the right path for this or is that use case just trying to leverage otel as an analytics platform?

Another question (could be a follow-on) is patching the js to get the full roundtrips.

bryannaegele avatar Jul 12 '23 16:07 bryannaegele

It is useful to me because it is how I can see the sequence of how we got into the state that is problematic. This is because a "transaction" is not just a simple message right?

DianaOlympos avatar Jul 13 '23 07:07 DianaOlympos

In the beginning I really wanted the possibility of a waterfall view for a liveview process, but I think I have changed my mind. One issue is that the waterfall's value goes to 0 when people do things (which they do very often) such as just leaving their browser window open. Suddenly, the interesting part of your waterfall is 1px wide because the root span is 3 days long.

Another issue is that it kinda screws up your reporting. I often like to review performance outliers, but if people have been leaving their tabs open the outliers are full of noise.

Finally, if your span gets reaped by the sweeper, you're left with orphaned spans, which can send you on a wild goose chase trying to figure out why.

So I guess the question I am left with is: would adding pid allow you to find the information you need that you're currently getting from the waterfall?

derekkraan avatar Jul 13 '23 09:07 derekkraan

This is because a "transaction" is not just a simple message right? Are you talking about what comes in for a form tied to Ecto?

bryannaegele avatar Jul 13 '23 15:07 bryannaegele

@bryannaegele no i talk more in the way you would think of a transaction in a distributed system pov with the client/server stuff

DianaOlympos avatar Jul 13 '23 16:07 DianaOlympos