opentelemetry-haskell icon indicating copy to clipboard operation
opentelemetry-haskell copied to clipboard

Instrument some real world projects

Open ethercrow opened this issue 5 years ago • 14 comments

Would be nice to find and fix pain points in the process and build a portfolio of instrumentation examples to point users to.

@mpickering do you have some project in mind?

@mpardalos you might also be interested.

ethercrow avatar Apr 19 '20 19:04 ethercrow

@ethercrow I can try on GHC if you tell me how to emit the right traces. If it doesn't involve a library that would be better or I should be able to copy the library into the GHC tree somehow.

The other project it would be useful on would be ghcide but I'll leave that for now..

mpickering avatar Apr 19 '20 22:04 mpickering

@mpickering

I think these two files should be enough for GHC

https://github.com/ethercrow/opentelemetry-haskell/blob/master/opentelemetry/src/OpenTelemetry/Eventlog.hs https://github.com/ethercrow/opentelemetry-haskell/blob/master/opentelemetry/src/OpenTelemetry/SpanContext.hs

but you'd have to replace the Control.Monad.Catch.generalBracket with the GHC-specific one.

After you have an eventlog, you can use eventlog-to-zipkin or eventlog-to-lightstep to export it somewhere, the README has some examples.

Maybe hie-bios is a nicer first target since it can depend on exceptions?

ethercrow avatar Apr 19 '20 22:04 ethercrow

I instrumented GHC and it seems there is something broken in the zipkin exporter because each event seems to end up in its own trace.

Also problems with

  • Eventlog parsing logic assumes threaded runtime (Just tid in UserMessage case)
  • Zipkin returned 202 response code rather than 200

I will email you the eventlog.

image

mpickering avatar Apr 20 '20 09:04 mpickering

"202 Accepted" is fine, I'll fix the status code check.

Yeah, I haven't thought of non-threaded runtimes yet, I'll take a look, thanks for the eventlog!

ethercrow avatar Apr 20 '20 10:04 ethercrow

@mpickering with this change in place it looks better, some traces look meaningful.

> git diff opentelemetry-extra/
diff --git a/opentelemetry-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs b/opentelemetry
-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs
index 77dd6c8..fd4e961 100644
--- a/opentelemetry-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs
+++ b/opentelemetry-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs
@@ -101,7 +101,7 @@ processEvent (Event ts ev m_cap) st@(S {..}) =
         (EndGC, _, _) -> popSpansAcrossAllThreads now st
         (HeapAllocated {allocBytes}, _, Just tid) ->
           (modifySpan tid (addEvent now "heap_alloc_bytes" (showT allocBytes)) st, [])
-        (UserMessage {msg}, _, Just tid) -> case T.words msg of
+        (UserMessage {msg}, _, fromMaybe 0 -> tid) -> case T.words msg of
           ("ot1" : "begin" : "span" : name) ->
             (pushSpan tid (T.intercalate " " name) now st, [])
           ("ot1" : "end" : "span" : _) -> popSpan tid now st
Screen Shot 2020-04-20 at 12 48 31 PM

ethercrow avatar Apr 20 '20 10:04 ethercrow

@ethercrow That is the change I made locally. How did you view that trace? If I search for "serviceName = ghc" then I only see 10 traces but there should probably just be one?

mpickering avatar Apr 20 '20 10:04 mpickering

Perhaps the issue is that there is no span in the log which covers all of the child spans? I will try adding that.

EDIT: This fixes the import. Unfortunatey Zipkin does not deal with visualising how many traces there are well, would be better to make the tracing more fine-grained. In fact, I am not sure what I am supposed to discover from looking at the trace, it seems that I can just view it without doing anything meaningful.

mpickering avatar Apr 20 '20 10:04 mpickering

The UX of all these tools is a problem to me as well. Zipkin and Jaeger I don't get at all and Lightstep is I think the most sane, possibly because I've been using it at work for a year.

ethercrow avatar Apr 20 '20 12:04 ethercrow

ghc-eventlog

ethercrow avatar Apr 20 '20 18:04 ethercrow

@mpickering I just added the eventlog-to-chrome executable that I used to convert your ghc eventlog into a thing above. All the traces are on a single timeline.

ethercrow avatar Apr 20 '20 18:04 ethercrow

ghc-eventlog-tracy

ethercrow avatar Apr 20 '20 18:04 ethercrow

An attempt to instrument stack is here https://github.com/commercialhaskell/stack/pull/5260

ethercrow avatar Apr 24 '20 21:04 ethercrow

And ghcup https://gitlab.haskell.org/haskell/ghcup-hs/-/merge_requests/5

ethercrow avatar May 02 '20 18:05 ethercrow

I've been working on almost the same thing in GHC itself.

The implementation: https://gitlab.haskell.org/broadwaylamb/ghc/-/tree/timetrace-better-metadata

The ghc-devs post: https://mail.haskell.org/pipermail/ghc-devs/2020-May/018827.html

The difference is that you don't need to compile GHC with -eventlog to get it to work, the Chrome Trace Event file can be generated by passing the -ddump-time-trace flag to GHC. Also there is support for merging logs generated by multiple compiler invocations (with this script).

I've also modified the withTiming function a little bit, so that an event's metadata is not encoded in the event's name, but is stored separately, which allows sorting events just by their name. For example, instead of generating events with name Parser [Main], events are generated with the name Parser and metadata "Main", so if we're compiling multiple modules, we can see how much time parsing took overall. Here's how it looks: Screenshot 2020-05-17 at 19 14 10

Because the feedback on the ghc-devs mailing list showed lack of interest in incorporating this feature in GHC as implemented, I'm not going to continue this effort. Feel free to use my code in your implementation if you want.

broadwaylamb avatar May 17 '20 16:05 broadwaylamb