opentelemetry-haskell
opentelemetry-haskell copied to clipboard
Instrument some real world projects
Would be nice to find and fix pain points in the process and build a portfolio of instrumentation examples to point users to.
@mpickering do you have some project in mind?
@mpardalos you might also be interested.
@ethercrow I can try on GHC if you tell me how to emit the right traces. If it doesn't involve a library that would be better or I should be able to copy the library into the GHC tree somehow.
The other project it would be useful on would be ghcide but I'll leave that for now..
@mpickering
I think these two files should be enough for GHC
https://github.com/ethercrow/opentelemetry-haskell/blob/master/opentelemetry/src/OpenTelemetry/Eventlog.hs https://github.com/ethercrow/opentelemetry-haskell/blob/master/opentelemetry/src/OpenTelemetry/SpanContext.hs
but you'd have to replace the Control.Monad.Catch.generalBracket with the GHC-specific one.
After you have an eventlog, you can use eventlog-to-zipkin or eventlog-to-lightstep to export it somewhere, the README has some examples.
Maybe hie-bios is a nicer first target since it can depend on exceptions?
I instrumented GHC and it seems there is something broken in the zipkin exporter because each event seems to end up in its own trace.
Also problems with
- Eventlog parsing logic assumes threaded runtime (
Just tidinUserMessagecase) - Zipkin returned 202 response code rather than 200
I will email you the eventlog.

"202 Accepted" is fine, I'll fix the status code check.
Yeah, I haven't thought of non-threaded runtimes yet, I'll take a look, thanks for the eventlog!
@mpickering with this change in place it looks better, some traces look meaningful.
> git diff opentelemetry-extra/
diff --git a/opentelemetry-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs b/opentelemetry
-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs
index 77dd6c8..fd4e961 100644
--- a/opentelemetry-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs
+++ b/opentelemetry-extra/src/OpenTelemetry/EventlogStreaming_Internal.hs
@@ -101,7 +101,7 @@ processEvent (Event ts ev m_cap) st@(S {..}) =
(EndGC, _, _) -> popSpansAcrossAllThreads now st
(HeapAllocated {allocBytes}, _, Just tid) ->
(modifySpan tid (addEvent now "heap_alloc_bytes" (showT allocBytes)) st, [])
- (UserMessage {msg}, _, Just tid) -> case T.words msg of
+ (UserMessage {msg}, _, fromMaybe 0 -> tid) -> case T.words msg of
("ot1" : "begin" : "span" : name) ->
(pushSpan tid (T.intercalate " " name) now st, [])
("ot1" : "end" : "span" : _) -> popSpan tid now st
@ethercrow That is the change I made locally. How did you view that trace? If I search for "serviceName = ghc" then I only see 10 traces but there should probably just be one?
Perhaps the issue is that there is no span in the log which covers all of the child spans? I will try adding that.
EDIT: This fixes the import. Unfortunatey Zipkin does not deal with visualising how many traces there are well, would be better to make the tracing more fine-grained. In fact, I am not sure what I am supposed to discover from looking at the trace, it seems that I can just view it without doing anything meaningful.
The UX of all these tools is a problem to me as well. Zipkin and Jaeger I don't get at all and Lightstep is I think the most sane, possibly because I've been using it at work for a year.

@mpickering I just added the eventlog-to-chrome executable that I used to convert your ghc eventlog into a thing above. All the traces are on a single timeline.

An attempt to instrument stack is here https://github.com/commercialhaskell/stack/pull/5260
And ghcup https://gitlab.haskell.org/haskell/ghcup-hs/-/merge_requests/5
I've been working on almost the same thing in GHC itself.
The implementation: https://gitlab.haskell.org/broadwaylamb/ghc/-/tree/timetrace-better-metadata
The ghc-devs post: https://mail.haskell.org/pipermail/ghc-devs/2020-May/018827.html
The difference is that you don't need to compile GHC with -eventlog to get it to work, the Chrome Trace Event file can be generated by passing the -ddump-time-trace flag to GHC. Also there is support for merging logs generated by multiple compiler invocations (with this script).
I've also modified the withTiming function a little bit, so that an event's metadata is not encoded in the event's name, but is stored separately, which allows sorting events just by their name. For example, instead of generating events with name Parser [Main], events are generated with the name Parser and metadata "Main", so if we're compiling multiple modules, we can see how much time parsing took overall. Here's how it looks:

Because the feedback on the ghc-devs mailing list showed lack of interest in incorporating this feature in GHC as implemented, I'm not going to continue this effort. Feel free to use my code in your implementation if you want.