natchez icon indicating copy to clipboard operation
natchez copied to clipboard

Enhancements for LogSpan

Open pauljamescleary opened this issue 4 years ago • 5 comments

For the time being, we are heavily using just LogSpan for our applications (and maybe for a while as things like Jaeger and Honeycomb are a non-starter at the moment). The current LogSpan implementation has some shortcomings that force us to have to write our own. We would prefer to discuss those here and contribute changes back to the project (if desired).

Current Issues

  1. We cannot control the format of the log line - as the json is the only format and the encoding is in the LogSpan.json method. This makes it impossible to have the log output align with our standards
  2. The root log span and all children are output in the same log entry, as opposed to one log entry per span. This complicates log discovery for us, in addition when we have a very long running process with lots of log spans it adds memory pressure and can also delay the time before we see the log entry.
  3. We have the need for doing additional processing of the span other than just logging. E.g. We would like to generate epimetheus metrics to expose error rate, counter, and latency information to Prometheus.

Recommendations

Allow the user to control the processing of the span

Example: allow the user to provide a handler: (LogSpan[F] => ExitCode[Throwable]) => F[Unit] to the LogSpan so they can do anything they want with the span, including logging or metric generation.

Complete each child span immediately, calling release as soon as it is finished

The current finish of a child span looks like the following:

  def finish[F[_]: Sync: Logger]: (LogSpan[F], ExitCase[Throwable]) => F[Unit] = { (span, exitCase) =>
    for {
      n  <- now
      j  <- span.json(n, exitCase)
      _  <- span.parent match {
              case None |
                   Some(Left(_))  => Logger[F].info(Json.fromJsonObject(j).spaces2)
              case Some(Right(s)) => s.children.update(j :: _)
            }
    } yield ()
  }

Instead of explicitly managing children, remove the children all together and call the handler on finish of all spans. The finish could be omitted, and Resource would be sufficient with the handler in scope:

def span(name: String): Resource[F, Span[F]] = {
  Resource.makeCase(createChildHere)(handler).widen
}

pauljamescleary avatar Feb 06 '21 16:02 pauljamescleary

An advantage of LogSpan collecting all spans and only writing on completion of a root span is that the rendered log message has the full span tree. There's no need for a tool to post-process the log output and rebuild the span tree by correlating ids. This is really nice for standalone applications that have relatively short spans. For apps with long running spans, I see the sense in writing each completed span to the log as soon as they complete. That's also useful if debugging hung spans, which otherwise would prevent the entire trace from getting logged.

Re: metrics collection, seems like it'd be cleaner to capture metrics generation as a completely separate EntryPoint and then build a composite EntryPoint out of all the underlying entry points needed. Something like https://github.com/ovotech/effect-utils/blob/master/natchez-combine/src/main/scala/com/ovoenergy/effect/natchez/Combine.scala.

mpilquist avatar Feb 06 '21 18:02 mpilquist

Also, logging each span individually makes it easier to generate metrics and dashboards in Kibana / ELK. Each log entry naturally generates a counter, and each duration maps to latency and the outcome maps to success and error counts.

Another use case is to use Logfmt instead of JSON for logging which is also common.

The use case you pointed out makes sense as well, further substantiating the need to have more control on the logging

pauljamescleary avatar Feb 08 '21 14:02 pauljamescleary

@pauljamescleary I wonder if maybe we need two kinds of loggers, one that accumulates a tree and one that just pumps out spans on the fly. Just in terms of ergonomics this might be easier than trying to generalize. There's some irony in flattening a tree into a list that you later have to reconstruct, but I understand the use case.

@mpilquist the problem with treating Span/EntryPoint as semigroups is that the kernel you extract from a composite span may not be usable by any of the entry points in the composition because you're just adding arbitrary maps together and there's nothing to prevent keys from colliding. Ditto with orElse for spanId … you just get the first one, which isn't necessarily useful. So I have resisted this idea because I can't come up with a good solution.

tpolecat avatar Feb 08 '21 18:02 tpolecat

@tpolecat Do you mean two different "LogSpans" or 2 different "log4cats" loggers?

The existing LogSpan encapsulates the mechanism for creating the log "string", it also encapsulates the mechanism to generate the children, which in this on release of the child span it creates a JsonObject...

private[log] final case class LogSpan[F[_]: Sync: Logger](
  service:   String,
  name:      String,
  sid:       UUID,
  parent:    Option[Either[UUID, LogSpan[F]]],
  traceUUID: UUID,
  timestamp: Instant,
  fields:    Ref[F, Map[String, Json]],
  children:  Ref[F, List[JsonObject]]
) extends Span[F]

Does it make sense to formalize a SpanData type that holds the final values for a span instead of JsonObject? Something like...

final case class SpanData(
  id: UUID,
  parentId: UUID,
  traceId: UUID,
  startTime: Instant,
  endTime: Instant,
  fields: Map[String, TraceValue]
  children: List[SpanData],
  exitCase: String
  error: String // I bailed on thinking out the error structure
)

Thinking on the above as it allows one to do things like LogFmt instead of JSON for logging, in addition to other fun things like pushing values into MDC.

The proposed logger could look something like (spitballing)

private[log] final case class LogSpan[F[_]: Sync: Logger](
  service:   String,
  name:      String,
  sid:       UUID,
  parent:    Option[Either[UUID, LogSpan[F]]],
  traceUUID: UUID,
  timestamp: Instant,
  fields:    Ref[F, Map[String, TraceValue]],
  children:  Ref[F, List[SpanData]],
  finalize: SpanData => F[Unit]
) extends Span[F]

Edit: In the proposal there would be no need for a Logger in scope if the span was closed using the finalize

pauljamescleary avatar Feb 08 '21 22:02 pauljamescleary

Of note, I just found https://github.com/janstenpickle/trace4cats and the RefSpan there has similar features to what I mentioned above. Instead of SpanData trace4cats uses CompletedSpan and instead of a handler function provided as a constructor param it has this SpanCompleter

pauljamescleary avatar Feb 13 '21 17:02 pauljamescleary

Closing but feel free to reopen if there's still interest.

mpilquist avatar Nov 24 '22 14:11 mpilquist