natchez icon indicating copy to clipboard operation
natchez copied to clipboard

Tracing resources

Open rossabaker opened this issue 3 years ago • 55 comments

Given this Client:

  type Client[F[_]] = Request => Resource[F, Response]

We would like to able to span the entire resource, with child spans for acquisition and release:

ambient
+- client
  +- acquire
  +- use
  +- release

Inspired by natchez-http4s and #19, we might try:

    def tracedClient[F[_]: MonadCancelThrow: Trace](client: Client[F]): Client[F] = { req =>
      Resource(Trace[F].span("client") {
        Trace[F].span("acquire")(client(req).allocated).map { case (resp, release) =>
          (resp, Trace[F].span("release")(release))
        }
      })
    }

But the span tree looks like this:

ambient
+- client
| +- acquire
+- use
+- release

We could introduce a more powerful TraceResource type class:

trait TraceResource extends Trace[F] {
  def spanResource(name: String): Resource[F, Unit]
}

Tracing a client becomes:

    def tracedClient[F[_]: MonadCancelThrow: TraceResource](client: Client[F]): Client[F] = { req =>
      TraceResource[F].spanResource("client") >>
      Resource(
        Trace[F].span("acquire")(client(req).allocated).map { case (resp, release) =>
          (resp, Trace[F].span("release")(release))
        }
      )
    }

Executable proof of concept:

$ scala-cli test -S 2.13 https://gist.github.com/rossabaker/8872792b06bd84e5be8fae3c9caf8731

We could avoid the second type class by adding the method to Trace. This would be a breaking change. In addition to the new abstract method, we'd have to mapK the Resource on all the transformed instances, which would require a MonadCancel where there are currently weaker constraints. All the base instances except noop already require MonadCancel, so maybe that's okay.

/cc @zmccoy

rossabaker avatar Feb 13 '22 04:02 rossabaker

I tried adding spanResource to Trace, and the KleisliTrace has sprained my brain. 🤕

rossabaker avatar Feb 17 '22 01:02 rossabaker

You can put yourself in a dark place with this if you use allocated and don't release in reverse order. It's totally fine to create a child span from a finished span, but we lose access to the root span. I'm worried how fs2 rescoping might play here:

  test("weird and misnested") {
    val tracer = new MockTracer
    for {
      root <- IO(MockSpan[IO](tracer, tracer.buildSpan("root").start()))
      _ <- TraceResource.ioTrace(root).flatMap { trace =>
        for {
          a <- trace.resource("a").allocated
          b <- trace.resource("b").allocated
          _ <- a._2
          // the ambient span is root
          _ <- b._2
          // the ambient span is a, which is finished
          c <- trace.resource("c").use_
        } yield ()
      }
      a <- lookupSpan(tracer, "a")
      c <- lookupSpan(tracer, "c")
      _ <- IO(root.span.finish())
      _ = assertEquals(c.parentId, a.context.spanId)
    } yield ()
  }

rossabaker avatar Feb 20 '22 15:02 rossabaker

def resource(name: String): Resource[F, Unit] on ioTrace (except the above footgun) and noop work great. I also tested them on spanning fs2 Streams.

I am struggling with the Resource[Kleisli[F, Span[F], *], Unit] that the KleisliTrace instance would need. This monstrosity spans the client request, but acquire, use, and release are all children of the root span. It could maybe expose the child Span to feed into any kleislis in the continuation to set the "ambient trace", but that's troubling for noop.

    def resource(name: String): Resource[Kleisli[F, Span[F], *], Unit] = {
      val wtf = Kleisli((span: Span[F]) => span.span(name).void).mapF(_.pure[F]).map(_.mapK(Kleisli.liftK[F, Span[F]]))
      Resource.suspend(wtf)
    }

I'm coming to appreciate the existing model more and more, but spanning a resource lifecycle seems pretty fundamental as well.

rossabaker avatar Feb 21 '22 06:02 rossabaker

I got the Kleisli instance working. I had to pass the resource as a continuation, similar to how we pass the effect as a continuation to span:

def resource[A](name: String)(r: Resource[F, A]): Resource[F, A]

The same traced client now spans the entire lifecycle of the request, and works for both Trace[IO] and Trace[Kleisli[F, Span[F], *]]. I don't yet love the API for tracing streams, but it also seems to work. The gist is updated, and now requires a --compiler-plugin org.typelevel:::kind-projector:0.13.2 argument.

rossabaker avatar Feb 22 '22 17:02 rossabaker

I somehow overlooked @kubukoz's prior art in #72 and #73. Jakub twote:

If you dive deeper into the issues you'll see I battled with it before, but didn't consider adding a new typeclass

I don't think we have to introduce a second type class. My gist does, because I wanted a self-contained example. But I think the method could be added to Trace and work for all extant instances. Certainly ioTrace and noop and, horridly, kleisliTrace. I haven't thought through the other transformed instances.

A Trace[Resource[F, *]] and Trace[Stream[F, *]] could both be derived given Trace[F].resource. That could help get rid of the Stream.resource(Trace[F].resource("child")(Resource.unit)) in my gist. The very need for resource is getting into the same issues discussed in "Monadic Regions and Cancellation Semantics" in the original Cats Effect 3.0 proposal, but we didn't get the separate Region type class. The insights there might still inspire a nicer API here.

rossabaker avatar Feb 22 '22 18:02 rossabaker

Just noticed this thread. Didn't have a chance to read through yet, but I've been wanting this too. I made an experimental lib a little while back exploring this idea: https://github.com/armanbilge/bayou/

I couldn't get the Kleisli to work so I need to see how Ross did that :)

armanbilge avatar Feb 22 '22 21:02 armanbilge

I think my Kleisli instance is more complicated than it needs to be, but it took several hours of type tetris to get the one in the gist. There's no Traverse here to save us from the resource being inside the Kleisli when we want it outside the Kleisli.

rossabaker avatar Feb 22 '22 21:02 rossabaker

Before putting up that lib, I remembering experimenting with a signature similar to what you have, in the hope that it could do Stream and stuff as well.

def spanBracket[G[_]: FlatMap, A](f: Resource[F, *] ~> G)(name: String)(k: G[A]): G[A]

I've got to run, but I remember getting tripped up with Kleisli and came to some conclusion that it was only possible to span the acquire and release steps without actually spanning the use. But I could be wrong.

armanbilge avatar Feb 22 '22 22:02 armanbilge

Btw, I think IO-based tracing is better anyway in many cases for the reasons listed in my project's README.

armanbilge avatar Feb 22 '22 22:02 armanbilge

My unit test shows that the use is in the ambient span for use that is the same as acquire and release, even for the Kleisli. But it wasn't easy, and if you release resources in the wrong order with allocated an IO, woe be unto you.

rossabaker avatar Feb 22 '22 22:02 rossabaker

Well I've stared at it for the last hour and am still wrapping my head around it. Took me this long just to figure out the type parameters on the mapF but once I did it helped. In any case, the unit tests are convincing. This seems good.

def resource[A](name: String)(r: Resource[Kleisli[F, Span[F], *], A]): Resource[Kleisli[F, Span[F], *], A] =
  Resource.suspend[Kleisli[F, Span[F], *], A](
    Kleisli[Resource[F, *], Span[F], A]((span: Span[F]) =>
      span.span(name).flatMap(child => r.mapK(Kleisli.applyK(child)))
    ).mapF[F, Resource[Kleisli[F, Span[F], *], A]](ra =>
      ra.mapK[Kleisli[F, Span[F], *]](Kleisli.liftK[F, Span[F]]).pure[F]
    )
  )

armanbilge avatar Feb 23 '22 00:02 armanbilge

Btw, I've been kind of wishing for a natchez 0.2. Resource-tracing + CIString kernels + cats-uri instead of URI.

armanbilge avatar Feb 23 '22 00:02 armanbilge

I hadn't seen Bayou until you mentioned it, but you and @kubukoz and I all had the same basic idea. We all tried to return a Resource[F, Unit] instead of passing an Resource[F, A] to be transformed. I think that's why I eventually succeeded on the Kleisli instance. Returning Resource[Kleisli[F, Span[F]], Unit] lets us span the resource, but I don't see how to make it the ambient span without being passed a Kleisli to receive the child.

I do agree that it's a party trick compared to Trace[IO].resource. The only real value I see is permitting resource to live on the Trace abstraction without introducing a new type class.

rossabaker avatar Feb 23 '22 03:02 rossabaker

We all tried to return a Resource[F, Unit] instead of passing an Resource[F, A] to be transformed.

Actually, I started with something similar to the latter idea as well :) I posted it on Discord but then couldn't figure out how to make it work for Kleisli. The signature was what I posted in https://github.com/tpolecat/natchez/issues/514#issuecomment-1048253542. Edit: actually, if it's possible to implement it for a signature like that maybe it will be less awkward to trace a Stream? Idk.

The only real value I see is permitting resource to live on the Trace abstraction without introducing a new type class.

Actually, I think that's quite a lot of value. Party-trick or not, it's only an implementation detail at the end of the day :)

armanbilge avatar Feb 23 '22 03:02 armanbilge

I overlooked your spanBracket signature somehow. I think you're grasping for the Region type class in the old CE3 design doc. We ended up never getting that type class, which is why you needed the ~>.

Unfortunately, my Kleisli instance doesn't work either. I had an error in my unit test, and the Kleisli returned by use is a child of the root span. It's the same basic problem as the Resource[F, Unit]-returning signature: we don't get to hook into the final Kleisli to apply the child to it.

  test("nested kleislis") {
    val tracer = new MockTracer
    for {
      root <- IO(MockSpan[IO](tracer, tracer.buildSpan("root").start()))
      trace = TraceResource[Kleisli[IO, Span[IO], *]]
      _ <- trace.resource("a") {
        trace.resource("b") {
          Resource.unit
        }
      }.use(_ => trace.span("c")(Kleisli.pure(()))).run(root)
      a <- lookupSpan(tracer, "a")
      b <- lookupSpan(tracer, "b")
      c <- lookupSpan(tracer, "c")
      _ <- IO(root.span.finish())
      _ = assertEquals(b.parentId, a.context.spanId)
      // FAILS HERE
      _ = assertEquals(c.parentId, b.context.spanId)
    } yield ()
  }

We might be back to a TraceResource type class without a Kleisli[F, Span[F], *] instance, which would be super disappointing for improving the tracing middlewares.

rossabaker avatar Feb 23 '22 04:02 rossabaker

Oh darn 😕 still, I think IO-based tracing is mostly good, but not super popular. I haven't thought much about the release-order footgun yet.

armanbilge avatar Feb 23 '22 04:02 armanbilge

I got into IO-based tracing while working on a natchez integration for feral. I think it works fine so long as you are setting this stuff up close to your main method anyway. Here's an example that integrates it with http4s actually:

https://github.com/typelevel/feral/blob/440a62c3bffde6e59db46ab0c95949b25301ced1/examples/src/main/scala/feral/examples/Http4sLambda.scala#L63

armanbilge avatar Feb 23 '22 04:02 armanbilge

Your TracedHandler goes from Span[IO] => IO[Trace[IO]] in the obvious way. I have code that receives an Span[F] for which I'd like to generate a Trace[F] instance do similar (also serverside http4s). I sketched a TraceRoot type class this evening for that so that could can remain tagless, but IO is really the only useful implementation.

I guess we could still have resourceful tracing on the transformers over IO, but that base Kleisli[F, Span[F], *] instance -- the one we sold everybody in CE2 -- is going to get left behind if things like http4s start demanding a type class with a Resource operation. We were so close. 😞

rossabaker avatar Feb 23 '22 05:02 rossabaker

If Trace exposed the current Span itself, I think that would get Kleisli back in action? Edit: it's past my stupid hour, so maybe ignore me. I'm trying to reconcile an old discord thread which discussed passing around Span explicitly as a solution for this problem.

Although I've appreciated that Trace is independent of Span, which I assumed is what you were alluding to above as well.

I'm coming to appreciate the existing model more and more, but spanning a resource lifecycle seems pretty fundamental as well.

armanbilge avatar Feb 23 '22 05:02 armanbilge

@zmccoy and I discussed wishing to be able to get the ambient Span today, with regards to adding support for linking. It would require having a no-op Span in core for Trace.Implciits.noop to provide. But I'm not sure it helps with the Kleisli instance: span doesn't know it's been called inside a Resource, so what could its Kleisli definition be other than ask?

rossabaker avatar Feb 23 '22 05:02 rossabaker

Right, sorry. A true Kleisli instance is definitely out of the question.

If you have direct access to the Span then at least you can manually wrap some resources. There would be no way to pass that Span or its children to inner effects due to the same constraints.

It's a compromise: this would be better than the current situation, where we can't span a Resource or Stream at all, and it would work for any F[_]: Trace, not just IO. But you sacrifice a lot of power.

armanbilge avatar Feb 23 '22 07:02 armanbilge

Manually passing the span seems intrusive to me. We can't trace how we use a resource without changing the very shape of the resource.

  def helper[F[_]: MonadCancelThrow, A](span: Span[F], name: String)(r: Resource[F, A]): Resource[F, (Span[F], A)] = {
    span.span(name).flatMap { case child =>
      Resource(child.span("acquire").use(_ => r.allocated).map { case (a, release) =>
        (child, a) -> child.span("release").use(_ => release)
      })
    }
  }

  def example[F[_]: MonadCancelThrow](span: Span[F]) = {
    helper(span, "resource")(Resource.pure("precious")).use {
      // Use went from `A => F[B]` to `(Span[F], A) => F[B]`   
      case (rSpan, precious) =>
        rSpan.span("use").use_
    }
  }

As soon as we trace a client, every call is going to have that extra value to deal with. Worse, that span won't bear any relation to the ambient span of Trace, if we're mixing and matching.

rossabaker avatar Feb 23 '22 13:02 rossabaker

So far, I haven't had the need for tracing Resource / Stream, though, it would be nice to have it. So forgive me if the problem I'm about to describe belongs somewhere else...

The issue I think it surfaces more to me is tracing HttpRoutes where you have something like this:

final class Routes[F[_]: GenUUID: Monad: Trace](
    users: UsersDB[F]
) extends Http4sDsl[F]

trait UsersDB[F[_]]:
  def get(id: UUID): F[Option[User]]

object UsersDB:
  def make[F[_]: MonadThrow: Trace]: F[UsersDB[F]] = ???

To construct my HttpRoutes, I first need to construct a UsersDB, so I need to satisfy the Trace constraint before the HTTP routes is able to pass its span on every request. So far, my only option has been to instantiate F as Kleisli[F, Span[F], *] from the main class.

Using Trace.ioTrace with an initial root span satisfies the Trace constraint in both as well, but the HttpRoutes span won't be propagated to the UsersDB.

I'm probably repeating myself here since I saw you both already mention the limitations of not being able to get a Trace[F] instance in a tagless world but I might learn a few things from your experience by exposing my issue here.

The same goes for consumer / producer examples, where I would like to resume a trace from a given kernel, can't be done it in an abstract F.


Additionally, natchez-http4s adds an extension method named liftT (and a few others) to magically satisfy the Trace constraint and run it in Kleisli. In a similar way, I created a liftRoutes extension method (only Scala 3 for now, using context fucntions) that runs it in IO instead. E.g.

/* it uses `Trace.ioTrace` under the hood, so only for IO */
def ctxIOLocal(ep: EntryPoint[IO]) =
  for
    db <- Resource.eval(UsersDB.noTrace[IO])
    routes = ep.liftRoutes(Routes[IO](db).routes)
    server = Ember.routes[IO](port"9000", routes)
  yield db -> server

Linking it here in case it is helpful to anyone.

gvolpe avatar Feb 23 '22 14:02 gvolpe

Using Trace.ioTrace with an initial root span satisfies the Trace constraint in both as well, but the HttpRoutes span won't be propagated to the UsersDB.

@gvolpe Thanks for chiming in! Yes, I am acutely aware of this problem as a proponent of IO-based tracing. I proposed a workaround in Bayou and Ross also had an idea for a TraceRoot typeclass that I think might be related, but I'm not sure.

armanbilge avatar Feb 23 '22 15:02 armanbilge

My TraceRoot isn't particularly satisfactory. It's very IO-centric.

trait TraceRoot[F[_]] {
  def trace(root: Span[F]): F[Trace[F]]
}

object TraceRoot {
  def apply[F[_]](implicit ev: TraceRoot[F]): ev.type = ev

  implicit val traceRootForIO: TraceRoot[IO] =
    (root: Span[IO]) => Trace.ioTrace(root)
}

A Kleisli implementation would, like, what, take a Span[Kleisli[F, Span, *]] as input?

rossabaker avatar Feb 23 '22 18:02 rossabaker

Ha, that looks like another one of my ideas in https://github.com/tpolecat/natchez/pull/448.

https://github.com/tpolecat/natchez/pull/448/files#diff-e0aed177ac19bf8e8148a43c9b2252afa6eed98d76c35c483a724e1ad9b53c9c

armanbilge avatar Feb 23 '22 18:02 armanbilge

Using Trace.ioTrace with an initial root span satisfies the Trace constraint in both as well, but the HttpRoutes span won't be propagated to the UsersDB.

Is this suggesting that UsersDB needs to be made per request for its traces to be caused by the routes' traces?

rossabaker avatar Feb 23 '22 19:02 rossabaker

I think so, yes, unless you have a mechanism for injecting a new span on Trace itself.

armanbilge avatar Feb 23 '22 19:02 armanbilge

Is this suggesting that UsersDB needs to be made per request for its traces to be caused by the routes' traces?

@rossabaker not the instance itself (that would be criminal), but the Trace instance needs to be able to pass a unique span on every HTTP request, which is only possible with Kleisli as far as I understand. I ran some tests with this sometime ago, but please correct me if I'm wrong, I can try to test this once again.

gvolpe avatar Feb 23 '22 21:02 gvolpe

Yes, Trace[IO] is constructed with a specific Span, typically a root or continued span from an EntryPoint. You would typically want one instance of this per request. And Trace[Kleisli[F, Span[F], *]] generates functions that you can call repeatedly with different Spans. Arman's IOTrace in Bayou attempts to fix this, but if your code is in Trace[F], there's no abstract way to bring your own span.

So a global Trace[Kleisli[F, Span[F], *]] or a global Bayou IOTrace can be shared between HttpRoutes and UsersDb, but a Trace[IO] from natchez needs to be request-scoped. That then suggests either your UsersDb needs to be request scoped (hopefully without request-scoping any connection pool!), or the Trace[F] constraint needs to be moved to the operations.

rossabaker avatar Feb 24 '22 03:02 rossabaker