Report Usage Information

Open diptanu opened this issue 5 months ago • 4 comments

We need usage information to flow from the executor all the way to the server, so that we can bill customers in Tensorlake cloud and also use it for debugging/observability.

We need two types of usage now, plus a third that can wait:

  1. Time spent by an FE running tasks.
  2. Time spent bringing up the FE. The assumption is that we won't bring up FEs unless there are queued requests, and during FE creation the customer's code can do an undefined amount of work.
  3. The total time an FE has been running. This matters if we implement something like min replicas, where we would want to bill customers for the total time their code is running, whether or not it is processing requests.

Thoughts: For (1), the executor reports how long tasks ran when it reports task outcomes, and we track that in the context of an invocation/request. For (2), the executor reports the time spent when it sends heartbeats, and we track this usage at the level of the compute graph.

We can think about (3) later.
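
To make the split between (1) and (2) concrete, here is a minimal Python sketch of a server-side accumulator. The class name `UsageLedger`, its method names, and the millisecond unit are all hypothetical, chosen for illustration; this is not the actual Indexify data model.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical accumulator for the two usage types discussed above."""

    def __init__(self) -> None:
        # (1) task run time in milliseconds, keyed by invocation/request id
        self.task_ms_by_invocation = defaultdict(int)
        # (2) FE startup time in milliseconds, keyed by compute graph name
        self.startup_ms_by_graph = defaultdict(int)

    def record_task_outcome(self, invocation_id: str, duration_ms: int) -> None:
        # Called when the executor reports a task outcome.
        self.task_ms_by_invocation[invocation_id] += duration_ms

    def record_heartbeat(self, graph: str, startup_ms: int) -> None:
        # Called when a heartbeat carries FE startup time spent since the last one.
        self.startup_ms_by_graph[graph] += startup_ms
```

Keeping the two counters separate means task time can be billed per request while startup time is attributed to the graph that caused the FE to be created.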

diptanu avatar Jul 13 '25 21:07 diptanu

(3) looks like FE uptime, which can be calculated from a single timestamp recorded at FE creation.
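
As a rough sketch of that idea in Python (the class name `FunctionExecutorUptime` and the injectable `clock` parameter are assumptions made for illustration, not existing executor code):

```python
import time


class FunctionExecutorUptime:
    """Hypothetical helper: remembers one timestamp and derives uptime from it."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        # The single timestamp, recorded at FE creation.
        self._created_at = clock()

    def uptime_seconds(self) -> float:
        # Uptime is the current clock reading minus the creation timestamp.
        return self._clock() - self._created_at
```

A monotonic clock avoids negative durations if the wall clock is adjusted, but it only makes sense within one process; a record that must survive restarts would need a wall-clock timestamp instead.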

eabatalov avatar Jul 13 '25 22:07 eabatalov

So I'm kinda thinking we can reuse OpenTelemetry for this - specifically the standard OTEL collector implementation library and the custom collector building tool.

Downside: they're written in Go. (I love Go, but it's another language in our source code base.)

I'm thinking the executor pushes FE time as spans; these become our authoritative billing records. Later, we can combine these with FE-pushed telemetry to build a customer workflow telemetry service.

earhart avatar Jul 15 '25 22:07 earhart

This is semi-complete. We now get function execution durations in the server, but we don't push them out anywhere. We need to publish these usage metrics to SQS (optionally, if configured).
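
One way the "optionally, if configured" part could look, sketched in Python for brevity rather than in the server's own language. The function name `make_usage_publisher`, the event fields, and the injectable `send` parameter are hypothetical; the only real API used is boto3's SQS `send_message(QueueUrl=..., MessageBody=...)`, and it is imported only when a queue is configured.

```python
import json
from typing import Callable, Optional


def make_usage_publisher(
    queue_url: Optional[str], send: Optional[Callable] = None
) -> Optional[Callable[[dict], None]]:
    """Return a publish function, or None when no queue is configured."""
    if not queue_url:
        # Publishing is optional: no queue URL means usage stays in the server.
        return None
    if send is None:
        # Imported lazily so boto3 is only required when publishing is enabled.
        import boto3

        send = boto3.client("sqs").send_message

    def publish(event: dict) -> None:
        # Serialize the usage event as JSON and send it to the configured queue.
        send(QueueUrl=queue_url, MessageBody=json.dumps(event))

    return publish
```

Callers would check for `None` once at startup and skip publishing entirely when the feature is off.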

diptanu avatar Aug 01 '25 15:08 diptanu

We also need to track FE initialize RPC duration.
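
A minimal sketch of timing a call such as the initialize RPC, assuming a hypothetical `timed` context manager and a `record` callback that receives the duration in seconds (neither exists in the codebase under these names):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(record, clock=time.monotonic):
    """Measure the wrapped block and pass its duration in seconds to `record`."""
    start = clock()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed initializations are measured too.
        record(clock() - start)
```

Usage would be along the lines of `with timed(durations.append): ...` around the RPC call, with the recorded value reported alongside the other usage data.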

eabatalov avatar Aug 12 '25 14:08 eabatalov