metaflow
Centralised logging
Hey folks, thanks for the great work on Metaflow.
This is a question, not an issue. How do you typically do distributed log tracing with Metaflow? We're deploying our Flows as AWS Step Functions, and we're doing a lot of clicking around the AWS Console to get to the logs. It gets especially tedious when we have SFNs that fan out to 50+ Tasks.
We have a centralised logging server/UI (something like Splunk), and we're shipping the logs produced by our Metaflow SFNs there, but without a single correlation id (e.g. the Metaflow run-id) in each log line, it's not possible to do distributed log tracing.
Keen to hear how people typically solve this problem. Cheers
@davidtan-tw - Depending on how you are shipping logs back, you can access the run ids from the environment of the container. All user logs are also stored in S3 by Metaflow. You can access them as Run('myflow/run-id')['step'].task.stdout and Run('myflow/run-id')['step'].task.stderr. You may want to ask the question in chat.metaflow.org where there are quite a few folks who use Splunk with Metaflow.
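For a step that fans out to many tasks, the same Client API lookup can be looped over every task of the step instead of clicking through each one. The following is only a rough sketch: iter_task_logs and its run_factory parameter are hypothetical names introduced here (the parameter exists so the helper can be tried without a real run), and 'myflow/run-id' and 'step' remain placeholders.

```python
def iter_task_logs(run_pathspec, step_name, run_factory=None):
    """Yield (task_pathspec, stdout, stderr) for every task of one step.

    run_factory is a hypothetical hook for testing; by default the real
    metaflow.Run client object is used.
    """
    if run_factory is None:
        # Imported lazily so the helper can be exercised without Metaflow installed.
        from metaflow import Run
        run_factory = Run

    step = run_factory(run_pathspec)[step_name]
    for task in step:  # iterating a step yields its tasks, one per fan-out branch
        yield task.pathspec, task.stdout, task.stderr
```

Something like `for pathspec, out, err in iter_task_logs('myflow/run-id', 'step'): print(pathspec, out, err)` would then dump the logs of all parallel tasks in one go, assuming the client is configured to reach the same metadata service and datastore as the deployed flow.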
@davidtan-tw Any follow-ups on this? Please re-open the issue if you would like to chat further.
@davidtan-tw we use structured logging with structlog (see https://www.structlog.org/en/stable/logging-best-practices.html#canonical-log-lines) to add various ids etc. to each log line (but then output only a few log lines).
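To illustrate the canonical log line idea (collect fields while a unit of work runs, then emit one structured line carrying all the ids), here is a minimal sketch. It deliberately uses only the standard library as a stand-in, not structlog itself, and is not the setup described above; CanonicalLogLine and the field names are hypothetical.

```python
import json
import logging
import time


class CanonicalLogLine:
    """Collect fields during a unit of work and emit them as one JSON line."""

    def __init__(self, logger, **ids):
        self.logger = logger
        self.fields = dict(ids)  # correlation ids, e.g. run_id / task_id

    def bind(self, **fields):
        # Add or overwrite fields that will appear in the final line.
        self.fields.update(fields)

    def __enter__(self):
        self._start = time.monotonic()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.fields["duration_s"] = round(time.monotonic() - self._start, 3)
        self.fields["status"] = "error" if exc_type else "ok"
        self.logger.info(json.dumps(self.fields, sort_keys=True))
        return False  # never swallow exceptions raised inside the block
```

Inside a step this could wrap the step body, e.g. `with CanonicalLogLine(log, run_id=..., task_id=...) as line: line.bind(rows=1000)`, so each task produces a single searchable line rather than many unrelated ones.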
Thanks @seanv507. That's what we ended up doing as well. In our log formatter, we added run-id and also task-id (in case we need to drill down to the logs for one specific parallel step).
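For anyone landing here later, one possible way to get run-id and task-id into every line with the standard logging module is sketched below. This is an illustration, not the exact formatter we run: MetaflowIdsFilter and build_logger are hypothetical names, and the ids are passed in explicitly. Inside a step they could be read from Metaflow's current object (current.run_id and current.task_id).

```python
import logging
import sys


class MetaflowIdsFilter(logging.Filter):
    """Attach run_id and task_id to every record passing through a handler."""

    def __init__(self, run_id, task_id):
        super().__init__()
        self.run_id = run_id
        self.task_id = task_id

    def filter(self, record):
        record.run_id = self.run_id
        record.task_id = self.task_id
        return True  # keep the record; it is only annotated


def build_logger(name, run_id, task_id, stream=None):
    """Return a logger whose lines all carry the two correlation ids."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.addFilter(MetaflowIdsFilter(run_id, task_id))
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s run_id=%(run_id)s task_id=%(task_id)s %(message)s"
    ))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]  # replace handlers so repeated calls don't duplicate output
    logger.propagate = False
    return logger
```

With the ids in a fixed key=value position, the central log UI can filter on run_id to see a whole run, or on run_id plus task_id to isolate one branch of a fan-out.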