
Centralised logging

Open davidtan-tw opened this issue 2 years ago • 1 comments

Hey folks, thanks for the great work on Metaflow.

This is a question, not an issue. How do you typically do distributed log tracing with Metaflow? We deploy our Flows as AWS Step Functions, and we do a lot of clicking around the AWS Console to get to the logs. This gets especially tedious when we have SFNs that fan out to 50+ Tasks.

We have a centralised logging server/UI (something like Splunk), and we're shipping the logs produced by our Metaflow SFNs there, but without a single correlation id (e.g. the Metaflow run-id) formatted into each log line, distributed log tracing isn't possible.

Keen to hear how people typically solve this problem. Cheers

davidtan-tw, Jun 07 '22 05:06

@davidtan-tw - Depending on how you are shipping logs back, you can access the run ids from the environment of the container. All user logs are also stored in S3 by Metaflow. You can access them as Run('myflow/run-id')['step'].task.stdout and Run('myflow/run-id')['step'].task.stderr. You may want to ask the question on chat.metaflow.org, where quite a few folks use Splunk with Metaflow.
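
A minimal sketch of the client-side approach described above, assuming the object passed in behaves like a Metaflow `Run` (indexable by step name, with each step iterable over tasks that expose `pathspec`, `stdout`, and `stderr`). The helper name `collect_step_logs` is hypothetical and not part of Metaflow.

```python
def collect_step_logs(run, step_name):
    """Gather stdout/stderr for every task of one step, keyed by task pathspec.

    `run` is assumed to behave like metaflow.Run: run[step_name] yields
    task objects with .pathspec, .stdout and .stderr attributes.
    """
    logs = {}
    for task in run[step_name]:
        logs[task.pathspec] = (task.stdout, task.stderr)
    return logs
```

With the real client this might be called as `collect_step_logs(Run('myflow/run-id'), 'step')` after `from metaflow import Run`, which for a fan-out step returns one entry per parallel task instead of one console visit per task.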

savingoyal, Jun 09 '22 20:06

@davidtan-tw Any follow-ups on this? Please re-open the issue if you would like to chat further.

savingoyal, Aug 30 '22 17:08

@davidtan-tw we use structured logging with structlog (see https://www.structlog.org/en/stable/logging-best-practices.html#canonical-log-lines) to add various ids etc. to each log line (but then output only a few log lines).
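
To illustrate the canonical-log-line idea without depending on structlog, here is a rough standard-library stand-in: fields are accumulated over a unit of work and emitted as a single JSON line at the end. The class `CanonicalLogLine` and the field names are made up for illustration; structlog's own API is different.

```python
import json


class CanonicalLogLine:
    """Accumulates key/value context and renders it as one JSON log line."""

    def __init__(self, **ids):
        # Correlation ids (e.g. run_id, task_id) are bound once, up front.
        self.fields = dict(ids)

    def bind(self, **fields):
        # Add or overwrite fields as the work progresses.
        self.fields.update(fields)
        return self

    def render(self):
        # Sorted keys keep the output stable and easy to diff or index.
        return json.dumps(self.fields, sort_keys=True)


line = CanonicalLogLine(run_id="example-run", task_id="7")
line.bind(rows_read=1200).bind(status="ok")
print(line.render())
```

Because every emitted line carries the same ids, the logging server can group all lines of one run with a single query on `run_id`.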

seanv507, Dec 04 '22 17:12

Thanks @seanv507. That's what we ended up doing as well. In our log formatter, we added run-id and also task-id (in case we need to drill down to the logs for one specific parallel step).
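
A sketch of what such a formatter setup could look like with the standard `logging` module. The names `MetaflowContextFilter` and `build_logger` are hypothetical, and the ids are passed in explicitly here; inside a Metaflow step they could be read from `metaflow.current` (`current.run_id`, `current.task_id`).

```python
import logging


class MetaflowContextFilter(logging.Filter):
    """Attaches run_id and task_id to every record so the formatter can use them."""

    def __init__(self, run_id, task_id):
        super().__init__()
        self.run_id = run_id
        self.task_id = task_id

    def filter(self, record):
        record.run_id = self.run_id
        record.task_id = self.task_id
        return True  # never drop the record, only annotate it


def build_logger(name, run_id, task_id, stream=None):
    """Return a logger whose lines are prefixed with the correlation ids."""
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(
        "%(levelname)s run_id=%(run_id)s task_id=%(task_id)s %(message)s"))
    handler.addFilter(MetaflowContextFilter(run_id, task_id))
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the root logger from printing a second copy
    return logger
```

Searching the central log store for `run_id=<id>` then returns every line of a run, and adding `task_id=<id>` narrows it to one branch of a fan-out.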

davidtan-tw, Dec 04 '22 22:12