meltano icon indicating copy to clipboard operation
meltano copied to clipboard

feat: Add a way to simplify cli logging

Open JulesHuisman opened this issue 3 years ago • 6 comments

Simplifying log messages

Currently structured logging can be a bit difficult to read on a local terminal. It is possible to simplify the logging using the logging.yaml, but there is still a lot of extra information that makes it a bit cumbersome to understand what is going on.

As an example in the following log:

{'consumer': False, 'producer': True, 'string_id': 'tap-postmark', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'tap-postmark', 'event': 'time=2022-06-11 19:55:20 name=tap-postmark level=INFO message=Tap has custom mapper. Using 1 provided map(s).', 'level': 'info', 'timestamp': '2022-06-11T19:55:20.971174Z'}

It would be great to be able to only print the message:

Tap has custom mapper. Using 1 provided map(s).

Errors

It is also the case when an error occurs. All the extra information makes it quite difficult to understand the error: image

JulesHuisman avatar Jun 12 '22 10:06 JulesHuisman

@pandemicsyn any quick tips or tricks that could help with this?

tayloramurphy avatar Jun 23 '22 17:06 tayloramurphy

@tayloramurphy no sounds like @JulesHuisman found logging.yaml already and has a sense of what knobs he can turn.

A logger that would only print Tap has custom mapper. Using 1 provided map(s). without other fields probably isn't very useful in many cases. Especially once we introduce parallel runs. If you run meltano run tap mapper target tap2 mapper target2 its handy knowing things like whether a line was produced by "tap1" or "target2". Some of it definitely come down to personal preference. We probably just need to be able to support more formatting/structure options out of the box.

Now that run's been around a while we could also suppress/remove some fields though (e.g. cmd_type).

pandemicsyn avatar Jun 23 '22 18:06 pandemicsyn

@pandemicsyn I totally agree, the way logging is done now is definitely fine as a default.

However, it would be nice to have a place to simplify the logging even further if needed. In our specific use case, we use meltano run to test our own taps during development. This often throws exceptions, which are a bit difficult to following with all the extra information.

But maybe another solution is to just use invoke to avoid all the extra logging. E.g. meltano invoke tap-lightspeed | meltano invoke target-snowflake

JulesHuisman avatar Jun 23 '22 19:06 JulesHuisman

However, it would be nice to have a place to simplify the logging even further if needed.

@JulesHuisman yea we could (should) definitely provide some additional log renders/formatters. Having a bare formatter that just drops all structlog context fields or maybe just a bare minimum for example totally makes sense.

We've got some additional issues floating around regarding exceptions, since they are an interesting case. We can make exceptions raised by meltano a lot more readable and debuggable, but exceptions raised by plugins are bit more challenging.

When a target raises an exception (or even just prints an error) internally to itself, we don't know that, the target's just throwing the traceback on stderr (hopefully) alongside any other log lines it itself might be reporting. On our side its consumed and displayed alongside all the other captured stdio.

Even before we had structured logging the elt command for example signaled whether that log entry it was printing came from stdout/stderr and from a tap/target - because log lines can be interspersed.

Probably not what you're looking for, but one thing that can be pretty handy is logging in json format and then using jq to filter/search exactly for what you want, ala:

<cat meltano.log or meltano command> | jq -c 'select(.string_id == "tap-gitlab" and .stdio == "stderr") | .event'
"INFO Starting sync"
"INFO Skipping stream: projects"
"INFO Skipping stream: branches"
"INFO Skipping stream: issues"
"INFO Skipping stream: jobs"
"INFO Skipping stream: merge_requests"
"INFO Skipping stream: merge_request_commits"
"INFO Skipping stream: project_milestones"
"INFO Skipping stream: group_milestones"
"INFO Skipping stream: users"
"INFO Skipping stream: site_users"
"INFO Skipping stream: groups"
"INFO Skipping stream: project_members"
"INFO Skipping stream: group_members"
"INFO Skipping stream: releases"
"INFO Skipping stream: tags"
"INFO Skipping stream: project_labels"
"INFO Skipping stream: group_labels"
"INFO Skipping stream: epics"
"INFO Skipping stream: epic_issues"
"INFO Skipping stream: pipelines"
"INFO Skipping stream: pipelines_extended"
"INFO Skipping stream: vulnerabilities"
"INFO GET https://gitlab.com/api/v4/projects/meltano%2Fmeltano?statistics=1"
"INFO GET https://gitlab.com/api/v4/projects/7603319/repository/commits?since=2022-04-25T00:00:00Z&with_stats=true"
"INFO Sync complete"

That would give you only the raw message that tap-gitlab printed to stderr.

pandemicsyn avatar Jun 23 '22 21:06 pandemicsyn

@pandemicsyn Thanks for the jq tip, might come in handy.

Having the option to simplify logging further would be great! But the priority of this feature is not that high, we can manage for the time being :)

JulesHuisman avatar Jun 24 '22 09:06 JulesHuisman

Related: #3002

edgarrmondragon avatar Jul 28 '24 01:07 edgarrmondragon