Avoid recommending `context.log` for logs since it is actually emitting events

Open AndreaGiardini opened this issue 2 months ago • 2 comments

Dagster version

1.7.1

What's the issue?

Relevant slack discussion: https://discuss.dagster.io/t/16930496/hey-folks-wave-feedback-from-my-side-regarding-logging-and-e#d5317e86-39ec-4167-ba4e-c435cb3baa21

I have been working with Dagster for quite some time and only now realized that everything that goes into context.log is not actually a "log" but rather an "event". Consequently, it is not stored on S3/GCS but rather in the database.

I think this confusion has caused many people over time to fill up their DB much faster than they were expecting. If you look at Dagster discussions/issues, there are many people asking how to remove runs because their DB fills up, and I feel this could be one of the reasons. For instance, it's not unusual for some of our data scientists to pipe stderr/stdout into context.log.info, which caused our DB to grow massively over time.

The Loggers page in particular ( https://docs.dagster.io/concepts/logging/loggers ) is very misleading as it recommends piping your logs to context.log, which should be avoided if you don't want to end up with lots of records in your database.

What did you expect to happen?

context.log should be used for logs, not events (or maybe it should be renamed?)

How to reproduce?

Log anything using context.log and the log line will end up in the database. I would expect log lines to end up in blob storage.

Deployment type

Dagster Helm chart

Deployment details

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

AndreaGiardini, Apr 15 '24 08:04

@garethbrickman This should also be marked as "docs" rather than feature-request. The current recommendations for logging are inaccurate.

AndreaGiardini, Apr 16 '24 09:04

To add to this, I think there would be a lot of benefit in adding a documented example of how to properly write logs to stdout/stderr whilst also writing events to the event stream in the same process. This would allow users to write logs to storage and events to the database. It's not completely clear to me how this should be achieved with the dagster logging mechanisms at the moment.

gofford, Apr 24 '24 07:04
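
For anyone landing here, the split asked for in the last comment could look roughly like the sketch below. It is not an official recommendation and rests on two assumptions: that Dagster captures the process's stdout/stderr as compute logs and ships them through whatever compute log manager is configured (for example S3/GCS), and that anything sent through context.log becomes an event record in the database. The logger name `my_pipeline.raw` and the helpers `make_stdout_logger` and `process_rows` are made up for illustration. The code only uses the Python standard library; `context` can be any object exposing `.log.info`, such as the context passed to an op or asset.

```python
import logging
import sys


def make_stdout_logger(name: str = "my_pipeline.raw") -> logging.Logger:
    """Build a plain Python logger that writes to stdout only (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on re-import
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # Do not hand records to ancestor loggers, so nothing upstream re-routes them.
    logger.propagate = False
    return logger


raw_log = make_stdout_logger()


def process_rows(context, rows):
    """Chatty output goes to stdout; one summary line goes to context.log."""
    for row in rows:
        # High-volume, per-item output: plain stdout, assumed to be picked up
        # as compute logs rather than stored as database events.
        raw_log.info("processing row %r", row)
    # Low-volume summary: assumed to be stored as a single event record.
    context.log.info(f"processed {len(rows)} rows")
    return len(rows)
```

Inside an op or asset body this would be called as `process_rows(context, rows)`. The idea is to keep context.log for the handful of messages that are worth a database row and send everything else to stdout/stderr. One caveat: if the logger name is listed under `python_logs.managed_python_loggers` in the instance config, I would expect its records to be captured as events as well, which defeats the purpose.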