Epic: improve experiment logs

Open • dberenbaum opened this issue 2 years ago • 1 comment

Summary / Background

Provide robust logging for experiment runs.

Scope

When running any experiment, save logs of the output, errors, hardware usage, time, etc. Be able to retrieve them anytime, anywhere, for any experiment, including sharing between users and products (DVC, VS Code, Studio).

Assumptions

Applies only to pipeline execution (not to dvclive-only experiments).

Open Questions

How do we share the logs?
Should we share live log updates to Studio?

Blockers / Dependencies

Can we make it a joint effort with the VS Code and Studio teams? It seems like it would be powerful in Studio for workflows like cloud experiments.

General Approach

We already have dvc queue logs. For sharing, we could add dvc queue push/pull, or support dvc push/pull --logs.

Steps

Phase 1: Make logging work for all experiments

[x] https://github.com/iterative/dvc/issues/9425
[ ] https://github.com/iterative/dvc/issues/9616
[ ] https://github.com/iterative/dvc/issues/9174
[ ] https://github.com/iterative/dvc/issues/8658
[ ] https://github.com/iterative/dvc/issues/9079

Phase 2: Expand and share logs

[ ] https://github.com/iterative/dvc/issues/8483
[ ] Time each stage took to execute
[ ] Hardware type and usage: number of CPUs/GPUs and their utilization, plus the same for memory
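
To make the Phase 2 items more concrete, here is a minimal sketch of capturing output, errors, stage duration, and a basic hardware fact for one stage command. It is an illustration under stated assumptions, not how DVC implements or plans to implement logging: the helper name run_stage_with_log, the per-stage .log/.json file layout, and the metadata keys are all hypothetical. It uses only the Python standard library, so it records the CPU count but not CPU/GPU or memory utilization, which would need periodic sampling.

```python
import json
import os
import subprocess
import sys
import time
from pathlib import Path


def run_stage_with_log(name, cmd, log_dir):
    """Run one stage command, saving its output and basic run metadata.

    Hypothetical helper for illustration only; not part of DVC.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{name}.log"

    start = time.perf_counter()
    # Send stdout and stderr to the same file so errors appear in context.
    with open(log_path, "w", encoding="utf-8") as log_file:
        proc = subprocess.run(
            cmd, stdout=log_file, stderr=subprocess.STDOUT, check=False
        )
    elapsed = time.perf_counter() - start

    # Assumed metadata layout; the key names are made up for this sketch.
    meta = {
        "stage": name,
        "returncode": proc.returncode,
        "duration_seconds": round(elapsed, 3),
        "cpu_count": os.cpu_count(),  # hardware type only, not utilization
    }
    (log_dir / f"{name}.json").write_text(json.dumps(meta, indent=2))
    return meta
```

For example, run_stage_with_log("train", [sys.executable, "train.py"], "exp_logs") would write exp_logs/train.log and exp_logs/train.json (train.py being a placeholder script). Keeping the raw log and the structured metadata in separate files is one possible design; it would let a sharing mechanism transfer the small metadata file without the full log.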

Timelines

TBD (not yet prioritized)

May 11 '23 14:05 dberenbaum

We discussed in #9425 that the current dvc queue logs command won't make sense if we want to capture logs for non-queued experiments. Now that we have dropped checkpoints, do we still need a separate queue command, or can we merge it with exp?

Looking through the current queue commands:

start: could be in exp run --run-all or exp start
stop: could be in exp stop
status: is it needed? If so, can it be in exp status?
logs: could be in exp logs
remove: is it needed? This might also depend on whether and how we plan to preserve the logs; some info could be auto-deleted on exp clean
kill: could be in exp kill

Jun 16 '23 11:06 dberenbaum