
clarify pipeline stages vs experiments

Open casperdcl opened this issue 3 years ago • 7 comments

  • [ ] discussion blocked by/depends on https://github.com/iterative/dvc/issues/7866

Some features that are often underused, misunderstood, or unknown could be helped by clearer docs, messaging, and onboarding.

  • Should there be a page clearly describing the difference between stages and experiments?

Nothing in use-cases/experiment-tracking nor user-guide/experiment-management seems to tell existing dvc repro users why they should bother with/what are the use cases of dvc exp.

It doesn't seem clear to users what's the difference between stage/repro (i.e. pipelines) and exp (i.e. experiments).

  • A feature comparison table would be epic.

casperdcl avatar Jun 09 '22 10:06 casperdcl

I think we're still waiting to see if repro is going to be deprecated in an upcoming release.

Rel https://github.com/iterative/dvc/issues/7866#issuecomment-1151842420

jorgeorpinel avatar Jun 11 '22 17:06 jorgeorpinel

Nothing in use-cases/experiment-tracking nor user-guide/experiment-management seems to tell existing dvc repro users why they should bother with/what are the use cases of dvc exp.

We do mention exp run vs. repro specifically in several places like https://dvc.org/doc/user-guide/experiment-management/experiments-overview#basic-workflow, https://dvc.org/doc/user-guide/experiment-management/running-experiments#running-the-pipelines, and https://dvc.org/doc/command-reference/exp/run.

jorgeorpinel avatar Jun 11 '22 17:06 jorgeorpinel

None of those links make it remotely clear what the difference is.

The closest near-miss to being potentially helpful is:

📖 dvc exp run is an experiment-specific alternative to dvc repro.

What are the use cases? When would you use one over another? Are there any examples? Does the description meaningfully reduce a confused user's frustration?

Related to https://stackoverflow.blog/2022/04/25/empathy-for-the-dev-avoiding-common-pitfalls-when-communicating-with-developers/

TL;DR:

  • don't forget the purpose
  • keep in mind the users
    • what do they already know?
    • what problem do they want to solve?
  • focus on how not what: "A common mistake [...] is to describe the what of the interface, instead of the how of a user’s workflow [e.g.] “Click the Confirm Button to confirm” [lol]"
  • have a quick-start guide
  • don’t ship your org chart, ship a solution (instead of categorising into products/features, categorise into use-cases/solutions)

very few users want to be using software. Instead, they want to do the things that software enables. [...] Users don’t want to buy your software, and they don’t want to read your documentation—they just want to have their problems solved

and http://mkremins.github.io/blog/doors-headaches-intellectual-need/

TL;DR:

A hammer (numerous dvc subcommands) seems pointless if you’ve never seen a nail (what are the different problems?)

  • solutions seem pointless if the corresponding problem/purpose isn’t clear… even if the problem is encountered later
  • it’s better to first demonstrate the problem before introducing a solution
  • examples
    • video gamers who find a locked door before finding a key make the logical connection (use key to unlock door) more often than those who find the key first
    • children often hate the (advanced) mathematics taught in school because it often seems pointless
    • functional programming monads are arguably simple, yet newcomers find them difficult… because they try to learn what they are rather than what they’re for

casperdcl avatar Jun 16 '22 05:06 casperdcl

I think I'm missing the point of the question, or I also have some bias.

exp is captured repro. exp enables a higher-level use case of "experiments" on top of some low-level building blocks like pipelines (including repro), etc. Do we need a separate command like dvc repro? I don't know. I don't like it personally, "aesthetically" (that it's disconnected from dvc stage, that it overlaps with exp, etc). I also don't like dvc run, which will hopefully finally be replaced by dvc stage add. But it feels like some low level "make"-file like interface has its place.

Can I come up with a use case where dvc exp run won't solve the problem? I don't know tbh; it feels like no, so again it would come down to aesthetics or some edge cases. Maybe some automation, where it's clear that you don't want to deal with the overhead (no matter how small) of dvc exp run. Maybe we can rename it to dvc stage run --all to make it cleaner.

Nothing in use-cases/experiment-tracking nor user-guide/experiment-management seems to tell existing dvc repro users why they should bother with/what are the use cases of dvc exp.

the whole point was not to complicate this and not bother users of dvc exp with low level details like dvc repro - why should they care? why do you think it's important for people who come to experiments to know about some strange alternative?

It doesn't seem clear to users what's the difference between stage/repro (i.e. pipelines) and exp (i.e. experiments).

as I mentioned, what you call pipelines is just one of the building blocks for experiments

Should there be a page clearly describing the difference between stages and experiments?

I can only see it from the perspective of a single command (repro vs exp run), what else? stage add does not compete at all with experiments.

shcheklein avatar Jun 16 '22 06:06 shcheklein

In case I wasn't clear earlier: I also wish this topic was clearer, but there's ambiguity in the product itself, and the docs are reflecting that. Deprecating repro or even exp is constantly chattered about, for example. @casperdcl do you have a suggestion on how to clarify this?

exp is captured repro low level "make"-file like interface

I like this. exp builds on top of repro and the latter becomes more of a "helper" (kind of how we expose fetch even when it's part of pull). Good notes for the cmd ref as @shcheklein points out.

why do you think it's important for people who come to experiments to know about some strange alternative?

Yes, we consciously decided not to do this. In fact we have a pending task to remove all or most "pipeline" info from https://dvc.org/doc/user-guide/experiment-management/running-experiments (see https://github.com/iterative/dvc.org/issues/2768).

jorgeorpinel avatar Jun 19 '22 04:06 jorgeorpinel

CLI discussion at https://github.com/iterative/dvc/issues/7866 is a prerequisite to docs.

casperdcl avatar Jun 28 '22 13:06 casperdcl

These two clarification points I've found in various places (the latter one from @SoyGema) have been very useful for me as a user:

  1. Experiment commands (dvc exp) produce a Git ref; that is how they store their state.
  2. "If you use dvc repro, each time you execute it will overwrite everything without going back unless you commit in between each execution." "dvc exp run allows to run different experiments, for example hyper parameter changes without having to create a commit for each one"
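
To make the second point concrete, here is a minimal shell sketch of the two workflows. It assumes a Git repository with an existing DVC pipeline (dvc.yaml) and a params.yaml that defines a parameter called train.lr; that parameter, the function names, and the commit messages are hypothetical and only for illustration, not something taken from this thread or from a specific project.

```shell
#!/bin/sh
# Sketch only: assumes a Git + DVC repo with a dvc.yaml pipeline and a
# params.yaml that defines a hypothetical parameter `train.lr`.

# Workflow A: plain pipelines. Each `dvc repro` overwrites the previous
# outputs in the workspace, so a commit is needed to keep each result.
repro_workflow() {
    # (edit params.yaml by hand before each run)
    dvc repro
    git commit -am "run with the currently edited params"
}

# Workflow B: experiments. Each run is saved under its own Git ref,
# so several variations can be tried without committing in between.
exp_workflow() {
    dvc exp run --set-param train.lr=0.01
    dvc exp run --set-param train.lr=0.001
    dvc exp show    # compare the saved experiments in a table
}

# Keep one result: restore the chosen experiment into the workspace,
# then commit it like any other change.
keep_experiment() {
    dvc exp apply "$1"    # $1: an experiment name listed by `dvc exp show`
    git commit -am "adopt experiment $1"
}
```

Nothing runs until one of the functions is called (for example, `exp_workflow` after sourcing the file inside a suitable repo). The split into three functions is just a way to show that committing is mandatory per run in workflow A but deferred to a single, optional step in workflow B.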

drozzy avatar Sep 01 '22 18:09 drozzy

Some additional feedback.

From @mvshmakov:

We’ve recently discovered that dvc repro is not really suitable for CI if the user wants live experiments in Studio to be enabled. As dvc repro does not create a new experiment, we don’t log params to the Studio, thus the experiment will be displayed only partially.

From https://discord.com/channels/485586884165107732/1065577177007018015/1065630078668648458:

I guess I was confused because when I checked the difference in docu, dvc exp run has the comment "Provides a way to execute and track experiments in your project without polluting it with unnecessary commits, branches, directories, etc." so I thought dvc exp is only "experimental" mode for stuff I don't want to have tracked (which I wanted). A remark about legacy in dvc run docs could be preventing further newbies like me asking stupid questions 🙂

dberenbaum avatar Jan 20 '23 13:01 dberenbaum