guide: add "Best Practices"

Open efiop opened this issue 6 years ago • 24 comments

~~UPDATE: Possibly as a How To guide (see #899)~~

Looks like we need a special section describing ways to organize your projects:

  • [ ] how to use DVC with DB (see https://github.com/iterative/dvc.org/issues/594)
  • ~~our default Dvcfile trick~~
  • [x] manually editing dvc.yaml + dvc commit or dvc repro (see also https://github.com/iterative/dvc.org/issues/230#issuecomment-511769103): it's safe to edit DVC files; there is no need to touch or update md5, DVC will take care of it. UPDATE: See #2578
  • ~~specify meaningful stage names with -f~~
  • [ ] creating a pipeline in a 'debug' directory and then ~~moving it to different data sets~~
  • ~~creating a pipeline in a 'debug' directory and then modifying respective DVC files to~~ set different data sets as an input
  • [x] add "use meta to preserve your content" - #306
  • [ ] never store user credentials in the DVC project config
  • [x] one vs many dvc.yaml files (from https://github.com/iterative/dvc.org/issues/2170#issuecomment-776141460)

See also the latest relevant https://github.com/iterative/dvc.org/issues/72#issuecomment-682868683 and below.

efiop avatar Aug 14 '18 16:08 efiop

Also worth mentioning our default Dvcfile trick.

efiop avatar Aug 23 '18 06:08 efiop

It's safe to edit DVC files: no need to touch or update md5, DVC will take care of it.

shcheklein avatar Nov 25 '18 23:11 shcheklein

Specify meaningful stage names

shcheklein avatar Mar 25 '19 06:03 shcheklein

Also, we are not sure about branches anymore.

shcheklein avatar Mar 25 '19 06:03 shcheklein

Never store user credentials in the DVC project config.

shcheklein avatar Mar 25 '19 06:03 shcheklein

@shcheklein we should keep branches - it is a good practice. However, we should mention that for some cases, like hyperparameter tuning, branches are not very relevant.

dmpetrov avatar Mar 25 '19 06:03 dmpetrov

@dmpetrov agreed, you are right. I probably just wanted to highlight that we should not be pushing branches as the single best option in all cases - there are tags, directories, and maybe we could even mention other tools for now?

shcheklein avatar Mar 25 '19 06:03 shcheklein

@shcheklein also, we need to implement the experiment dir\output feature for the hyperparameter tuning use case (Stefan's use case).

dmpetrov avatar Mar 25 '19 06:03 dmpetrov

add "use meta to preserve your content" - https://github.com/iterative/dvc.org/issues/306

shcheklein avatar May 10 '19 03:05 shcheklein

Hi @shcheklein. I would like to work on this issue.

Soumya0803 avatar Jun 10 '19 04:06 Soumya0803

@Soumya0803 sure! Feel free to write a document for this. Please join our chat at dvc.org/chat; we have a separate #dev-docs channel if you have any questions.

shcheklein avatar Jun 10 '19 16:06 shcheklein

Isn't "Best Practices" the same as "Use Cases"? Maybe we should rename "Use Cases" to "Best Practices".

dashohoxha avatar Aug 21 '19 08:08 dashohoxha

@dashohoxha no, it's not the same. "Best Practices" are relatively small tricks and pieces of advice you should use to be efficient with DVC. They are usually general and do not depend on your specific use case.

shcheklein avatar Aug 21 '19 17:08 shcheklein

Should this be merged with #230 and featured in the #899 epic? We're trying to avoid so many sections now.

Also, the Questions part of What is DVC? (currently in https://dvc.org/doc/user-guide/what-is-dvc/collaboration-issues#questions) probably overlaps with this.

jorgeorpinel avatar Jul 15 '20 22:07 jorgeorpinel

@jorgeorpinel Yeah, that indeed seems suitable.

efiop avatar Jul 17 '20 02:07 efiop

@jorgeorpinel what would it look like? Like a subsection in How To?

shcheklein avatar Jul 17 '20 02:07 shcheklein

Just a single document under How To.

I updated the description of this issue and in fact I think #230 is already included here, in the "manually editing dvc.yaml + dvc commit or dvc repro" checkbox.

jorgeorpinel avatar Jul 20 '20 17:07 jorgeorpinel

UPDATES:

Just a single document under How To.

We are currently following this approach in #1705 but I'm not sure it will stick. Maybe Best Practices should be in the form of Explanation (a regular user guide, or directly under Home, even) and not as a How-to (problem-solution format). We'll see...

  • ~~And another best practice to write about is on tracking/versioning compressed archives, composite binaries, even video perhaps (see this support case)~~ - overlaps with #682 though.

Another one (or anti-practice):

  • [ ] Avoid dynamic names (and other non-deterministic behavior — mentioned in dvc run ref). See this support case for context.

jorgeorpinel avatar Aug 28 '20 16:08 jorgeorpinel

@efiop do you think "how to: add a page for Managing Experiments" (#816) would be better as a best practice too, instead of a how-to as currently requested? Thanks

jorgeorpinel avatar Nov 20 '20 08:11 jorgeorpinel

It's definitely not a How To. It is at the same level as Managing data, etc. To my mind, a section like Managing Experiments should sit alongside Get Started, Use Cases, and User Guide at the top level.

shcheklein avatar Nov 20 '20 20:11 shcheklein

More:

  • [ ] how to work with overlapping stage output locations (e.g. hopefully with wildcards in deps/outs soon) — see https://discuss.dvc.org/t/managing-pipelines-operating-per-dataset-element/613/4 for a current alternative.
  • [ ] DVC in Production setup (see https://github.com/iterative/dvc.org/issues/862#issuecomment-848396315)
  • [ ] Add dvc version to your first DVC repo commit? Or another way to know what version(s) you've used (since file formats may change, especially between major versions).

jorgeorpinel avatar Jan 07 '21 03:01 jorgeorpinel

Looking at open check boxes, most or all of these topics are addressed I think (the relevant ones at least).

how to use DVC with DB

This would be a how-to, but is it still something we want to have official docs for? It doesn't really match DVC's approach.

creating a pipeline in a 'debug' directory and then ... set different data sets as an input

@efiop is this some sort of bootstrapping method? Is it really something people do? What problem does it solve?

never store user credentials in the DVC project config

We do stress the use of --local for sensitive configurations (especially in remote modify). Should be enough I believe.
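
As a rough illustration of the point about `--local`, the check below scans the text of a shared config file for option names that usually hold secrets. It is a sketch, not part of DVC: the function name `find_credential_keys`, the `SENSITIVE_KEYS` set, and the sample remote are all assumptions made for this example.

```python
import re

# Option names that commonly hold secrets. This set is an assumption for
# illustration, not an exhaustive list maintained by DVC.
SENSITIVE_KEYS = {"access_key_id", "secret_access_key", "session_token", "password"}

# Matches "key = value" lines, ignoring leading indentation.
OPTION_RE = re.compile(r"^\s*([A-Za-z_][\w-]*)\s*=")


def find_credential_keys(config_text):
    """Return the sensitive option names set in an INI-style config text."""
    found = []
    for line in config_text.splitlines():
        if line.lstrip().startswith(("#", ";")):
            continue  # skip comment lines
        match = OPTION_RE.match(line)
        if match and match.group(1).lower() in SENSITIVE_KEYS:
            found.append(match.group(1).lower())
    return found


# Hypothetical contents of a shared (Git-tracked) project config.
shared_config = """
['remote "myremote"']
    url = s3://mybucket/path
    secret_access_key = not-a-real-secret
"""

leaked = find_credential_keys(shared_config)
print(leaked)
```

A check like this could run as a pre-commit hook against `.dvc/config`, while the secrets themselves stay in `.dvc/config.local`, which is where `--local` writes.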

Avoid dynamic names

We mention non-deterministic behavior in general in https://dvc.org/doc/command-reference/stage/add#avoiding-unexpected-behavior, and avoiding ad hoc file naming for versioning is a core use case.
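
To make the concern concrete, here is a small sketch contrasting a timestamped output name with a fixed one. The function names and the `model.pkl` path are hypothetical; the point is only that an output written under a fixed path keeps the same name across runs, whereas a name derived from the clock changes every run and cannot be declared ahead of time.

```python
from datetime import datetime
from pathlib import Path


def dynamic_output_path(out_dir, now=None):
    """Anti-pattern: the file name depends on when the script runs."""
    now = now or datetime.now()
    return Path(out_dir) / f"model_{now:%Y%m%d_%H%M%S}.pkl"


def fixed_output_path(out_dir):
    """Preferred: a stable name; versions are told apart by tracked history."""
    return Path(out_dir) / "model.pkl"


# Two runs at different times give different names in the dynamic case...
first = dynamic_output_path("models", datetime(2020, 8, 28, 16, 0, 0))
second = dynamic_output_path("models", datetime(2020, 8, 28, 16, 0, 1))
print(first != second)

# ...but the same name in the fixed case.
print(fixed_output_path("models") == fixed_output_path("models"))
```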

jorgeorpinel avatar Aug 05 '22 06:08 jorgeorpinel

all of these topics are addressed

So we could close this ticket. That said, we still don't have a "Best Practices" section or guide(s). Do we want one? Maybe Use Cases or the trails of the Get Started already cover this need (informing users of the main/recommended patterns for DVC project setup/usage).

WDYT @shcheklein @dberenbaum ? Thanks

jorgeorpinel avatar Aug 05 '22 06:08 jorgeorpinel

I'm fine with closing this. Not all items are covered though, and yes, we could have made a good page with tips/FAQ for pipelines, for data, and for general project structure (e.g. never store user credentials in the DVC project config) ...

shcheklein avatar Aug 06 '22 03:08 shcheklein