
guide: Modeling Pipelines

Open jorgeorpinel opened this issue 4 years ago • 5 comments

From https://github.com/iterative/dvc.org/issues/144#issuecomment-844711839

  • [x] What are data pipelines, including DAG concept (extract from https://dvc.org/doc/command-reference/dag)

Structure:

  • [x] Defining Pipelines
    • [x] Focus on the pipelining process and ~~codification~~ link to dvc.yaml ref. for formalities
    • [x] ~~Defining stages (extract from run and stage refs.)~~
    • [ ] Provide some actual dvc.yaml examples/templates to copy and paste.
    • [ ] Multiple dvc.yaml files vs. multiple pipelines (repro --recursive)
    • [ ] Formally describing outs, deps, and stages doesn't make sense here
    • Mention exp init? (Link to appropriate page/section) Maybe
  • [ ] Pipeline reproduction
    • [ ] Extract some details from https://dvc.org/doc/command-reference/repro ?
  • [ ] Experimental pipelines (discussion)
  • [ ] Operationalizing pipelines?

Other tasks

  • [ ] Mention VS Code "as an editor that supports schema definition, etc."
  • [ ] Update some of the [stage] links around this term that currently go to run/stage refs (some should link to the concept page instead)
  • [ ] Recover some info deleted in this change to Running Experiments ?
  • [ ] https://github.com/iterative/dvc.org/issues/3670

jorgeorpinel, Oct 01 '21 18:10

Updated plan for this epic per https://github.com/iterative/dvc.org/pull/3414#issuecomment-1238505396. PTAL @dberenbaum @shcheklein

jorgeorpinel, Sep 08 '22 01:09

Thanks @jorgeorpinel!

My specific feedback on https://github.com/iterative/dvc.org/pull/3414#issuecomment-1238505396:

> We should probably talk more about dvc exp init here (since it helps to bootstrap the dvc.yaml after all)?

We can mention dvc exp init, but I think it's still immature and we are discussing how to improve this onboarding experience now, so it might end up being very short-lived.

> We should provide some examples (actual pipeline files)?

Yes, I think having some templates that people can copy and paste would help people get started as much as dvc exp init.
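
For illustration, here is a minimal sketch of the kind of copy-and-paste template discussed here. The stage names, the scripts under src/, the data and model paths, and the parameter names are all hypothetical placeholders; only the dvc.yaml keys themselves (stages, cmd, deps, params, outs, metrics) are standard. Each stage's outs feed the next stage's deps, which is what forms the pipeline DAG.

```yaml
# dvc.yaml: hypothetical three-stage template (all paths and names are placeholders)
stages:
  prepare:
    cmd: python src/prepare.py data/raw.csv data/prepared.csv
    deps:
      - src/prepare.py
      - data/raw.csv
    outs:
      - data/prepared.csv

  train:
    cmd: python src/train.py data/prepared.csv model.pkl
    deps:
      - src/train.py
      - data/prepared.csv      # output of "prepare", so train runs after it
    params:                    # read from params.yaml by default
      - train.epochs
      - train.lr
    outs:
      - model.pkl

  evaluate:
    cmd: python src/evaluate.py model.pkl data/prepared.csv metrics.json
    deps:
      - src/evaluate.py
      - model.pkl
      - data/prepared.csv
    metrics:
      - metrics.json:
          cache: false         # keep the small metrics file in Git instead of the DVC cache
```

With a file like this in place, `dvc dag` should display prepare → train → evaluate, and `dvc repro` should run the stages in that order, skipping any stage whose dependencies have not changed.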

> Mention VS Code as an editor that supports schema definition

Yes, I think we could mention VS Code as one example of an editor and show how these editors can be used to write/modify/validate dvc.yaml files.

> Include things like Jupyter notebooks (how to make a pipeline out of one), etc.

Yeah, definitely useful although probably a fairly large project on its own. I know @RCdeWit and @alex000kim are working on something related, so it would be good to sync with them.

dberenbaum, Sep 08 '22 17:09

I think the item "Defining stages (extract from run and stage refs.)" doesn't make sense anymore.

> Operationalizing pipelines

> Experimental pipelines

I'm not sure what these mean.

> Pipeline reproduction

"Running pipelines"? We can include both exp and repro, I guess. No need to complicate it.

> so it might end up being very short-lived.

Even so, it's fine to mention, I think.

shcheklein, Sep 13 '22 01:09

> having some templates that people can copy and paste

> mention VS Code

Added to description.

jorgeorpinel, Sep 13 '22 01:09

> Defining stages (extract from run and stage refs.) doesn't make sense anymore.

Scratched out now (I left it since it's been done in the current doc).

> Experimental pipelines

Interaction between exp and dvc.yaml features. The task overlaps with https://github.com/iterative/dvc.org/issues/2768 in that "removing the pipeline and stages concept would be ideal," (from Running Exps).

If it's not a lot it could be a section in Pipeline reproduction.

> Operationalizing pipelines

Using pipelines "in production" or some other application. Maybe CML? Probably the last topic to worry about or maybe not even for DVC docs. Thoughts?

jorgeorpinel, Sep 13 '22 02:09