guide: Modeling Pipelines
From https://github.com/iterative/dvc.org/issues/144#issuecomment-844711839
- [x] What are data pipelines, including DAG concept (extract from https://dvc.org/doc/command-reference/dag)
Structure:
- [x] Defining Pipelines
- [x] Focus on the pipelining process and ~~codification~~ link to dvc.yaml ref. for formalities
- [x] ~~Defining stages (extract from `run` and `stage` refs.)~~
- [ ] Should provide some actual `dvc.yaml` example/templates to copy & paste.
- [ ] Multiple `dvc.yaml` files vs. multiple pipelines (`repro --recursive`)
- [ ] Describing formally outs, deps, stage doesn't make sense here
- Mention `exp init`? (Link to appropriate page/section) Maybe
- [ ] Pipeline reproduction
- [ ] Extract some details from https://dvc.org/doc/command-reference/repro ?
- [ ] Experimental pipelines (discussion)
- [ ] Operationalizing pipelines?
Other tasks
- [ ] Mention VS Code "as an editor that supports schema definition, etc."
- [ ] Update some of the [stage] links around this term that currently go to `run`/`stage` refs (some should link to the concept page instead)
- [ ] Recover some info deleted in this change to Running Experiments ?
- [ ] https://github.com/iterative/dvc.org/issues/3670
Updated plan for this epic per https://github.com/iterative/dvc.org/pull/3414#issuecomment-1238505396. PTAL @dberenbaum @shcheklein
Thanks @jorgeorpinel!
My specific feedback on https://github.com/iterative/dvc.org/pull/3414#issuecomment-1238505396:
We should probably talk more about dvc exp init here (since it helps to bootstrap the dvc.yaml after all)?
We can mention dvc exp init, but I think it's still immature and we are discussing how to improve this onboarding experience now, so it might end up being very short-lived.
we should provide some examples - actual pipeline files?
Yes, I think having some templates that people can copy and paste would help people get started as much as dvc exp init.
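For illustration, a copy-and-paste template might look something like the sketch below. This is only a rough example under assumptions: the stage names, script paths, data files, and the `train.epochs` parameter are hypothetical placeholders, not taken from an existing project or from the current docs.

```yaml
# dvc.yaml (hypothetical template; every path, stage name, and param is a placeholder)
stages:
  prepare:
    cmd: python src/prepare.py data/raw.csv data/prepared
    deps:
      - src/prepare.py
      - data/raw.csv
    outs:
      - data/prepared
  train:
    cmd: python src/train.py data/prepared model.pkl
    deps:
      - src/train.py
      - data/prepared      # output of the prepare stage, so train runs after it
    params:
      - train.epochs       # assumed to be defined in params.yaml
    outs:
      - model.pkl
    metrics:
      - metrics.json:
          cache: false     # keep the metrics file in Git rather than the DVC cache
```

Since `train` depends on an output of `prepare`, DVC would order the two stages accordingly in the DAG. `dvc dag` displays that graph and `dvc repro` runs it.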
mention VS Code as an editor that supports schema definition
Yes, I think we could mention VS Code as one example of an editor and show how these editors can be used to write/modify/validate dvc.yaml files.
Include things like Jupyter notebooks - how to make a pipeline out of it ... etc
Yeah, definitely useful although probably a fairly large project on its own. I know @RCdeWit and @alex000kim are working on something related, so it would be good to sync with them.
I think this item Defining stages (extract from run and stage refs.) doesn't make sense anymore.
Operationalizing pipelines
Experimental pipelines
I'm not sure what it means
Pipeline reproduction
Running pipeline? - we can include both exp and repro I guess. No need to complicate
so it might end up being very short-lived.
Even so, it's fine to mention, I think.
having some templates that people can copy and paste
mention VS Code
Added to description.
Defining stages (extract from run and stage refs.) doesn't make sense anymore.
Scratched out now (I left it since it's been done in the current doc).
Experimental pipelines
Interaction between exp and dvc.yaml features. The task overlaps with https://github.com/iterative/dvc.org/issues/2768 in that "removing the pipeline and stages concept would be ideal," (from Running Exps).
If it's not a lot it could be a section in Pipeline reproduction.
Operationalizing pipelines
Using pipelines "in production" or some other application. Maybe CML? Probably the last topic to worry about or maybe not even for DVC docs. Thoughts?