dvc.org
docs: "definitive" organization
UPDATE: Jump to https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760
We need to add an additional level to the user guide:
- Basic
- DVC Files and Directories
- DVC File Format
- External Dependencies
- External Outputs
- Update a Tracked File
- Anonymized Usage Analytics
- Customization
- DVC Shell Autocomplete
- IDE Plugins & Syntax Highlighting
- Development Version
- Contributing
UPDATE: See discussion about structure below, and more subtasks in https://github.com/iterative/dvc.org/issues/144#issuecomment-584441916 further below
@dmpetrov I am looking to contribute to the project "Improving and expanding User Guide" in the upcoming Google Season of Docs 2019. I came here just after the organisations were announced, and I already have experience with open-source organisations. You can see my GitHub profile and GitLab profile: https://gitlab.com/Dhiraj240?nav_source=navbar
May I know how to get started with this project, so that I can submit a proposal during the proposal period? I found your Discord channel for communication.
@Dhiraj240 thank you for your interest! Yeah, Discord is a better place to discuss. I'm seeing you have already started.
First, I will work on some issues by this week then after that please guide me. On a weekly basis, I will solve issues. :smile:
Awesome!
The user guide structure has changed quite a bit since this was opened. Is it still desired?
Yes, I think it's still relevant. We still don't have a good intermediate structure for the UG. Not sure if it overlaps with some other tickets or duplicates them.
OK. I'd just note that to avoid 4 levels (excessive clicking) we'll need to remove the Contributing submenu and just list both contribution guides inside Customization directly.
@jorgeorpinel don't consider that split in the ticket description as a final one. It's just an example to illustrate the idea.
Other general doc structure changes (each one of these could be a `good-first-issue`):
- [x] #425 Absorb Understanding DVC into existing sections
- guide: extract part of `config` cmd ref into the user guide #340
- [x] #1035 merge doc parts into super sections?
- [x] consider https://github.com/iterative/dvc.org/issues/144#issuecomment-559258190 above (avoid 4 levels)
- [x] We should also move the Google API Privacy Policy (https://dvc.org/doc/user-guide/privacy) to another location (not really a user guide) and/or consider having it as a link without nav entry (nor in site footer) ~~implementing #731~~. HIDDEN in #1581
- [x] Some of these movements will require redirects, at least for some time.
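As a rough illustration of the redirects mentioned in the last item, here is a minimal sketch of a redirect table with chain resolution. The paths in `REDIRECTS` and the `resolve_redirect` helper are hypothetical placeholders for illustration only; they are not the site's actual redirect configuration.

```python
# Hypothetical old -> new URL mapping; these paths are placeholders,
# not real dvc.org pages.
REDIRECTS = {
    "/doc/old-section/page": "/doc/interim-section/page",
    "/doc/interim-section/page": "/doc/new-section/page",
}


def resolve_redirect(path, redirects=REDIRECTS):
    """Follow redirects until a path with no further redirect is reached.

    Raises ValueError if the table contains a loop.
    """
    seen = set()
    while path in redirects:
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
        path = redirects[path]
    return path
```

Resolving chains up front means a page that moves twice still lands on its final location in one hop, and the loop check catches a reorg that accidentally points two pages at each other.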
Other general doc structure changes:
...
I have created a pull request for absorbing the Understanding DVC section. Please take a look at it and let me know if there are any changes I need to make.
@VANRao-Stack thanks, I will check it out. You are also involved with https://github.com/iterative/dvc.org/issues/614#issuecomment-630561304 though, please pick one issue to focus on for now and let us know.
UPDATE: I know this issue is marked as `good-first-issue`, and thanks for taking the initiative, but I think this is kind of an epic with many subtasks, some of which are good first issues:
- The general reorg of User Guide as described in the description
- The other related tasks listed in https://github.com/iterative/dvc.org/issues/144#issuecomment-584441916
Your PR is a good start but please focus on a single subtask to make this more manageable.
The user guide structure has changed quite a bit since this was opened. Is it still desired?
Yes, I think it's still relevant. We still don't have a good intermediate structure for the UG.
So is the intention of this ticket to figure out the most efficient way to organize our current and future docs in a sustainable way? A kind of "ultimate solution" to docs structure? (If so I can update the issue's title and desc.) Cc @dmpetrov
~~I think for that we would need to analyze the website traffic, search results, conversions, etc. to make sure the stuff we put close to the surface is the most needed, and to determine which things can get buried or even hidden (no nav entry), even deleted (left for blog posts and support channels to cover).~~
Here's an interesting framework to consider, brought up by @shcheklein: https://documentation.divio.com/introduction/
documentation needs to include and be structured around its four different functions: tutorials, how-to guides, technical reference and explanation. Each of them requires a distinct mode of writing.
Here's a proposal:
- /docs 🏠 (Home) What is DVC, List of Get Started pages, Related Techs
  - cases 💡 Use Cases - may extract to /cases (outside of docs) later rel #820
  - guide 📖 User Guide (main container to separate from refs.) - could even remove this level?
    - install/ Per OS, Pre-release, Shell completion, Env Tips (IDEs, Win)
    - dvc-files/ Metafile Formats and Internals (.dvc/) - maybe reorg into topics below
    - data-mgmt/ 🗃️ Data Management topic
      - start-ver Get Started: Data & Model Versioning
      - start-access Get Started: Data Access
      - dvc-cache/ Cache structure, Run cache, Shared cache
      - optimization Optimization & Link types
      - external-data #520
      - dvc-remotes/ Config refs + setup guides per remote type
    - data-pipelines/ 🔃 Data Pipelines topic
      - start-dag Get Started: Data Pipelines (DAG)
      - start-metrics-params Get Started: Parameters, Metrics/Plots
      - stages
      - foreach
      - dependencies
      - parameters
      - outputs
      - metrics-plots
      - templating (parametric dvc.yaml)
    - exp-mgmt 👩‍🔬 Experiment Management topic
      - start-experiments Get Started: Experiments
      - exp-workflow
      - checkpoints
    - how-to/ How Tos
    - contributing/
    - troubleshooting
  - cli-ref 👨‍💻 Command Reference
  - api-ref 🐍 Python API Reference
Cc @shcheklein @dberenbaum @casperdcl @iesahin
Re: doc-vs-docs, a slight update to my "slight preference for fewer chars" https://github.com/iterative/dvc.org/issues/2443#issuecomment-832808614:
Mathematics -> maths (UK)/math (US). Documentation -> doc (universal), surely? Please don't tell me it's doc (UK)/docs (US) :confused:
I wouldn't nest everything under a "guide" (or whatever name) level - seems like unnecessary user navigational difficulty.
data-mgmt/ 🗃️ Data Management topic
Is Data Management a good name for the topic? DVC versions and transfers data that goes into ML models. Data management seems like a broader term that might include data sources before they reach DVC: a data warehouse/DB, or a directory with "immutable" data / S3. The only scenario where DVC does proper data management is a data registry. I don't see any better name so far 😬 Do you have any ideas?
data-pipelines/ 🔃 Data Pipelines topic
Data Pipeline refers to data engineering tools such as Airflow. I'd suggest using ML Pipelines, which might also not be the best but seems slightly better.
We probably need an additional section on model management that should include all the metrics and plot navigation commands - `dvc metrics/plots`. These commands are pretty independent (they should even work without `dvc.yaml` if a target is specified) and make a Git repo metrics-driven. Also, extracting model management into a separate topic will reduce the complexity of the ML pipeline and experiment topics.
exp-mgmt 👩‍🔬 Experiment Management topic
Experiment Management - "Experiment Tracking" is another term but I'm not sure which one is the best.
- IMHO it may be `doc` or `docs`, but if it could be identical with the file path (currently, `docs`), I could navigate the [links] with my `gf`.
- For data pipelines, another alternative may be model pipelines, as it's usually the end product, or data-model pipelines. (Data in, model out.)
- Experiment Tracking is better than Management. Another may be Experiment Versioning, or Experiment Version Control, or Experiment Version Tracking.
Documentation -> doc (universal), surely?
@casperdcl it's not universal, but I also don't think it's related to US vs UK. TBH I wasn't proposing to change it, it just came out that way. But a quick check gives me the impression that `docs` is more common, e.g. terraform.io/docs, docs.aws.amazon.com, developers.google.com/docs, docs.microsoft.com/en-us/azure
wouldn't nest everything under a "guide" (or whatever name) level
I lean the same way, but without that level you would need to scroll a lot to find the references, so I'm not sure.
@dmpetrov agree to rethink the topic titles. If we can agree on the grouping of content, we can decide that during the PR(s).
additional section on model management that should include all the metrics and plot navigation commands - dvc metrics/plots. These commands are pretty independent ... and make Git repo metrics-driven
Maybe (structure-wise):
- Handling Data (Sets)
- DS/Modeling Pipelines
- ML Model Optimization
- Experiment Mgmt/Tracking
@iesahin with your gf?
I assume he meant `gfm` (GitHub-flavoured markdown) rather than the usual abbreviation for girlfriend
(I assume that `doc`/`docs`, `cases`, etc. are not about changing names and URLs, but rather local names here to discuss the structure and refer to the sections faster)
- I don't see get started here. I see it's hidden under UG, but I would not do this. It's better to have it as a top-level section.
- `cases` are too abstract; it's better to start with usual things about the product - how to start, how to install, etc. Happy path pretty much.
- Install - we had a quite long discussion and decided that it's good to keep it top level, right?
- Guide should be starting with some overview - basic concepts. Then go into details by specific topics. I agree with @dmpetrov re the names. We should iterate on this. Even though I can see clear data management scenarios besides data registry (e.g. shared cache).
- Agreed on including Model Management that expands on plots/metrics/etc. And this way we won't have to squeeze this into the Data Management by doing something like "Data and Model ...."
Overall, this suggestion has a lot of name changes, structural changes that are hard for me to justify. I would start by cleaning this up step by step and by introducing sections in the UG as we go (e.g. when we move shared cache). We'll have more clarity after that and generalize things.
`gf` is "go to file" in Vim. 😁 It's easier to navigate the links by placing the cursor on a link and typing `gf`. This feature is also available in VS Code markdown plugins, I think. We can't navigate the links offline because links are `/doc/` but paths are `docs/`.
I also think it's not a big deal. I have other means to navigate.
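To make the link/path mismatch concrete, here is a minimal sketch of the translation an editor would need before `gf` could follow a link. The function name, the `.md` extension, and the `index` fallback are assumptions for illustration; only the `/doc/` URL prefix and the `docs/` directory come from the discussion above.

```python
def link_to_source_path(link, url_prefix="/doc/", dir_prefix="docs/", ext=".md"):
    """Map a site link such as /doc/user-guide/privacy to a local file path.

    Assumes (hypothetically) one Markdown file per page under `dir_prefix`.
    """
    if not link.startswith(url_prefix):
        raise ValueError(f"not a docs link: {link}")
    # Drop any #fragment and trailing slash before building the file path.
    relative = link[len(url_prefix):].split("#", 1)[0].rstrip("/")
    if not relative:
        relative = "index"  # assumed name for the section's landing page
    return f"{dir_prefix}{relative}{ext}"
```

If the URL prefix and the directory name were identical, this mapping would reduce to appending the extension, which is the point of the suggestion to keep them the same.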
get started here. I see it's hidden under UG, but I would not do this. It's better to have it as a top-level section.
@shcheklein a) if we remove the UG level (https://github.com/iterative/dvc.org/issues/144#issuecomment-841990301) then they're the first pages under each topic, b) we could repeat those pages in 2 places, c) we can list/link them all directly in /docs home.
I also don't mind keeping a top-level GS group for now, but its structure looks a lot like the proposed top-level (or UG) structure: Data, Pipelines, Models, Experiments, which may make the navigation confusing. Also, I expect it will grow even longer, and it's starting to look like a full tutorial... But that's a separate issue.
cases are too abstract, it's better to start with usual things about the product
I moved them up thinking that we won't even keep them under /docs. But as long as they're in here sure, we can keep them after GS
Install - we had a quite long discussion and decided that it's good to keep it top level, right?
It's in the top if we remove the UG level (UG is an abstract docs container in the proposal).
Guide should be starting with some overview - basic concepts
True, I forgot about Basic Concepts but it doesn't exist yet (there's an issue for that).
this suggestion has a lot of name changes, structural changes that are hard for me to justify.
Really the main proposal would be to eliminate the UG level and regroup most guides into 4 topics instead. The other big change was the redistribution of Get Started entries but it's not needed now. In summary:
/docs
- Home
- Install
- Get Started
- (Use Cases)
- Data Management
- Modeling Pipelines
- ML Model Optimization
- Experiment Tracking
- Cmd Ref
- API Ref
- Misc?
wrt https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760:
- `/doc/cases` may extract to `/cases` (outside of docs) later rel #820
  I'd say merge with `/features` and `/`. Complex tutorial-like bits to be extracted to other doc pages.
- `/doc/guide` -> `/doc` guide 📖 User Guide (main container to separate from refs.) - could even remove this level?
  `/guide/` is a meaningless level - apparently only exists to collapse irrelevant info for those looking for `/doc/api-ref/` and `/doc/cli-ref/`. There are other ways to emphasise `*-ref` (e.g. italics, bold, etc). Note that https://docs.docker.com has separate roots for "Guides," "Product manuals," "Reference," and "Samples."
- `/doc/dvc-files/` Metafile Formats and Internals (.dvc/) - maybe reorg into topics below
  This should appear next to CLI-ref and API-ref because it's at that level of detail. Also should be called "DVC file formats/project directory structure" or something more descriptive.
- `/doc/data-mgmt/`, `/doc/data-pipelines/`, `/doc/exp-mgmt/`
  I agree with the other comments that these are misleading names.
  Also presumably these will be slightly more specific than `/cases` / `/features` / `/` but explicitly NOT tutorials.
- `/doc/how-to/`
  These are presumably tutorials.
  We desperately need to have a list of things which we consider synonyms, because otherwise the language barrier seems to be the biggest problem when communicating with each other.
- `/doc/contributing/`
  Doesn't really have much business being here, I think. Should be in the repo's `CONTRIBUTING.md` / `.github/CONTRIBUTING.md` or the repo's wiki.
- `/doc/troubleshooting/`
  This should be at `/help` or `/support`, surely?
@casperdcl thanks. I think we can worry about moving /cases out of docs and about outliers (contrib, troubleshooting) in a 2nd iteration.
Yes, my main proposal is to remove the /guide level. I don't know if it's completely meaningless (the word "guide" gives you a good idea of what you'll find inside) but it's better to reorg into the topics discussed above. And I think we can find the right names as we work on that change, once/if we all agree on this.
The Q on whether /doc/dvc-files is a reference or a guide is also not so simple so I'd let it be for now (same as for where to list /start pages).
So to recap once more: a first reorg iteration would split the Guide into 4 topics. ~~All in favor say aye ✋~~
my main proposal is to remove the /guide level. I don't know if it's completely meaningless
I agree, this is what I meant. More accurately I mean "it's not useful to nest things under a level whatever you call it."
a first reorg iteration would split the Guide into 4 topics
what 4 topics? Can you update the main reference https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760 to make this clear?
Yes, my main proposal is to remove the /guide level
Please, let's not do this. A few reasons off the top of my head:
- we'll be mixing apples and oranges in the top level structure - high level doc-related components like Home, Cmd Ref, etc and actual DVC specific items. The idea behind the existing structure is to reflect 3-4 major parts of any docs - Refs, Guides, Quick Start, etc.
- If we do this - it won't scale well, we'll be adding more stuff there. Where will we put Contrib? Troubleshooting? How to, etc? We'll completely hide other major parts - Cmd Ref, Get started, etc
- not the right time for this? Let's first try to properly consolidate things under Guide and then we can decide?
/guide/ is a meaningless level - apparently only exists to collapse irrelevant info for those looking for
I don't agree. We have never spent enough time to do it right - that's why it looks this way. If we move stuff out, then all the docs will look like a mess. If we plan how to move things and instead reorganize inside, it'll look better.
How tos are technically tutorials, but they have a very precise angle - how to solve a very niche problem, vs general tutorials. You see the difference by their titles.
Let's first try to properly consolidate things under Guide and then we can decide?
OK. So would reorganizing most guides into 4 topics (as sub-levels of /doc/guide) be a good first step? Basic Concepts can stay at the beginning of the Guide section (before the 4 topics). How To and Outlier pages can come after (I think those are found most often from searches or links from other docs).
...
Guide
  DVC Concepts ...
  Data Management (1) ...
  Modeling Pipelines (2) ...
  ML Model Optimization (3) ...
  Experiment Tracking (4) ...
  How to ...
  Troubleshooting
  Contributing
* Names aren't final
Cc @casperdcl ☝️