dvc.org
dvc.org copied to clipboard
cmd: clarify that every dvc.yaml is checked upon `dvc.yaml`
UPDATE: Skip to https://github.com/iterative/dvc/issues/5089#issuecomment-744675730
~~Bug~~ Report
Description
repro fails for any file if there's an error in ./dvc.yaml
Reproduce
λ cat dvc.yaml
stages:
hi:
cmd: echo hello
meta:
string: "some text"
λ dvc repro tmp/dvc.yaml
ERROR: 'dvc.yaml' format error: extra keys not allowed @ data['meta']
Even if tmp/dvc.yaml is valid, repro fails because ./dvc.yaml is invalid. AND the message is misleading because "dvc.yaml" here refers to ./dvc.yaml, whereas I'm specifying tmp/dvc.yaml as target (so I would assume that's the one DVC may be referring to).
Expected
Repro of tmp/dvc.yaml (as long as it's valid).
Environment information
Output of dvc version:
λ dvc version
DVC version: 1.10.2 (exe)
---------------------------------
Platform: Python 3.7.9 on Windows-10-10.0.18362-SP0
Supports: All remotes
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git
Additional Information (if any):
We build full dag before doing anything to check for cycles/overlaps and so on, so this error is by design. The dvc.yaml is a correct relative path to your current location. We could consider using ./dvc.yaml there instead though, but some might consider that a bug and as something ugly.
tmp/dvc.yaml also has a DAG so why not build that one only? I think that ignoring any file not given as a target is what the user would expect.
The dvc.yaml is a correct relative path... We could consider using ./dvc.yaml
I didn't quite understand those comments. In the example above I specified which path to use (not that one).
tmp/dvc.yaml also has a DAG
@jorgeorpinel, the DAG in the dvc is not explicit. It's built by DVC based on output or dependencies of the stage.
And, as it's implicit, you could very well have a dependency be output in another stage in a different dvc.yaml file. So, a dvc.yaml is not a DAG. That's why we need to build a DAG, from all of the stages in a repo to ensure there are not any duplications or cycles.
In the example above I specified which path to use (not that one).
It's the relative path dvc.yaml that you are seeing which is correct, as the relative path from your working directory has the error, not the tmp/dvc.yaml.
we need to build a DAG, from all of the stages in a repo to ensure there are not any duplications or cycles
I see. Well, that's nice! Having the option to split DAGs in several dvc.yaml files. Although a bit unintuitive but I did double check that an invalid tmp/dvc.yaml causes dvc repro to fail (even if that file is not given as a target).
But this does mean that our docs on repro are incomplete, maybe incorrect... None of this is expressed there.
the relative path dvc.yaml that you are seeing which is correct
Ah I see. I think it would be better to use ./ in that particular case because it's not clear that 'dvc/yaml' is a path. It feels like just a file name, and since I specified tml/dvc.yaml then I assume DVC means that one.
- [ ] https://dvc.org/doc/command-reference/repro should explain this behavior
- [ ] https://dvc.org/doc/user-guide/dvc-files-and-directories could mention that all dvc.yaml files in the project are important and should be valid at all times for DVC to function properly.
@efiop @skshetry @pared I've transferred this to docs but one last concern: I tried a .dvc file as target and it still checked all of the dvc.yaml files in my project for validity first. How is that helpful? There's no way they can affect the .dvc file I specified as target, as it's not a stage (not dependent on any actual stage in dvc.yaml files) but rather just some data tracked by DVC. Thanks
@jorgeorpinel There might be another dvc file that is pointing to the same output path, we check for that and that's why we need to collect everything anyway. This behaviour might change in the future, but for now this is by design.
Overlapping output paths... OK got it 👍