dvc.org icon indicating copy to clipboard operation
dvc.org copied to clipboard

cmd: clarify that every dvc.yaml is checked upon `dvc.yaml`

Open jorgeorpinel opened this issue 4 years ago • 8 comments

UPDATE: Skip to https://github.com/iterative/dvc/issues/5089#issuecomment-744675730

~~Bug~~ Report

Description

repro fails for any file if there's an error in ./dvc.yaml

Reproduce

λ cat dvc.yaml
stages:
  hi:
    cmd: echo hello

meta:
  string: "some text"

λ dvc repro tmp/dvc.yaml
ERROR: 'dvc.yaml' format error: extra keys not allowed @ data['meta']

Even if tmp/dvc.yaml is valid, repro fails because ./dvc.yaml is invalid. AND the message is misleading because "dvc.yaml" here refers to ./dvc.yaml, whereas I'm specifying tmp/dvc.yaml as target (so I would assume that's the one DVC may be referring to).

Expected

Repro of tmp/dvc.yaml (as long as it's valid).

Environment information

Output of dvc version:

λ dvc version
DVC version: 1.10.2 (exe)
---------------------------------
Platform: Python 3.7.9 on Windows-10-10.0.18362-SP0
Supports: All remotes
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git

Additional Information (if any):

jorgeorpinel avatar Dec 13 '20 23:12 jorgeorpinel

We build full dag before doing anything to check for cycles/overlaps and so on, so this error is by design. The dvc.yaml is a correct relative path to your current location. We could consider using ./dvc.yaml there instead though, but some might consider that a bug and as something ugly.

efiop avatar Dec 13 '20 23:12 efiop

tmp/dvc.yaml also has a DAG so why not build that one only? I think that ignoring any file not given as a target is what the user would expect.

The dvc.yaml is a correct relative path... We could consider using ./dvc.yaml

I didn't quite understand those comments. In the example above I specified which path to use (not that one).

jorgeorpinel avatar Dec 14 '20 03:12 jorgeorpinel

tmp/dvc.yaml also has a DAG

@jorgeorpinel, the DAG in the dvc is not explicit. It's built by DVC based on output or dependencies of the stage. And, as it's implicit, you could very well have a dependency be output in another stage in a different dvc.yaml file. So, a dvc.yaml is not a DAG. That's why we need to build a DAG, from all of the stages in a repo to ensure there are not any duplications or cycles.

In the example above I specified which path to use (not that one).

It's the relative path dvc.yaml that you are seeing which is correct, as the relative path from your working directory has the error, not the tmp/dvc.yaml.

skshetry avatar Dec 14 '20 04:12 skshetry

we need to build a DAG, from all of the stages in a repo to ensure there are not any duplications or cycles

I see. Well, that's nice! Having the option to split DAGs in several dvc.yaml files. Although a bit unintuitive but I did double check that an invalid tmp/dvc.yaml causes dvc repro to fail (even if that file is not given as a target).

But this does mean that our docs on repro are incomplete, maybe incorrect... None of this is expressed there.

the relative path dvc.yaml that you are seeing which is correct

Ah I see. I think it would be better to use ./ in that particular case because it's not clear that 'dvc/yaml' is a path. It feels like just a file name, and since I specified tml/dvc.yaml then I assume DVC means that one.

jorgeorpinel avatar Dec 14 '20 19:12 jorgeorpinel

  • [ ] https://dvc.org/doc/command-reference/repro should explain this behavior
  • [ ] https://dvc.org/doc/user-guide/dvc-files-and-directories could mention that all dvc.yaml files in the project are important and should be valid at all times for DVC to function properly.

jorgeorpinel avatar Dec 14 '20 20:12 jorgeorpinel

@efiop @skshetry @pared I've transferred this to docs but one last concern: I tried a .dvc file as target and it still checked all of the dvc.yaml files in my project for validity first. How is that helpful? There's no way they can affect the .dvc file I specified as target, as it's not a stage (not dependent on any actual stage in dvc.yaml files) but rather just some data tracked by DVC. Thanks

jorgeorpinel avatar Dec 15 '20 01:12 jorgeorpinel

@jorgeorpinel There might be another dvc file that is pointing to the same output path, we check for that and that's why we need to collect everything anyway. This behaviour might change in the future, but for now this is by design.

efiop avatar Dec 15 '20 10:12 efiop

Overlapping output paths... OK got it 👍

jorgeorpinel avatar Dec 15 '20 18:12 jorgeorpinel