dvc icon indicating copy to clipboard operation
dvc copied to clipboard

flatten `deps` to facilitate more flexible templating

Open itcarroll opened this issue 3 years ago • 13 comments

The templating feature currently admits little flexibility for stage keys (e.g. deps) that expect lists. As a result, foreach stages cannot be given a variable number of dependencies. Causing lists to be flattened would make templating more powerful. For example, you could give extra dependencies to certain templated stages:

stages:
  build:
    foreach:
      us:
        deps:
      uk:
        deps:
        - uk_data
    do:
      cmd: myscript ${key}
      deps:
      - myscript
      - ${item.deps}

I have implemented the change at https://github.com/itcarroll/dvc/blob/feature/flattened-deps/dvc/dependency/init.py#L52, by (non-recursive) list flattening in dvc.dependency.loads_from. I am sure it's not the right way to do it, and probably has unintended consequences, but it demonstrates the concept (and no tests failed).

I don't know how templating responds to missing keys, but not having to include the empty list for us would be even more better.

itcarroll avatar Dec 14 '21 17:12 itcarroll

In ancient discussions on foreach I found a suggestion (by @jorgeorpinel):

I wonder actually if you can just do deps: ${list} (a list from params.yaml)

This would add the same flexibility I'm interested in with the above, but it fails YAML validation, which appears to require deps to get a list before ${item} is expanded. So, some change in the validation process might be an alternate route to this feature.

itcarroll avatar Feb 08 '22 04:02 itcarroll

Hi, I'm also interested in this feature https://discord.com/channels/485586884165107732/485596304961962003/946817686220984350

   data:
    foreach:
      librispeech:
        script: libri.sh
        links:
          - https://www.openslr.org/resources/12/dev-clean.tar.gz
          - https://www.openslr.org/resources/12/dev-other.tar.gz
          - https://www.openslr.org/resources/12/test-clean.tar.gz
          - https://www.openslr.org/resources/12/test-other.tar.gz
          - https://www.openslr.org/resources/12/train-clean-100.tar.gz
          - https://www.openslr.org/resources/12/train-clean-360.tar.gz
          - https://www.openslr.org/resources/12/train-other-500.tar.gz
      tedlium:
        script: tedlium.sh
        links:
          - https://www.openslr.org/resources/51/TEDLIUM_release-3.tgz
    do:
      cmd:
        - ./proc/${item.script} --output ${nas_data_dir}/${key} --tmp ${nas_data_dir} ${item.links}
      deps:
        - ${item.script}
        - ${item.links}
      outs:
        - data/${key}/${key}_text
        - data/${key}/${key}_utt2spk
        - data/${key}/${key}_wav.scp
        - data/${key}/${key}_waves

I have stages that have different list lengths

alealv avatar Feb 25 '22 17:02 alealv

So, some change in the validation process might be an alternate route to this feature.

This relates to other requests where templating is being limited by the YAML validation which occurs before the interpolation, for example https://github.com/iterative/dvc/issues/6989

daavoo avatar Aug 23 '22 14:08 daavoo

@daavoo Do you know if these scenarios work fine if not for YAML validation failures?

dberenbaum avatar Aug 23 '22 17:08 dberenbaum

@daavoo Do you know if these scenarios work fine if not for YAML validation failures?

For #6989 yes, for this issue additional work would be needed

daavoo avatar Aug 24 '22 09:08 daavoo

If all the deps are a single list (deps: ${those}, as suggested in #8171), is additional work needed?

dberenbaum avatar Aug 24 '22 19:08 dberenbaum

Big up-vote for this one 👍

It would be really, really helpful.

m-gris avatar Mar 03 '23 14:03 m-gris

Is there any progress on this issue? It really is a much-needed feature.

bezbahen0 avatar Mar 06 '23 11:03 bezbahen0

@bezbahen0 We have not made any progress yet.

@skshetry @daavoo Can we resolve the templating before doing validation? If that's not straightforward, can we skip validation errors triggered by unresolved interpolation?

dberenbaum avatar Mar 06 '23 20:03 dberenbaum

@skshetry @daavoo Can we resolve the templating before doing validation? If that's not straightforward, can we skip validation errors triggered by unresolved interpolation?

Will take a look, it has been a while 😅

daavoo avatar Mar 07 '23 14:03 daavoo

This feature would greatly simplify our current pipeline specification!

JulianoLagana avatar Mar 07 '23 14:03 JulianoLagana

We don't have a good way to validate after templating, say, for example, to determine that deps has to be a list of strings/file-paths after the interpolation.

That's why we limited templated dvc.yaml and the resolved dvc.yaml to follow the same schema.

skshetry avatar Mar 07 '23 14:03 skshetry

Will take a look, it has been a while 😅

I already ended up taking a look and might open a PR for discussion.

dberenbaum avatar Mar 07 '23 14:03 dberenbaum