Allow naming matrix expansions

Open Kyliroco opened this issue 1 month ago • 3 comments

Description

Summary

  • allow matrix stages to specify an optional name template so each combination gets a stable, user-defined suffix
  • reuse the existing parsing context to render the template and ensure generated names remain unique and valid
  • update the matrix parsing functional tests with generic dataset/model identifiers to cover the new behaviour
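
The rendering and validation described in the summary can be sketched roughly as follows. This is a minimal illustration, not DVC's actual parsing code: the helper names (`lookup`, `render`, `expand_matrix`) and the character rule for a "valid" suffix are assumptions made for the example.

```python
import re
from itertools import product

# Hypothetical helpers for illustration; not DVC's actual parsing code.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")
_VALID_SUFFIX = re.compile(r"^[A-Za-z0-9_.-]+$")  # assumed rule for a "valid" suffix


def lookup(context, dotted_path):
    """Resolve a dotted path such as 'item.dataset.key' in nested dicts."""
    value = context
    for part in dotted_path.split("."):
        value = value[part]
    return value


def render(template, context):
    """Replace every ${a.b.c} placeholder with its value from the context."""
    return _PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


def expand_matrix(matrix, name_template):
    """Map each rendered suffix to its combination, rejecting bad names."""
    expanded = {}
    for values in product(*matrix.values()):
        item = dict(zip(matrix.keys(), values))
        suffix = render(name_template, {"item": item})
        if not _VALID_SUFFIX.match(suffix):
            raise ValueError(f"invalid stage name suffix: {suffix!r}")
        if suffix in expanded:
            raise ValueError(f"duplicate stage name suffix: {suffix!r}")
        expanded[suffix] = item
    return expanded
```

In this sketch, a template that omits one axis (for example only `${item.model.key}` in a two-axis matrix) renders the same suffix for several combinations and is rejected as a duplicate, which is the uniqueness guarantee the summary refers to.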

Motivation & Use Case

  • When a matrix stage fans out over datasets and models, the autogenerated names (stage@set0_set1…) are opaque for large pipelines.
  • Allowing a template such as "${item.dataset.key}_${item.model.key}" keeps CLI commands (dvc stage list, dvc repro stage@foo_bar) readable and consistent with downstream artefacts.
  • My pipelines rely on predictable stage names for logging and automation, which is why a custom suffix is important.

Example dvc.yaml

vars:
  - dataset.yaml
  - model.yaml

stages:
  inference:
    matrix:
      dataset: ${datasets_list}
      model: ${models_list}
    name: "${item.dataset.key}_${item.model.key}"
    cmd: >
      inference
      ...

Example dataset.yaml

datasets_list:
  - key: doctamper_testingset
    ...
  - key: doctamper_trainingset
    ...

Example model.yaml

models_list:
  - key: ffdn
    ...
  - key: trufor
    ...

With the name template, a call such as dvc repro inference@dataset0_model_0 becomes dvc repro inference@doctamper_testingset_ffdn, which is much easier to read and to script against. Link issue
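
For automation, the full set of addresses that the example would produce can be enumerated ahead of time. The snippet below is a sketch that assumes the proposed name template is accepted and renders exactly as described in this PR; `stage_addresses` is a hypothetical helper, and the keys are copied from the example dataset.yaml and model.yaml.

```python
from itertools import product

# Keys copied from the example dataset.yaml and model.yaml in this description.
datasets = ["doctamper_testingset", "doctamper_trainingset"]
models = ["ffdn", "trufor"]


def stage_addresses(stage, datasets, models):
    """Return the stage@suffix address for every dataset/model pair.

    Mirrors the proposed template "${item.dataset.key}_${item.model.key}";
    this is an assumption about the feature, not current DVC behaviour.
    """
    return [f"{stage}@{d}_{m}" for d, m in product(datasets, models)]


addresses = stage_addresses("inference", datasets, models)
for address in addresses:
    print("dvc repro", address)
```

Because the suffix depends only on the keys, inserting or reordering entries in either list would not change the address of an existing combination, unlike an index-based suffix.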

Kyliroco • Oct 30 '25 15:10

Codecov Report

Patch coverage is 84.28571% with 11 lines in your changes missing coverage. Please review.
Project coverage is 90.99%. Comparing base (2431ec6) to head (8fcad6e).
Report is 150 commits behind head on main.

Files with missing lines    Patch %    Lines
dvc/parsing/__init__.py     76.08%     6 Missing and 5 partials
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10903      +/-   ##
==========================================
+ Coverage   90.68%   90.99%   +0.31%     
==========================================
  Files         504      504              
  Lines       39795    41002    +1207     
  Branches     3141     3251     +110     
==========================================
+ Hits        36087    37309    +1222     
- Misses       3042     3051       +9     
+ Partials      666      642      -24     

codecov[bot] • Oct 30 '25 15:10

Thank you for creating this PR. Since this affects dvc.yaml schema/language, I'd prefer you first open an issue or a feature-request that explains your use case, the specific problems or limitations you're currently facing, and why this extension is necessary. I’d also like to understand the level of interest from the community.

Expanding the schema adds maintenance costs. We avoid doing so unless there is strong demand from the community, a clear justification, and meaningful value.

You can keep this PR open and link it in the new issue, so that others can evaluate and provide feedback.

In addition, please expand the PR description with more detail about the proposal, including example usage, expected behavior, etc.

skshetry • Oct 31 '25 07:10

Thanks for the quick review! I understand the hesitation about extending the schema. I’ll open a feature request describing our use case, the pain points we’re running into, and why naming matrix entries would help. I’ll also share the link here so the community can weigh in. In the meantime I’ll expand this PR description with examples and expected behaviour to make the proposal clearer.

Kyliroco • Oct 31 '25 09:10