dvc stage: params section with variable
Bug Report
Description
I have a DVC pipeline with the following stage:
stages:
  process:
    foreach: ${datasets}
    do:
      cmd: >-
        python ${GEN_SCRIPTS_ROOT}/process_ds.py
        --ds-root ${DS_ROOT}/${item}/h5-corrected/
        --out ${PPL_PTH}/processed/${item}
        --config ${PPL_PTH}/config.yaml
        --num-workers 4
        --buffer-size 4
        --force
      deps:
        - ${GEN_SCRIPTS_ROOT}/process_ds.py
        - ${DS_ROOT}/${item}/h5-corrected/train/
        - ${DS_ROOT}/${item}/h5-corrected/val/
        - ${DS_ROOT}/${item}/h5-corrected/test/
      params:
        - ${PPL_PTH}/config.yaml:
            - processing
      outs:
        - ${PPL_PTH}/processed/${item}/train/
        - ${PPL_PTH}/processed/${item}/val/
        - ${PPL_PTH}/processed/${item}/test/
        - ${PPL_PTH}/processed/${item}/log.txt
      wdir: ${WDIR}
Its compiled version for one of the datasets is:
schema: '2.0'
stages:
  process@fractures_0124_seg:
    cmd: python ds_gen//process_ds.py --ds-root data/full_datasets//fractures_0124_seg/h5-corrected/
      --out pipelines/02_seg//processed/fractures_0124_seg --config pipelines/02_seg//config.yaml
      --num-workers 4 --buffer-size 4 --force
    deps:
    - path: data/full_datasets//fractures_0124_seg/h5-corrected/test/
      hash: md5
      md5: 7ceeec622eff202ebfd336857c49f6c8.dir
      size: 1032293048
      nfiles: 4
    - path: data/full_datasets//fractures_0124_seg/h5-corrected/train/
      hash: md5
      md5: 8d889e2240ac8522df681d291d4fe9b1.dir
      size: 9504569880
      nfiles: 4
    - path: data/full_datasets//fractures_0124_seg/h5-corrected/val/
      hash: md5
      md5: f2dddba6856f9f9b4f6ac07b3c4c3052.dir
      size: 925129464
      nfiles: 4
    - path: ds_gen//process_ds.py
      hash: md5
      md5: 243575ee6a8718300cb33c54b7f8ddff
      size: 1967
    params:
      pipelines/02_seg/config.yaml:
        processing:
          Resize:
            voxel_size:
              k: 2
          SpatialResize:
            shape:
            - 160
            - 160
            - -1
    outs:
    - path: pipelines/02_seg//processed/fractures_0124_seg/log.txt
      hash: md5
      md5: 3ecea9ba483e94e36cd8ac96b5d6ae89
      size: 16182
    - path: pipelines/02_seg//processed/fractures_0124_seg/test/
      hash: md5
      md5: 2ec2b5f1e794abe96e6f2c49f0dc3785.dir
      size: 126610024
      nfiles: 4
    - path: pipelines/02_seg//processed/fractures_0124_seg/train/
      hash: md5
      md5: ca53028b1b457add2ba51edd9ad4174e.dir
      size: 873577784
      nfiles: 4
    - path: pipelines/02_seg//processed/fractures_0124_seg/val/
      hash: md5
      md5: 721589477a653f1803a83010f379dd90.dir
      size: 104532792
      nfiles: 4
As you can see, the variables were correctly replaced by real values. But there is a problem:
$ dvc status dvc.yaml:process
process@fractures_0124_seg:
    changed deps:
        new: config.yaml
and dvc commit --force doesn't help:
$ dvc commit dvc.yaml:process --force
(venv) ermolaev@df783b0a927d:~/projects/radml/cvl-cvisionrad-ml/ribs/pipelines/02_seg$ dvc status dvc.yaml:process
process@fractures_0124_seg:
    changed deps:
        new: config.yaml
But if I replace
params:
  - ${PPL_PTH}/config.yaml:
      - processing
with
params:
  - pipelines/02_seg/config.yaml:
      - processing
everything works. Note that variables in the deps section cause no such problem.
Reproduce
Create a synthetic pipeline that uses a template variable in the path to a params file.
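A minimal sketch of such a synthetic pipeline, written as a Python script that only generates the files. Everything in it is an assumption modelled on the report rather than the reporter's actual files: the single `process` stage without `foreach` or `wdir`, the `echo` command, and the contents of `params.yaml` and `config.yaml`. The trailing slash in `PPL_PTH` is inferred from the doubled slashes in the compiled command above. It may or may not be enough to trigger the behaviour.

```python
import tempfile
from pathlib import Path

# Hypothetical file contents; names and values are assumptions based on
# the report, not the reporter's real params.yaml or config.yaml.
PARAMS_YAML = """\
PPL_PTH: pipelines/02_seg/
"""

CONFIG_YAML = """\
processing:
  Resize:
    voxel_size:
      k: 2
"""

DVC_YAML = """\
stages:
  process:
    cmd: echo ok
    params:
      - ${PPL_PTH}/config.yaml:
          - processing
"""


def write_repro(root: Path) -> Path:
    """Write params.yaml, dvc.yaml and the nested config.yaml under root."""
    cfg_dir = root / "pipelines" / "02_seg"
    cfg_dir.mkdir(parents=True, exist_ok=True)
    (root / "params.yaml").write_text(PARAMS_YAML)
    (root / "dvc.yaml").write_text(DVC_YAML)
    (cfg_dir / "config.yaml").write_text(CONFIG_YAML)
    return root


if __name__ == "__main__":
    repo = write_repro(Path(tempfile.mkdtemp(prefix="dvc-params-var-")))
    print(f"Files written to {repo}")
    # Then, manually inside that directory (requires git and dvc):
    #   git init && dvc init
    #   dvc repro
    #   dvc status
```

The script deliberately stops at writing files, so it runs without DVC installed; the DVC commands are left as manual steps.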
Expected
I think DVC should build and compare paths with the same logic for the deps and params sections. It looks like DVC doesn't recognize that the path containing the variable in dvc.yaml is the same path that dvc.lock records.
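To illustrate what "the same logic" could mean, here is a small sketch of comparing paths after normalization. It is not DVC's implementation, and whether a mismatch of this kind is behind the issue is unconfirmed. The `PPL_PTH` value is inferred from the doubled slashes in the compiled command, and `same_path` is a hypothetical helper.

```python
import posixpath

# PPL_PTH is inferred from the report (note the trailing slash);
# locked_key is the params key as recorded in dvc.lock above.
PPL_PTH = "pipelines/02_seg/"
interpolated = f"{PPL_PTH}/config.yaml"  # "pipelines/02_seg//config.yaml"
locked_key = "pipelines/02_seg/config.yaml"


def same_path(a: str, b: str) -> bool:
    """Hypothetical helper: compare POSIX-style paths after normalization."""
    return posixpath.normpath(a) == posixpath.normpath(b)


print(interpolated == locked_key)           # False: raw strings differ
print(same_path(interpolated, locked_key))  # True: equal once normalized
```

In the lock file above, the deps and outs entries keep the doubled slash while the params key does not, so the two sections do appear to store paths differently.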
Environment information
Ubuntu
Output of dvc doctor:
$ dvc doctor
DVC version: 3.53.1 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35
Subprojects:
    dvc_data = 3.15.1
    dvc_objects = 5.1.0
    dvc_render = 1.0.2
    dvc_task = 0.4.0
    scmrepo = 3.3.6
Supports:
    gdrive (pydrive2 = 1.19.0),
    http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
    Global: /home/ermolaev/.config/dvc
    System: /etc/xdg/dvc
Cache types: symlink
Cache directory: ext4 on /dev/sdc1
Caches: local
Remotes: gdrive, gdrive, gdrive, s3
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/7205a6ce3131e59a2db7211a94dd5faa
Additional Information (if any):
So the problem is that ${PPL_PTH}/config.yaml in params is not getting expanded to pipelines/02_seg/config.yaml in dvc status, correct? Is it a problem for other commands as far as you know? Where is ${PPL_PTH} defined?
I think so. dvc commit -f also fails to recognize that no changes are needed. ${PPL_PTH} is defined in params.yaml. As far as I remember, defining it in the vars section doesn't help either.
~@skshetry will know better how the internals work here, but I think DVC loads the parameters first to then fill variables from the dvc.yaml template. So I think it becomes circular to use those variables to read the path to the parameters file. Maybe we should note in the docs that variables cannot be in the params section.~
Sorry, the above looks to be incorrect on further inspection. Can you share the params.yaml? So far, I can't seem to reproduce the issue.