dvc exp run: replacing output folder instead of writing
~Bug Report~ Help on parameters tuning
Description
(not sure whether this is actually a 🐛 or I am doing something wrong...)
When I run a stage with dvc exp run --downstream [mystage-name], the output folder is replaced, even though I have changed the parameter values, which are part of the folder name.
in my dvc.yaml file, mystage looks like
mystage:
  cmd: python myscript.py
  deps:
    - some_depends
  outs:
    - data/runs-optimization/
  params:
    - mystage.alpha
and myscript.py gets the default parameters from a params.yaml
I understand why, if I set a parameter value that I already used in a previous experiment, I get
Stage 'mystage' is cached - skipping run, checking out outputs
but I still want to analyse the results for all parameter combinations...
At the moment, I just look over my output folders, with one sub-folder for every parameter combination.
In that case I would have to run dvc exp run ... again to reload each result from the cache ... (?)
Reproduce
(I cannot share code or data, as it's proprietary)
Expected
new folder, with new parameters values in its name
I have checked that running the stage 'manually' (python myscript.py) does create the new folder, as expected
Environment information
- dvc v. 3.53.2
- Python 3.9.10
- System Version: macOS 14.1.2
dvc doctor
DVC version: 3.53.2 (pip)
-------------------------
Platform: Python 3.9.10 on macOS-14.1.2-x86_64-i386-64bit
Subprojects:
dvc_data = 3.15.2
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.7
Supports:
http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
So you are looking for how to compare different experiment results, correct? With dvc, it is not expected that they live side-by-side in subdirectories. Instead, each is a git commit. Just like you don't make code changes in git by creating a new copy of a file, you don't create a new file or directory for each experiment with dvc. Instead, you can use commands like dvc exp show to compare experiments. Take a look at https://dvc.org/doc/user-guide/experiment-management/comparing-experiments for more details.
@dberenbaum thanks for your answer
yes ( and I am trying to follow https://dvc.ai/blog/hyperparam-tuning )
So, with this approach, how can I plot summary statistics of my metrics over the parameter space? I can see that I can print the metric value by adding an evaluation step, as done in https://github.com/iterative/example-get-started/blob/main/dvc.yaml, but then what can you do in order to visualize them all, in summary plots? (given that the experiments are cached). As far as I can see, one has to add another stage, maybe accessing the table with the DVC Python API, as shown in https://dvc.org/doc/user-guide/experiment-management/comparing-experiments#other-ways-to-access-the-experiments-table (?)
Then, one could wonder if it's worth doing it, while instead a dedicated stage for parameter tuning could be added, like shown in https://campus.datacamp.com/courses/cicd-for-machine-learning/comparing-training-runs-and-hyperparameter-hp-tuning?ex=5
What are your thoughts / suggestions?
thanks
As a workaround / hack you can try to set persist: true for this output. Please read more docs here. It might help to save all the results in a single directory. I don't think it's possible to use it though if you run multiple experiments using a queue in parallel.
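A minimal sketch of how the script side of this workaround could look, assuming the output is declared with persist: true so that DVC does not remove data/runs-optimization/ before the stage runs. The helper name run_dir and the folder naming scheme are hypothetical and are not taken from the original myscript.py.

```python
from pathlib import Path


def run_dir(base: Path, params: dict) -> Path:
    """Return (and create) a subfolder of `base` named after the parameter values.

    Hypothetical helper: the naming scheme is an assumption, not the
    original script's behaviour.
    """
    # Sort the keys so the same combination always maps to the same folder name.
    name = "_".join(f"{key}={params[key]}" for key in sorted(params))
    path = base / name
    # exist_ok=True leaves sibling folders from earlier runs untouched.
    path.mkdir(parents=True, exist_ok=True)
    return path


# Example call inside the stage script (alpha would come from params.yaml):
# out = run_dir(Path("data/runs-optimization"), {"alpha": 0.1})
```

With this layout each run only adds its own subfolder. The whole directory is still tracked as a single output, so the caveat above about parallel queued experiments still applies.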
In the DVC VS Code extension you could plot multiple experiments with "custom plots". Check the custom plots:
https://github.com/user-attachments/assets/8c52be54-938e-4943-bb12-d332c8dd7974
> but then what can you do in order to visualize them all, in summary plots? (given that the experiments are cached).
If you need to do custom visualization, I would also check the Get experiments table in Python API in the link that @dberenbaum shared.
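As a rough illustration of that route, the sketch below averages one metric per value of one parameter from the experiments table. It assumes dvc.api.exp_show() returns one dict per experiment, and that the column names passed in ("mystage.alpha", and a hypothetical metric called "score") match the keys in your own table. Check the actual keys by printing one row first.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows, param, metric):
    """Group experiment rows by one parameter and average one metric.

    `rows` is a list of dicts, one per experiment. Rows where either
    column is missing or None are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        value, score = row.get(param), row.get(metric)
        if value is None or score is None:
            continue
        groups[value].append(score)
    # Sort by parameter value so the result is ready for plotting.
    return {value: mean(scores) for value, scores in sorted(groups.items())}


def load_rows():
    """Fetch the experiments table from the current DVC repository."""
    import dvc.api  # imported lazily so summarize() works without DVC installed

    return dvc.api.exp_show()


# Example (column names are assumptions, adjust to your table):
# summary = summarize(load_rows(), "mystage.alpha", "score")
```

The resulting dict maps each parameter value to its mean metric, which can be handed to any plotting library, either interactively or from a separate summary stage.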
> Then, one could wonder if it's worth doing it, while instead a dedicated stage for parameter tuning could be added, like shown in
could you clarify please? how does it replace the visualization / comparison part?
hi,
I haven't worked on parameter tuning with DVC since then, so feel free to close this for the time being.