dvc exp run: replacing output folder instead of writing

Open ggrrll opened this issue 1 year ago • 3 comments

~Bug Report~ Help on parameters tuning

Description

(not sure whether this is actually a 🐛 or I am doing something wrong...)

When I run a stage with dvc exp run --downstream [mystage-name], the output folder is replaced, even though I have changed the parameter values, which are part of the folder name.

in my dvc.yaml file, mystage looks like

  mystage:
    cmd: python myscript.py 
    deps:
    - some_depends
    outs:
    - data/runs-optimization/
    params:
    - mystage.alpha

and myscript.py gets the default parameters from a params.yaml

I understand why, if I set a parameter value that I already used for a previous experiment, I get Stage 'mystage' is cached - skipping run, checking out outputs

but I still want to analyse the results for all parameter combinations. At the moment I just look over my output folder, which has one sub-folder for every parameter combination. In that case, would I have to run dvc exp run ... again to reload each result from the cache?

Reproduce

(I cannot share code or data, as it's proprietary )

Expected

new folder, with new parameters values in its name

I have checked that running the stage 'manually' (python myscript.py) does create the new folder, as expected

Environment information

  • dvc v. 3.53.2
  • Python 3.9.10
  • System Version: macOS 14.1.2

dvc doctor

DVC version: 3.53.2 (pip)
-------------------------
Platform: Python 3.9.10 on macOS-14.1.2-x86_64-i386-64bit
Subprojects:
        dvc_data = 3.15.2
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.7
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)

ggrrll avatar Aug 16 '24 10:08 ggrrll

So you are looking for how you compare different experiment results, correct? With dvc, it is not expected that they live side-by-side in subdirectories. Instead, each is a git commit. Just like you don't make code changes in git by creating a new copy of a file, you don't create a new file or directory for each experiment with dvc. Instead, you can use commands like dvc exp show to compare experiments. Take a look at https://dvc.org/doc/user-guide/experiment-management/comparing-experiments for more details.
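One way to read that advice, as a minimal sketch: the stage script always writes to the same output directory and records its result in a metrics file, so the parameter values live in the experiment rather than in a folder name. `mystage.alpha` and `data/runs-optimization/` come from the issue; the `metrics.json` file, the `score` metric, `result.json`, and the toy `evaluate` function are hypothetical stand-ins. `metrics.json` would also have to be declared under `metrics:` in dvc.yaml for dvc exp show to pick it up.

```python
import json
from pathlib import Path

OUT_DIR = Path("data/runs-optimization")  # fixed path, as declared under outs:
METRICS_FILE = Path("metrics.json")       # hypothetical; declare under metrics:


def evaluate(alpha: float) -> float:
    """Toy stand-in for the real (proprietary) computation."""
    return (alpha - 0.3) ** 2


def run_stage(params: dict, out_dir: Path = OUT_DIR,
              metrics_file: Path = METRICS_FILE) -> dict:
    """Write results to a fixed location; no parameter values in the path."""
    alpha = float(params["mystage"]["alpha"])
    score = evaluate(alpha)

    out_dir.mkdir(parents=True, exist_ok=True)
    result = {"alpha": alpha, "score": score}
    (out_dir / "result.json").write_text(json.dumps(result))

    metrics = {"score": score}
    metrics_file.write_text(json.dumps(metrics))
    return metrics


def main() -> None:
    # Assumes dvc.api.params_show() returns the tracked params as a nested
    # dict; imported here so the functions above work without DVC installed.
    import dvc.api

    run_stage(dvc.api.params_show())
```

Each value could then be run as its own experiment, for example with dvc exp run --set-param mystage.alpha=0.5, and compared afterwards with dvc exp show.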

dberenbaum avatar Aug 16 '24 13:08 dberenbaum

@dberenbaum thanks for your answer

yes (and I am trying to follow https://dvc.ai/blog/hyperparam-tuning)

So, with this approach, how can I plot summary statistics (of my metrics, over the parameter space)? I can see that I can print the metric value by adding an evaluation step, as done in https://github.com/iterative/example-get-started/blob/main/dvc.yaml, but then what can you do to visualize them all in summary plots (given that the experiments are cached)? As far as I can see, one has to add another stage, maybe accessing the table with the DVC Python API, as shown in https://dvc.org/doc/user-guide/experiment-management/comparing-experiments#other-ways-to-access-the-experiments-table (?)

Then one could wonder whether it's worth doing that, when a dedicated stage for parameter tuning could be added instead, as shown in https://campus.datacamp.com/courses/cicd-for-machine-learning/comparing-training-runs-and-hyperparameter-hp-tuning?ex=5

What are your thoughts / suggestions?

thanks

ggrrll avatar Aug 16 '24 14:08 ggrrll

As a workaround / hack you can try to set persist: true for this output. Please read more docs here. It might help to save all the results in a single directory. I don't think it's possible to use it though if you run multiple experiments using a queue in parallel.
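If the directory is persisted and the script keeps writing one sub-folder per parameter combination, the results could be gathered with something like the sketch below. It assumes, hypothetically, that every sub-folder contains a `result.json` holding the parameter values and a metric; the issue does not say what the real folders contain, so the file name and its fields are placeholders.

```python
import json
from pathlib import Path


def collect_runs(root: Path, filename: str = "result.json") -> list[dict]:
    """Read one JSON file per run sub-folder and return them as table rows."""
    rows = []
    for path in sorted(root.glob(f"*/{filename}")):
        row = json.loads(path.read_text())
        row["run"] = path.parent.name  # keep the folder name for reference
        rows.append(row)
    return rows
```

Calling `collect_runs(Path("data/runs-optimization"))` would return one dict per sub-folder, which can be passed to whatever plotting library is already in use.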

In the DVC VS Code extension you could plot multiple experiments with "custom plots". Check the custom plots:

https://github.com/user-attachments/assets/8c52be54-938e-4943-bb12-d332c8dd7974

> but then what can you do in order to visualize them all, in summary plots? (given that the experiments are cached).

If you need to do custom visualization, I would also check the Get experiments table in Python API in the link that @dberenbaum shared.
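As a rough sketch of that route: `dvc.api.exp_show()` returns the experiments table as a list of dicts, one per experiment, which can be grouped by a parameter and summarized before plotting. The column names used here (`mystage.alpha` for the parameter, `score` for the metric) are assumptions; the real keys depend on how params and metrics are named in the project.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows, param="mystage.alpha", metric="score"):
    """Group experiment rows by a parameter value and average a metric."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(param) is None or row.get(metric) is None:
            continue  # skip rows that lack the parameter or the metric
        groups[row[param]].append(row[metric])
    return {value: mean(scores) for value, scores in sorted(groups.items())}


def load_experiments():
    # Imported lazily so summarize() can be used without DVC installed.
    import dvc.api

    return dvc.api.exp_show()
```

`summarize(load_experiments())` would then give a mapping from parameter value to mean metric, ready to hand to a plotting library of choice.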

> Then, one could wonder if it's worth doing it, while instead a dedicated stage for parameter tuning could be added, like shown in

could you clarify please? how does it replace the visualization / comparison part?

shcheklein avatar Aug 19 '24 01:08 shcheklein

hi,

I haven't worked on parameter tuning with DVC since then, so feel free to close this for the time being.

ggrrll avatar Nov 04 '25 13:11 ggrrll