
hydra composition: Workflow discrepancies

Open • daavoo opened this issue 3 years ago • 11 comments

When using Hydra Composition to configure DVC experiments, there are a few discrepancies with respect to the "regular" params workflow. These could confuse existing users of dvc params when migrating to Hydra Composition:

  • The latest state is not preserved

Without Hydra Composition, if modifications are done via --set-param and then the experiment is persisted, the next exp run with no arguments will reuse the latest modifications applied.

With Hydra Composition, the next exp run with no arguments will still run the composition and dump to params.yaml, overriding the latest --set-param modifications.

Users would need to manually edit the files in hydra.config_dir and/or the defaults list in the hydra.config_name file to reflect the latest --set-param modifications.
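A minimal sketch of the sequence described above, assuming Hydra composition is enabled and a hypothetical parameter train.lr exists in the composed config (the parameter name and the function name are illustrative, not from this issue). It is wrapped in a function only so it can be sourced and inspected; the comments restate the behavior reported here.

```shell
# Hypothetical helper; train.lr is an illustrative parameter name.
show_hydra_state_discrepancy() {
  # 1. Override a value for this run. With Hydra enabled, DVC composes the
  #    config from hydra.config_dir, applies the override, and dumps the
  #    result to params.yaml before running the pipeline.
  dvc exp run --set-param 'train.lr=0.01' || return 1

  # 2. Run again with no arguments. The config is composed from
  #    hydra.config_dir again, so params.yaml is overwritten and the
  #    train.lr=0.01 override from step 1 is not reused.
  dvc exp run
}
```

Without Hydra, step 2 would instead pick up the train.lr value that step 1 left in params.yaml.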

  • Source of configuration needs to be tracked separately

Without Hydra Composition, tracking the params is enough. A change in the params file results in a new experiment.

With Hydra Composition, tracking only params.yaml could result in unexpected behavior, as manual modifications to files in hydra.config_dir would not be detected by DVC.

Users would need to also track hydra.config_dir.

daavoo, Sep 23 '22 17:09

Not sure how critical or relevant these two points are. Perhaps we only need to acknowledge them in the docs. WDYT, @dberenbaum?

daavoo, Sep 23 '22 17:09

  • The latest state is not preserved

Agree that this could be confusing and should be pointed out. However, IMHO this ephemeral state is more expected and useful for experimentation. It pretty much reflects how queued experiments work and how Hydra users work already. The way DVC handles workspace experiments is arguably an odd exception. I find it mostly annoying that I can't return to some default state after each experiment, so I personally wouldn't push to preserve state for Hydra experiments.

  • Source of configuration needs to be tracked separately

Can you think of an example where modifications would not be tracked by DVC? I thought they would all end up being tracked if they impact the experiment (either in params.yaml or in dvc.yaml vars).

dberenbaum, Sep 23 '22 19:09

Encountered a user confused by this behavior in https://discord.com/channels/485586884165107732/563406153334128681/1128718359878451360

dberenbaum, Jul 13 '23 13:07

I'm porting the last conversation from Discord here so that we can continue the discussion in this forum, as suggested by @dberenbaum.

I have hydra.enabled = true, and I'm familiar with how the params.yaml file is populated from the hydra-defined values when dvc exp run is called. However, after an experiment that was run with some parameters overwritten using the --set-param flag, if I run dvc exp apply <exp-name>, the experiment's settings are not applied back to the hydra-defined config files. Therefore, if I git/dvc commit the workspace, later return to the same branch, and run dvc exp run, the experiment that was applied to that branch is no longer reproduced; instead, an old experiment with the old hydra values is. Would I need to manually update the hydra-defined values after dvc exp apply, before committing to git/dvc, if the experiment was run with --set-param? That seems odd.

gromag, Jul 15 '23 08:07

However, after an experiment that was run with some parameters overwritten using the --set-param flag, if I run dvc exp apply <exp-name>, the experiment's settings are not applied back to the hydra-defined config files.

Just to clarify, the experiment's settings are applied to your params.yaml file, but they are not written back to the files in the conf directory that Hydra uses for "compose and dump." Therefore, it is possible to reproduce any applied experiment like this:

$ dvc exp apply <exp-name>
$ dvc config hydra.enabled false
$ dvc exp run

It seems odd that you need to disable hydra to reproduce the experiment, but it is consistent with how hydra command-line overrides work. We could at least document this behavior better so that it's clear how it differs from the typical dvc workflow.
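To avoid leaving hydra.enabled switched off in the Git-tracked .dvc/config, the same workaround could be wrapped in a helper that sets the option only in .dvc/config.local (which takes precedence over .dvc/config and is not committed) and removes it afterwards. This is a hypothetical sketch, not a DVC feature: the function name exp_run_without_hydra is made up, and it assumes hydra.enabled is not already set in .dvc/config.local, since the final step unsets it there.

```shell
# Hypothetical helper (not part of DVC): run `dvc exp run` once with Hydra
# composition disabled, without editing the Git-tracked .dvc/config.
exp_run_without_hydra() {
  # Local-level config overrides the project-level .dvc/config.
  dvc config --local hydra.enabled false || return 1

  # params.yaml is used as-is (e.g. as restored by `dvc exp apply`).
  dvc exp run "$@"
  status=$?

  # Drop the local override so the project-level setting applies again.
  dvc config --local --unset hydra.enabled
  return "$status"
}
```

Usage: after dvc exp apply <exp-name>, call exp_run_without_hydra (any extra arguments are passed through to dvc exp run). The helper returns the exit status of dvc exp run.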

dberenbaum, Jul 17 '23 13:07

See https://discord.com/channels/485586884165107732/485596304961962003/1183845968089726976. I suggested there that we could add an option to dvc exp run to temporarily disable hydra composition.

dberenbaum, Dec 12 '23 20:12

@dberenbaum are there any updates on this useful feature?

Danila89, Dec 19 '23 22:12

I did take a quick look, but it looks more involved to implement than I initially expected. Does the workaround mentioned above not work for you, or do you just want a simpler way to do it?

dberenbaum, Dec 20 '23 13:12

I've already implemented a workaround; I just hoped I'd be able to replace it with a native option :) It's not critical for me.

Danila89, Dec 20 '23 21:12

You can also use dvc repro instead of dvc exp run in this case, which will reproduce the experiment without doing any hydra composition. This is a key difference between repro (intended for reproduction) and exp run (intended to run some modified experiment). I'm not sure it's worth having another way to do this, but we should document these nuances of hydra composition and exp run so expectations are clear.
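A minimal sketch of the repro-based alternative, assuming Hydra composition is enabled in the project. The function name is hypothetical; the two commands are the ones discussed in this thread, and the comment on dvc repro restates the behavior described above.

```shell
# Hypothetical helper: reproduce an applied experiment without composition.
reproduce_applied_experiment() {
  # Bring the experiment's changes (including params.yaml) into the workspace.
  dvc exp apply "$1" || return 1
  # `dvc repro` does not run Hydra composition, so params.yaml is used as-is,
  # unlike `dvc exp run`, which would compose and overwrite it.
  dvc repro
}
```

Usage: reproduce_applied_experiment <exp-name>. Unlike the config-based workaround, this leaves hydra.enabled untouched.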

dberenbaum, Dec 21 '23 13:12