Simplify execution of various runs with different params
As an alternative to repeated (manual) executions of:
dvc exp run -S a=1
dvc exp run -S a=2
it might be useful (and clean, I think) to allow some way to "pass several param files" (from a folder maybe), which would then auto-queue the corresponding runs.
This is what Facebook's Hydra does, and it really is intuitive and clean.
We are currently exploring different ways to integrate with Hydra. This feature is part of the scope.
https://github.com/iterative/dvc/pull/8187 is adding support for using Hydra syntax in --set-param, so:
dvc exp run -S 'a=1,2' --queue
will put 2 experiments in the queue, which can later be executed with dvc queue start.
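For illustration, here is a minimal Python sketch of how a comma-separated sweep such as a=1,2 expands into one set of overrides per experiment. It is not DVC's or Hydra's actual implementation: expand_sweep is a hypothetical helper, and it only handles the plain comma-list form, not Hydra's range or glob syntax.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand overrides like ["a=1,2", "b=x"] into one override list per combination."""
    keys, choices = [], []
    for item in overrides:
        key, _, values = item.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    # Cartesian product: each value of a key is paired with every value of the others.
    return [[f"{k}={v}" for k, v in zip(keys, combo)] for combo in product(*choices)]


if __name__ == "__main__":
    for combo in expand_sweep(["db=mysql,postgresql", "schema=warehouse,school"]):
        print(combo)
```

With two keys of two values each, this yields four override lists, one per queued experiment.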
My main use case for the feature request would be to auto-generate such parameter files.
#8187 does not allow this. The parameters are interwoven with the rest of the exp run command.
Auto-generated parameter files would allow any algorithm to be used for calculating the concrete parameters, without the need to include all such algorithms in dvc itself.
@behrica Would you be interested in a Python API to do this?
My feeling is that if you need to auto-generate all parameter combinations, you may as well call dvc exp run --queue from your code for each parameter combination (have you tried this as a workaround?). Saving them all to different files in a folder seems sort of against DVC expectations, since it is assumed each experiment contains only its own parameters. It also doesn't seem to work for adaptive algorithms, where it's not known from the start every parameter combination that will be tried.
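A rough sketch of that workaround in Python, assuming the parameter names (a and b here are placeholders) already exist in params.yaml. queue_commands and queue_all are hypothetical helper names, and the runner argument exists only so the loop can be dry-run without DVC installed.

```python
import subprocess
from itertools import product


def queue_commands(grid):
    """Build one `dvc exp run --queue` command per parameter combination."""
    keys = list(grid)
    commands = []
    for combo in product(*(grid[k] for k in keys)):
        cmd = ["dvc", "exp", "run", "--queue"]
        for key, value in zip(keys, combo):
            cmd += ["-S", f"{key}={value}"]
        commands.append(cmd)
    return commands


def queue_all(grid, runner=subprocess.run):
    """Queue every combination; `runner` is injectable so the loop can be dry-run."""
    for cmd in queue_commands(grid):
        runner(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in queue_commands({"a": [1, 2], "b": [0.1, 0.5]}):
        print(" ".join(cmd))
```

Calling queue_all(grid) inside a DVC project would run each command for real; the queued experiments can then be started with dvc queue start.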
A Python API could add more parameter combinations as you go. It also adds possibilities to do more complex operations, like randomly selecting a parameter from an interval. Maybe we could support that in a way that is broadly useful across any search algorithm?
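To make the "randomly select from an interval" idea concrete, here is a small sketch. It is not a DVC API, only plain Python that produces override strings; sample_overrides, the parameter names, and the ranges are all made up for illustration. Because it is a generator, a caller can draw further combinations as earlier results come in.

```python
import random


def sample_overrides(space, n, seed=None):
    """Yield n override lists, drawing each parameter uniformly from its (low, high) interval."""
    rng = random.Random(seed)
    for _ in range(n):
        yield [
            f"{name}={rng.uniform(low, high):.6g}"
            for name, (low, high) in space.items()
        ]


if __name__ == "__main__":
    # Hypothetical parameter names and ranges; the commands are printed, not run.
    space = {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)}
    for overrides in sample_overrides(space, n=3, seed=0):
        print("dvc exp run --queue " + " ".join(f"-S {o}" for o in overrides))
```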
I am very happy that DVC is language independent. I use it from Clojure. So I would favor a command line which takes a file with all the parameter combinations I want.
Then I could generate such a file from Clojure.
But this makes it rather static, indeed.
But the workaround you mentioned is feasible as well.
I think the general question to decide is:
Should dvc itself start to provide various algorithms to "statically calculate" concrete parameters from "a user supplied parameter space"? yes/no
It seems to me that #8187 is a first step in this direction: the user gives the space, and dvc calculates all combinations. Taking a random subset of these would be another algorithm, and using a Sobol sequence (https://en.wikipedia.org/wiki/Sobol_sequence) would be another optimization. Both either take only a subset of all combinations or work with continuous intervals and split them smartly.
Allowing a "parameter file" would externalize this and keep it out of dvc. But then #8187 should maybe not be merged.
This does not yet address the question of doing this non-statically, for example by using past training results.
I see the "user interface" as very similar to #8187:
$ dvc exp run -Sfile "param_combinations.csv" --queue # file being in some kind of table format, maybe csv
Queueing with '{'params.yaml': ['db=mysql', 'schema=warehouse']}'.
Queued experiment '5ab98b8' for future execution.
Queueing with '{'params.yaml': ['db=mysql', 'schema=school']}'.
Queued experiment '57c2fb6' for future execution.
Queueing with '{'params.yaml': ['db=postgresql', 'schema=warehouse']}'.
Queued experiment 'b9d6391' for future execution.
Queueing with '{'params.yaml': ['db=postgresql', 'schema=school']}'.
Queued experiment '145cd55' for future execution.
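Since -Sfile is only a proposal here and not an existing DVC flag, the same effect could be approximated outside DVC. Below is a sketch that assumes a CSV layout with parameter names in the header and one experiment per row; overrides_from_csv is a hypothetical helper, and the commands are printed rather than executed.

```python
import csv
import io


def overrides_from_csv(text):
    """Turn CSV text (header = parameter names, one row per experiment) into override lists."""
    reader = csv.DictReader(io.StringIO(text))
    return [[f"{name}={value}" for name, value in row.items()] for row in reader]


if __name__ == "__main__":
    table = (
        "db,schema\n"
        "mysql,warehouse\n"
        "mysql,school\n"
        "postgresql,warehouse\n"
        "postgresql,school\n"
    )
    for overrides in overrides_from_csv(table):
        flags = " ".join(f"-S {o}" for o in overrides)
        print(f"dvc exp run --queue {flags}")
```

Each printed line is one dvc exp run --queue call using the existing -S flag, so the file can come from any tool or language.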
it might be useful (and clean, I think) to allow some way to "pass several param files" (from a folder maybe)
By the way, this is already possible to do with Hydra. You would save them as YAML files in your conf directory and then select each conf file like dvc exp run -S conf_file=file1,file2. There's a simple example in https://github.com/dberenbaum/hydra-dvc-multirun.
This syntax does not work for me:
[hydra-dvc-multirun]$ dvc exp run --queue -S conf_file=one.yaml,two.yaml
ERROR: unexpected error - Could not override 'conf_file'.
To append to your config use +conf_file=one.yaml: Key 'conf_file' is not in struct
full_key: conf_file
object_type=dict
Sorry, there is some hydra-specific syntax. You have to use group (since that's the dir inside conf where the files are stored), and you can optionally drop .yaml. See the readme of that repo:
$ dvc exp run --queue -S group=one,two
Queueing with overrides '{'params.yaml': ['group=one']}'.
Queued experiment '634a8fa' for future execution.
Queueing with overrides '{'params.yaml': ['group=two']}'.
Queued experiment '0c283dc' for future execution.
I tried it out, and that might work. My use case would be massive grid searches, so I would maybe generate a few thousand files. I could give all of them a random name and list them all in a very long list ... (probably reaching the maximum length of a command line)
I have now done it in a completely different way, which works as well, not using Hydra at all.
Basically I loop over all my parameter combinations in code and do:
- write param.yaml to disk
- shell out and run
dvc exp run --queue
This is maybe even good enough for closing this issue.
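A rough Python sketch of that loop, under a few assumptions: the parameters file is DVC's default params.yaml, the parameters are flat scalars, and the file is overwritten completely on each iteration (any other keys in it would be lost). write_params and queue_from_files are hypothetical names, and runner is injectable so the example can run without DVC.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def write_params(params, path="params.yaml"):
    """Write a flat dict as YAML; JSON scalars are valid YAML scalars for simple values."""
    lines = [f"{key}: {json.dumps(value)}" for key, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")


def queue_from_files(combinations, path="params.yaml", runner=subprocess.run):
    """For each combination: overwrite the params file, then queue an experiment."""
    for params in combinations:
        write_params(params, path)
        runner(["dvc", "exp", "run", "--queue"], check=True)


if __name__ == "__main__":
    # Dry run in a temporary directory: print the command instead of calling dvc.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "params.yaml"
        queue_from_files(
            [{"a": 1}, {"a": 2}],
            path=target,
            runner=lambda cmd, check: print(" ".join(cmd), "<-", target.read_text().strip()),
        )
```

This avoids long command lines entirely, since each combination travels through the file rather than through -S arguments.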
Makes sense @behrica! Yeah, there are too many different ways to do this to have them all be "built in," but glad you found a pattern that works for you.