wandb
wandb copied to clipboard
[CLI]: sweep agent fails when running sweeps with 'None' values for sweep parameters
Describe the bug
It looks like recent versions of wandb, definitely wandb==0.12.19, fail when one of the sweep parameters is None.
Note that None values are very common, e.g. for hyperparameters of many models in scikit-learn. For example, this sweep uses None for the value of the max_features hyperparameter in [sklearn.ensemble.GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html).
The affected sweeps all work fine with wandb==0.12.15. The workaround for me was to downgrade to that, for now.
Sorry, can't provide a sweep to reproduce. Create any grid search sweep where one of the hyperparameter values takes a value of 'None'.
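For concreteness, a minimal grid sweep config along these lines should trigger the error (the parameter name and second value are just illustrative; any parameter whose values include null will do):

```yaml
program: scripts/train.py
method: grid
parameters:
  max_features:
    values: [null, "sqrt"]
```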
2022-07-29 19:24:15,782 - wandb.wandb_agent - ERROR - Exception while processing command: {'run_id': '<redacted>', 'program': 'scripts/train.py', 'type': 'run', 'args': <...> 'max_features': {'value': None},
......
Traceback (most recent call last):
File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 299, in _process_command
result = self._command_run(command)
File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 409, in _command_run
sweep_vars: Dict[str, Any] = Agent._create_command_args(command)
File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 342, in _create_command_args
raise ValueError('No "value" found for command["args"]["%s"]' % param)
ValueError: No "value" found for command["args"]["max_features"]
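Judging from the traceback, the agent seems to fetch the value with a plain dict lookup that defaults to None, so a parameter whose value is genuinely None is indistinguishable from a parameter that is missing its "value" key. A minimal sketch of that failure mode (agent internals paraphrased from the traceback, not the actual wandb source):

```python
# Sketch: why a legitimate None value trips the agent's check.
# The args dict mirrors the command shown in the error log above.
command_args = {"max_features": {"value": None}}

for param, config in command_args.items():
    value = config.get("value", None)
    # The "value" key is present, but get() returns None either way,
    # so a check like `if value is None: raise ValueError(...)` misfires.
    assert "value" in config
    assert value is None
```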
Additional Files
No response
Environment
WandB version: 0.12.19
OS: linux
Python version: 3.6.8
Versions of relevant libraries:
Additional Context
No response
Ramit Goolry commented: Hi @jpgard! Could you share the sweep config you used that generated this result? I'll look into this for you.
Sure, do you mean the sweep yaml file?
Yes
Ramit Goolry commented: Hi @jpgard,
We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.
Hi @jpgard, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
I have the same issue on Wandb 13.1
This is still an issue.
Sorry, I gave instructions on how to reproduce it in the original comment; I also said I can't provide the sweep file to reproduce -- it's really simple to construct a case where this is raised.
Hi @jpgard,
I tried reproducing this on my end with a None value, but nothing errored out. It would be really appreciated if you could share a minimal reproduction or more detailed steps on how you reached this error, since we might be missing something here.
Thanks, Ramit
Ramit Goolry commented: Hi @jpgard,
We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.
Best, Weights & Biases
Sorry, I don't have time to create a fully reproducible example, as this would apparently require actually setting up the experiments as well.
fwiw, the line from the sweep yaml file raising the issue looks like the following:
parameters:
  max_features:
    values: [ null, ]
A sweep containing anything like this should raise the error.
Thanks! That's exactly what I needed. I'm going to ticket this out internally right now to be resolved by our engineering team and I will keep you updated on the status of this issue.
Thanks, Ramit
Bump. Please solve this asap as this makes the sweeps quite impractical
Bump. One of my rnn_type parameters is None. I have manually trained 70 models with it, and if I now want to start a sweep I have to set the parameter to null in the config, which is already counterintuitive for the sweep to be able to recognize the models that were trained with None.
sweep config:
method: bayes
metric:
  goal: maximize
  name: val_auc
parameters:
  rnn_type:
    distribution: categorical
    values:
      - null
      - LSTM
      - GRU
Code example:
if self.params["rnn_type"] != None:
    if self.params["rnn_type"] == "LSTM":
        rnn = LSTM
    elif self.params["rnn_type"] == "GRU":
        rnn = GRU
    else:
        raise ValueError("rnn_type '{}' not supported.".format(self.params["rnn_type"]))
    for x in range(self.params["rnn_num"]-1):
        self._add_rnn_layer(rnn, True, x)
    self._add_rnn_layer(rnn, False, self.params["rnn_num"]-1)
else:
    self.cnn = Flatten()(self.cnn)
I turned off the None check in wandb_agent.py to see what is going on (because this None check gets triggered when running the sweep):
for param, config in command["args"].items():
    _value: Any = config.get("value", None)
    # if _value is None:
    #     raise ValueError('No "value" found for command["args"]["%s"]' % param)
    _flag: str = f"{param}={_value}"
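Rather than commenting the check out entirely, a sentinel default would let the agent distinguish a genuinely missing "value" key from an explicit None value. This is only a sketch of that idea, not the actual patch wandb shipped:

```python
from typing import Any

_MISSING = object()  # sentinel: distinct from every real value, including None

def create_flag(param: str, config: dict) -> str:
    """Build a command-line flag, allowing an explicit None value."""
    _value: Any = config.get("value", _MISSING)
    if _value is _MISSING:
        # Only raise when the "value" key is truly absent.
        raise ValueError('No "value" found for command["args"]["%s"]' % param)
    return f"{param}={_value}"

print(create_flag("max_features", {"value": None}))  # max_features=None
```

Note this still leaves the stringification problem discussed below the traceback: the flag text is "max_features=None", a string, so the receiving script must convert it back itself.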
now running a sweep produces this error:
Traceback (most recent call last):
File "/home/profts/P09/scripts/MYOD/train.py", line 77, in <module>
model, summary = searcher.train(data, verbose=False)
File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Grid_Search.py", line 82, in train
model = Model(candidate, data)
File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Model.py", line 139, in __init__
self._prepare_model()
File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Model.py", line 630, in _prepare_model
raise ValueError("rnn_type '{}' not supported.".format(self.params["rnn_type"]))
ValueError: rnn_type 'None' not supported.
which shows that the None sent by wandb is not the same as Python's None, since it passes the self.params["rnn_type"] != None check.
This makes it almost impossible to use None as a parameter value in Python models.
What is really confusing is that somewhere between wandb checking the parameters and handing them to the training script, the None gets changed, and I have yet to understand why. Any help would be appreciated.
I was able to identify the issue. The problem stems from the fact that the sweep agent is sending the parameter None over the command line. This turns None into a string "None".
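That stringification is easy to demonstrate: once the value is interpolated into a command-line flag, the receiving process only ever sees text. A small sketch, including a hypothetical user-side workaround (the normalize helper is not part of wandb) that maps the string back to None:

```python
# Passing None through a command-line flag turns it into text.
value = None
flag = f"--rnn_type={value}"          # becomes "--rnn_type=None"
received = flag.split("=", 1)[1]      # what the training script sees

assert received == "None"             # a str, not the None singleton
assert received is not None           # so `param != None` checks pass

# Hypothetical workaround: normalize stringified None on the receiving end.
def normalize(v):
    return None if v == "None" else v

assert normalize(received) is None
assert normalize("LSTM") == "LSTM"
```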
Another bump. Having None parameter settings is quite common in projects I work on, where None allows the parameter to be set automatically inside the code. Not being able to sweep across this case alongside explicit settings is definitely a significant limitation.
Another bump. This is a very common use case and would be great to have support for it.
Another bump
Same issue here! Is there anyone working on this? Or is there any plan to fix this?
Hi all, apologies for the delay here! I'm checking internally with our Engineering Team what's the status of this one and also bumping up the urgency. Will keep you posted
Hey all, thanks for your patience! This was fixed here and the fix will be available in our next SDK release