[CLI]: sweep agent fails when running sweeps with 'None' values for sweep parameters

Open • jpgard opened this issue 2 years ago • 14 comments

Describe the bug

It looks like recent versions of wandb, definitely wandb==0.12.19, fail when one of the sweep parameters is None.

Note that None values are very common, e.g. for hyperparameters of many models in scikit-learn. For example, this sweep uses None for the value of the max_features hyperparameter in sklearn.ensemble.GradientBoostingClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html).

The affected sweeps all work fine with wandb==0.12.15. The workaround for me was to downgrade to that, for now.

Sorry, I can't provide a sweep to reproduce. To reproduce, create any grid search sweep where one of the hyperparameters takes the value 'None'.


2022-07-29 19:24:15,782 - wandb.wandb_agent - ERROR - Exception while processing command: {'run_id': '<redacted>', 'program': 'scripts/train.py', 'type': 'run', 'args': <...> 'max_features': {'value': None},  

......

Traceback (most recent call last):
  File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 299, in _process_command
    result = self._command_run(command)
  File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 409, in _command_run
    sweep_vars: Dict[str, Any] = Agent._create_command_args(command)
  File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 342, in _create_command_args
    raise ValueError('No "value" found for command["args"]["%s"]' % param)
ValueError: No "value" found for command["args"]["max_features"]
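
The traceback points at a guard in Agent._create_command_args. Going by the agent code quoted later in this thread (_value = config.get("value", None) followed by an "if _value is None: raise" check), that guard cannot tell an argument with no "value" key apart from one whose value is explicitly None. The sketch below re-creates the guard in isolation and contrasts it with a key-membership test. It is an illustration written for this thread, not wandb's actual source; both function names are hypothetical.

```python
from typing import Any, Dict, List

Args = Dict[str, Dict[str, Any]]


def create_args_get_check(args: Args) -> List[str]:
    """Re-creates the guard from the traceback: an explicit None counts as missing."""
    flags = []
    for param, config in args.items():
        value = config.get("value", None)
        if value is None:  # True both for a missing key and for {"value": None}
            raise ValueError('No "value" found for command["args"]["%s"]' % param)
        flags.append(f"{param}={value}")
    return flags


def create_args_key_check(args: Args) -> List[str]:
    """Hypothetical variant: only a genuinely absent "value" key is an error."""
    flags = []
    for param, config in args.items():
        if "value" not in config:
            raise ValueError('No "value" found for command["args"]["%s"]' % param)
        flags.append(f"{param}={config['value']}")
    return flags


# Same shape as the args in the log line above.
command_args: Args = {"max_features": {"value": None}}

try:
    create_args_get_check(command_args)
except ValueError as err:
    print(err)  # No "value" found for command["args"]["max_features"]

print(create_args_key_check(command_args))  # ['max_features=None']
```

Even the key-membership variant still renders None as the text None once it is formatted into a flag, which is the second half of the problem diagnosed further down the thread.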

Additional Files

No response

Environment

WandB version: 0.12.19

OS: linux

Python version: 3.6.8

Versions of relevant libraries:

Additional Context

No response

jpgard, Jul 30 '22

Ramit Goolry commented: Hi @jpgard! Could you share the sweep config you used that generated this result? I'll look into this for you.

exalate-issue-sync[bot], Aug 01 '22

Sure, do you mean the sweep yaml file?

jpgard, Aug 01 '22

Yes

ramit-wandb, Aug 05 '22

Ramit Goolry commented: Hi @jpgard,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

exalate-issue-sync[bot], Aug 10 '22

Hi @jpgard, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

ramit-wandb, Aug 18 '22

I have the same issue on wandb 0.13.1.

LeoGrin, Aug 21 '22

This is still an issue.

Sorry, but I gave instructions on how to reproduce it in the original comment. As I said there, I can't provide the sweep file itself, but it's really simple to construct a case where this error is raised.

jpgard, Aug 23 '22

Hi @jpgard,

I tried reproducing this on my end with a None value, but nothing errored out. It would be really appreciated if you could share a minimal reproduction or more detailed steps on how you reached this error, since we might be missing something here.

Thanks, Ramit

ramit-wandb, Aug 31 '22

Ramit Goolry commented: Hi @jpgard,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best, Weights & Biases

exalate-issue-sync[bot], Sep 07 '22

Sorry, I don't have time to create a fully reproducible example, as this would apparently require actually setting up the experiments as well.

FWIW, the lines from the sweep YAML file that raise the issue look like the following:

parameters:
  max_features:
    values: [ null, ]

A sweep containing anything like this should raise the error.
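
To connect the YAML fragment above with the log line in the original report: YAML's null loads as Python's None, so a grid over values: [ null, ] yields one run whose argument is {'value': None}, the exact shape the agent rejects. The sketch imitates grid expansion with itertools.product. expand_grid is a hypothetical helper for illustration, not the wandb sweep controller, and the YAML is written out as the equivalent Python dict so no YAML library is needed.

```python
import itertools
from typing import Any, Dict, List

# The YAML fragment above written as a Python dict: YAML null loads as None.
sweep_config: Dict[str, Any] = {
    "method": "grid",
    "parameters": {
        "max_features": {"values": [None]},
    },
}


def expand_grid(config: Dict[str, Any]) -> List[Dict[str, Dict[str, Any]]]:
    """Hypothetical grid expansion into agent-style run arguments."""
    params = config["parameters"]
    names = list(params)
    runs = []
    for combo in itertools.product(*(params[name]["values"] for name in names)):
        # One {"param": {"value": ...}} dict per run, like the args in the log line.
        runs.append({name: {"value": value} for name, value in zip(names, combo)})
    return runs


runs = expand_grid(sweep_config)
print(runs)  # [{'max_features': {'value': None}}]

# A lookup of the form config.get("value", None) cannot tell this apart from a missing key.
print(runs[0]["max_features"].get("value", None) is None)  # True
```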

jpgard, Sep 07 '22

Thanks! That's exactly what I needed. I'm going to ticket this out internally right now to be resolved by our engineering team and I will keep you updated on the status of this issue.

Thanks, Ramit

ramit-wandb, Sep 13 '22

Bump. Please solve this ASAP, as it makes sweeps quite impractical.

FabianWesth, Dec 13 '22

Bump. One of my parameters, the RNN type, can be 'None'. I have trained 70 models with it manually, and if I now want to start a sweep, I have to set the parameter in the sweep config to 'null' so that the sweep can recognize the models that were trained with 'None', which is already counterintuitive.

sweep config:
method: bayes
metric:
  goal: maximize
  name: val_auc
parameters:
  rnn_type:
    distribution: categorical
    values:
      - null
      - LSTM
      - GRU

Code example:

if self.params["rnn_type"] != None:
    if self.params["rnn_type"] == "LSTM":
        rnn = LSTM
    elif self.params["rnn_type"] == "GRU":
        rnn = GRU
    else:
        raise ValueError("rnn_type '{}' not supported.".format(self.params["rnn_type"]))
    for x in range(self.params["rnn_num"] - 1):
        self._add_rnn_layer(rnn, True, x)
    self._add_rnn_layer(rnn, False, self.params["rnn_num"] - 1)
else:
    self.cnn = Flatten()(self.cnn)

To see what was going on, I commented out the None check in wandb_agent.py (this is the check that gets triggered when running the sweep):

for param, config in command["args"].items():
    _value: Any = config.get("value", None)
    # if _value is None:
    #     raise ValueError('No "value" found for command["args"]["%s"]' % param)
    _flag: str = f"{param}={_value}"
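
With the guard commented out, the next line builds the flag with an f-string, and formatting None in an f-string yields the text None. The snippet runs only that formatting line, copied from the quote above, for each value in the rnn_type sweep config; build_flags is a hypothetical wrapper, and any prefixes the real agent may add afterwards are left out.

```python
from typing import Any, Dict, List


def build_flags(args: Dict[str, Dict[str, Any]]) -> List[str]:
    """Hypothetical wrapper around the flag-formatting line quoted above."""
    flags = []
    for param, config in args.items():
        _value: Any = config.get("value", None)
        _flag: str = f"{param}={_value}"  # None is rendered as the text "None"
        flags.append(_flag)
    return flags


for rnn_type in (None, "LSTM", "GRU"):
    print(build_flags({"rnn_type": {"value": rnn_type}}))
# ['rnn_type=None']
# ['rnn_type=LSTM']
# ['rnn_type=GRU']
```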

Now running a sweep produces this error:

Traceback (most recent call last):
  File "/home/profts/P09/scripts/MYOD/train.py", line 77, in <module>
    model, summary = searcher.train(data, verbose=False)
  File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Grid_Search.py", line 82, in train
    model = Model(candidate, data)
  File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Model.py", line 139, in __init__
    self._prepare_model()
  File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Model.py", line 630, in _prepare_model
    raise ValueError("rnn_type '{}' not supported.".format(self.params["rnn_type"]))
ValueError: rnn_type 'None' not supported. 

This shows that the 'None' sent by wandb is not the same as Python's None, since it passes the self.params["rnn_type"] != None check.

This makes it almost impossible to use 'None' as a parameter in python models.

What is really confusing is that somewhere between wandb checking the parameters and handing them to the training script, the None gets changed, and I have yet to understand why. Any help would be appreciated.

sproft, Jan 06 '23

I was able to identify the issue. The problem stems from the fact that the sweep agent passes the parameter None over the command line, which turns None into the string "None".
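
The diagnosis above can be checked with argparse alone. Once the value is on the command line, the training script only ever receives text, so it parses to the string "None", which compares unequal to None and falls through to the else branch of the rnn_type check. The sketch assumes the script reads a --rnn_type option with argparse and that the agent passes it as --rnn_type=None; both are assumptions for illustration, since train.py's argument handling isn't shown.

```python
import argparse

# Hypothetical option name, borrowed from the rnn_type sweep config above.
parser = argparse.ArgumentParser()
parser.add_argument("--rnn_type")

# Assumed form of the argument once the agent has formatted None into a flag.
args = parser.parse_args(["--rnn_type=None"])

print(repr(args.rnn_type))               # 'None'
print(type(args.rnn_type).__name__)      # str
print(args.rnn_type != None)             # True: the text passes a "!= None" check
print(args.rnn_type in ("LSTM", "GRU"))  # False: so it reaches the else branch
```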

sproft, Jan 06 '23

Another bump. Having None parameter settings is quite common in the projects I work on, where None lets the code set the parameter automatically. Not being able to sweep across both this case and explicit settings is definitely a significant limitation.
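
Until a release with the fix is available, one possible workaround is to keep a real null out of the sweep config entirely: list a quoted sentinel such as "None" (e.g. values: ["None", LSTM, GRU]) so the agent sees a string rather than None and its check does not fire, then map that string back to None in the training script. This is a hedged sketch, not an official wandb recommendation. none_or_str, NONE_STRINGS and the --rnn_type option are hypothetical names, and a script that reads wandb.config instead of command-line flags would need the same mapping applied there.

```python
import argparse
from typing import Optional

# Hypothetical sentinel spellings to treat as "no value"; adjust to your config.
NONE_STRINGS = {"None", "none", "null"}


def none_or_str(text: str) -> Optional[str]:
    """argparse type converter that maps sentinel strings back to Python None."""
    return None if text in NONE_STRINGS else text


parser = argparse.ArgumentParser()
parser.add_argument("--rnn_type", type=none_or_str, default=None)

for argv in (["--rnn_type=None"], ["--rnn_type=LSTM"], []):
    print(argv, "->", repr(parser.parse_args(argv).rnn_type))
# ['--rnn_type=None'] -> None
# ['--rnn_type=LSTM'] -> 'LSTM'
# [] -> None
```

Passing default=None as a non-string keeps argparse from running the converter on the default, so an omitted flag also ends up as None.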

rahumble, Apr 06 '23

Another bump. This is a very common use case and would be great to have support for it.

benrhodes26, Jun 18 '23

Another bump

mmcdermott, Aug 18 '23

Same issue here! Is there anyone working on this? Or is there any plan to fix this?

shenmishajing, Sep 19 '23

Hi all, apologies for the delay here! I'm checking internally with our Engineering Team on the status of this one and also bumping up the urgency. Will keep you posted.

luisbergua, Sep 21 '23

Hey all, thanks for your patience! This was fixed here, and the fix will be available in our next SDK release.

luisbergua, Sep 26 '23