Issue with sweeping variables in nested dictionary in config
Something seems wrong with nested parameters. The docs states, that dot is used to define nested variables in config.
This means that defining the following default config:
config = dict(
my_dict = dict(inner_a = 'inner_a')
)
It should be possible to set inner_a to 'inner_a_from_sweep by the sweep:
method: random
metric:
goal: minimize
name: ""
parameters:
my_dict.inner_a:
distribution: categorical
values:
- inner_a_from_sweep
program: train.py
But running this:
config = dict(
a = 'variable_a',
my_dict = dict(inner_a = 'inner_a')
)
wandb.init(config=config)
config = wandb.config
print(config)
Outputs:
wandb_version: 1
_wandb:
desc: null
value:
cli_version: 0.8.32
code_path: code/../.local/bin/wandb
is_jupyter_run: false
is_kaggle_kernel: false
python_version: 3.6.9
my_dict:
desc: null
value:
inner_a: inner_a
my_dict.inner_a:
desc: null
value: inner_a_from_sweep
It seems like there is an issue with converting dot-variables back to dictionaries. I have fixed this myself by (though it is not very pretty and only works on one depth):
def fix_dict_in_config(wandb):
config = dict(wandb.config)
for k, v in config.copy().items():
if '.' in k:
new_key = k.split('.')[0]
inner_key = k.split('.')[1]
if new_key not in config.keys():
config[new_key] = {}
config[new_key].update({inner_key: v})
del config[k]
wandb.config = Config()
for k, v in config.items():
wandb.config[k] = v
Hi there, thanks for this detailed reportβ agreed that this is gnarly. My colleague @raubitsj is working on a CLI rewrite that should really improve problems like this
We currently do not support nested values in sweeps but it is something we are planning on supporting in the near future. We will update this with more details when it is available in beta.
I hit this as well, looking forward to this feature!
Here's a slightly more general of Jgfrausing's code w/ a unittest: https://github.com/EricCousineau-TRI/repro/commit/cc41402
(FTR, can't say I'm a fan of the noise that wandb...Config injects, so that's why I use the simpler AttrDict wrapper.)
This issue is stale because it has been open 60 days with no activity.
Has there been any movement on this?
Not yet but we're actively working on improvements to our sweep infrastructure. cc @dannygoldstein
this is on our roadmap, planning to get to it this quarter.
On Tue, Jul 13, 2021 at 11:56 AM Chris Van Pelt @.***> wrote:
Not yet but we're actively working on improvements to our sweep infrastructure. cc @dannygoldstein https://github.com/dannygoldstein
β You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/wandb/client/issues/982#issuecomment-879323267, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAVEFYD5STBIEOULTWQNROTTXSD5HANCNFSM4MMR2CMA .
-- Danny Goldstein http://astro.caltech.edu/~danny
Any news on this issue? A nested config in sweep would be really useful
@yuvalgrossman probably nothing this quarter but this one is on our sweeps improvement roadmap and shouldn't take us too long.
I might be a little late, but the simplest way seems to use the dot notation here to pass a flattened dict from wandb and then unflatten it in the program itself.
This brilliant code snippet from @aryzhov allows in-place flattening and unflattening. I'm posting the code here, full credits to @aryzhov.
def flatten_json(json):
if type(json) == dict:
for k, v in list(json.items()):
if type(v) == dict:
flatten_json(v)
json.pop(k)
for k2, v2 in v.items():
json[k+"."+k2] = v2
def unflatten_json(json):
if type(json) == dict:
for k in sorted(json.keys(), reverse=True):
if "." in k:
key_parts = k.split(".")
json1 = json
for i in range(0, len(key_parts)-1):
k1 = key_parts[i]
if k1 in json1:
json1 = json1[k1]
if type(json1) != dict:
conflicting_key = ".".join(key_parts[0:i+1])
raise Exception('Key "{}" conflicts with key "{}"'.format(
k, conflicting_key))
else:
json2 = dict()
json1[k1] = json2
json1 = json2
if type(json1) == dict:
v = json.pop(k)
json1[key_parts[-1]] = v
The idea is to pass the configs in the yaml as flattened:
a.b.c: 1
a.c.d: 0
When loaded into the wandb.run.config, the config dict obtained is {'a.c.d': 0, 'a.b.c': 1}. Make a copy of this dict and pass it to unflatten_json, to get {'a': {'c': {'d': 0}, 'b': {'c': 1}}}, then use as you would.
Hi wandb folks, is this feature is still on the roadmap? We currently use the flatten / unflatten approach above but it would be nice to just be able initialize wandb and access the config dict directly.
Hi @AnnaKwa, I have reopened this ticketed and notified the engineering team about this request. We will notify this thread with any new updates. Thanks for your patience!
Hi all, thank you all for the feature request! Our engineering team has implemented the ability to sweep over values that are in nested configs and this will be available in release 0.12.17 when it is available.
Thank you, Nate
@nate-wandb @sydholl the PR has been closed w/o merging - the functionality is still not there in 0.12.17 - or even RCs of 0.13.
Would you happen to have an ETA for a solution?
This would be immensely useful. Currently this is one of several problems making WandB sweeps with DDP tricky. To get the whole thing working across multiple processes with a nested config, I store my replaced keys in dot notation e.g.
parameters:
training.arch:
values: [unet-scse, unetplusplus-scse, linknet, deeplabv3plus]
Then update the config with the sweep values, but being careful to stop the non-rank-zero processes from creating new WandB runs:
wandb_logger = WandbLogger(project="HubMAP22",
config=cfg)
# get wandb config for sweeps on non-rank0s, but don't create a new run
if isinstance(wandb_logger.experiment.config, dict):
cfg = update_config(wandb_logger.experiment.config)
else:
wandb.init(config=cfg, mode="disabled")
cfg = update_config(wandb.config)
I can then pass cfg to my LightningModule, but do not run self.save_hyperparameters() in your Module's __init__(), or you'll end up with two entries for each variable in your WandB sweeps interface!
The above update_config() function replaces the entries in the config with dot notation:
def update_config(cfg: dict):
hyper_params = {k: v for k, v in cfg.items() if '.' in k}
hyper_params = unflatten_dot(hyper_params)
update(cfg, hyper_params)
return cfg
def unflatten_dot(dictionary):
resultDict = dict()
for key, value in dictionary.items():
parts = key.split(".")
d = resultDict
for part in parts[:-1]:
if part not in d:
d[part] = dict()
d = d[part]
d[parts[-1]] = value
return resultDict
def update(d, u):
# normal python update method on a dict will overwritten nested siblings that aren't updated
for k, v in u.items():
if isinstance(v, collections.abc.Mapping):
d[k] = update(d.get(k, {}), v)
else:
d[k] = v
return d
Hi @tomaszpietruszka-globality, sorry for the late follow up but this should be implemented now thanks to this PR. Let me know if you still see any issues with this or have any other questions.
Thank you, Nate
The implemented feature does not implement the dot syntax mentioned on the original issue. This is especially problematic as the sweep configs on the web UI (autogenerated from previous runs with nested configs) use the dot syntax, which fails silently until you print and manually check the config dicts.
train.py:
import wandb
default_config = {
'nested': {
'test': 2,
},
'unnested_test': 3,
}
run = wandb.init()
run.config.update(default_config)
print(run.config)
Autogenerated sweep config, at https://wandb.ai/entity/project/create-sweep:
method: bayes
metric:
goal: minimize
name: ""
parameters:
unnested_test:
max: 12
min: 2
distribution: int_uniform
nested.test:
max: 8
min: 2
distribution: int_uniform
Printed line:
{'nested.test': 7, 'unnested_test': 12, 'nested': {'test': 2}}
@marifdemirtas According to the testing code of the PR, the following seems to be the right way to set nested parameters.
method: bayes
metric:
goal: minimize
name: ""
parameters:
nested:
parameters:
test:
max: 8
min: 2
distribution: int_uniform
so you need to specify parameters containing test under the nested instead of nested.test.
I could confirm this works with wandb==0.13.3 (without using the unflatten functions!).
@nate-wandb It would be nice if you (W&B official) could document this clearly because supporting nested parameters is highly useful for many cases π
WandB Internal User commented: kazukiosawa commented: @marifdemirtas According to the testing code of the PR, the following seems to be the right way to set nested parameters.
method: bayes
metric:
goal: minimize
name: ""
parameters:
nested:
parameters:
test:
max: 8
min: 2
distribution: int_uniform
so you need to specify parameters containing test under the nested instead of nested.test.
I could confirm this works with wandb==0.13.3 (without using the unflatten functions!).
@nate-wandb It would be nice if you (W&B official) could document this clearly because supporting nested parameters is highly useful for many cases π
Hi @kazukiosawa, you are correct and thank you for the suggestion. We've since added this to our documentation here under the nested tab. Please let us know if there are any questions around this!
Thank you, Nate
I think with the nested expression we cannot access the child parameters with dot because the parent turns to a dictionary. Any workaround for this? I don't want to use [] expression as I access the child parameters with dot when I don't sweep.
parameters:
nested:
parameters:
test:
max: 8
min: 2
distribution: int_uniform
AttributeError("'dict' object has no attribute 'test'")