wandb Issue with sweeping variables in nested dictionary in config

Issue with sweeping variables in nested dictionary in config

Open Jgfrausing opened this issue 4 years ago • 18 comments

Something seems wrong with nested parameters. The docs states, that dot is used to define nested variables in config.

This means that defining the following default config:

config = dict(
    my_dict = dict(inner_a = 'inner_a')
)

It should be possible to set inner_a to 'inner_a_from_sweep by the sweep:

method: random
metric:
  goal: minimize
  name: ""
parameters:
  my_dict.inner_a:
    distribution: categorical
    values:
    - inner_a_from_sweep
program: train.py

But running this:

config = dict(
    a = 'variable_a',
    my_dict = dict(inner_a = 'inner_a')
)

wandb.init(config=config)
config = wandb.config

print(config)

Outputs:

wandb_version: 1

_wandb:
  desc: null
  value:
    cli_version: 0.8.32
    code_path: code/../.local/bin/wandb
    is_jupyter_run: false
    is_kaggle_kernel: false
    python_version: 3.6.9
my_dict:
  desc: null
  value:
    inner_a: inner_a
my_dict.inner_a:
  desc: null
  value: inner_a_from_sweep

It seems like there is an issue with converting dot-variables back to dictionaries. I have fixed this myself by (though it is not very pretty and only works on one depth):

def fix_dict_in_config(wandb):
    config = dict(wandb.config)
    for k, v in config.copy().items():
        if '.' in k:
            new_key = k.split('.')[0]
            inner_key = k.split('.')[1]
            if new_key not in config.keys():
                config[new_key] = {}
            config[new_key].update({inner_key: v})
            del config[k]
    
    wandb.config = Config()
    for k, v in config.items():
        wandb.config[k] = v

Apr 20 '20 15:04 Jgfrausing

Hi there, thanks for this detailed report— agreed that this is gnarly. My colleague @raubitsj is working on a CLI rewrite that should really improve problems like this

May 11 '20 19:05 cvphelps

We currently do not support nested values in sweeps but it is something we are planning on supporting in the near future. We will update this with more details when it is available in beta.

May 11 '20 21:05 raubitsj

I hit this as well, looking forward to this feature!

Here's a slightly more general of Jgfrausing's code w/ a unittest: https://github.com/EricCousineau-TRI/repro/commit/cc41402

(FTR, can't say I'm a fan of the noise that wandb...Config injects, so that's why I use the simpler AttrDict wrapper.)

Jul 02 '20 03:07 EricCousineau-TRI

This issue is stale because it has been open 60 days with no activity.

Dec 20 '20 00:12 github-actions[bot]

Has there been any movement on this?

Jul 13 '21 10:07 KirkHadley

Not yet but we're actively working on improvements to our sweep infrastructure. cc @dannygoldstein

Jul 13 '21 18:07 vanpelt

this is on our roadmap, planning to get to it this quarter.

On Tue, Jul 13, 2021 at 11:56 AM Chris Van Pelt @.***> wrote:

Not yet but we're actively working on improvements to our sweep infrastructure. cc @dannygoldstein https://github.com/dannygoldstein

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/wandb/client/issues/982#issuecomment-879323267, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAVEFYD5STBIEOULTWQNROTTXSD5HANCNFSM4MMR2CMA .

-- Danny Goldstein http://astro.caltech.edu/~danny

Jul 13 '21 21:07 dannygoldstein

Any news on this issue? A nested config in sweep would be really useful

Aug 16 '21 19:08 yuvalgrossman

@yuvalgrossman probably nothing this quarter but this one is on our sweeps improvement roadmap and shouldn't take us too long.

Nov 19 '21 20:11 aidanjdonohue

I might be a little late, but the simplest way seems to use the dot notation here to pass a flattened dict from wandb and then unflatten it in the program itself.

This brilliant code snippet from @aryzhov allows in-place flattening and unflattening. I'm posting the code here, full credits to @aryzhov.

def flatten_json(json):
    if type(json) == dict:
        for k, v in list(json.items()):
            if type(v) == dict:
                flatten_json(v)
                json.pop(k)
                for k2, v2 in v.items():
                    json[k+"."+k2] = v2

                    
def unflatten_json(json):
    if type(json) == dict:
        for k in sorted(json.keys(), reverse=True):
            if "." in k:
                key_parts = k.split(".")
                json1 = json
                for i in range(0, len(key_parts)-1):
                    k1 = key_parts[i]
                    if k1 in json1:
                        json1 = json1[k1]
                        if type(json1) != dict:
                            conflicting_key = ".".join(key_parts[0:i+1])
                            raise Exception('Key "{}" conflicts with key "{}"'.format(
                                k, conflicting_key))
                    else:
                        json2 = dict()
                        json1[k1] = json2
                        json1 = json2
                if type(json1) == dict:
                    v = json.pop(k)
                    json1[key_parts[-1]] = v

The idea is to pass the configs in the yaml as flattened:

a.b.c: 1
a.c.d: 0

When loaded into the wandb.run.config, the config dict obtained is {'a.c.d': 0, 'a.b.c': 1}. Make a copy of this dict and pass it to unflatten_json, to get {'a': {'c': {'d': 0}, 'b': {'c': 1}}}, then use as you would.

Jan 17 '22 13:01 jaivardhankapoor

Hi wandb folks, is this feature is still on the roadmap? We currently use the flatten / unflatten approach above but it would be nice to just be able initialize wandb and access the config dict directly.

Mar 25 '22 17:03 AnnaKwa

Hi @AnnaKwa, I have reopened this ticketed and notified the engineering team about this request. We will notify this thread with any new updates. Thanks for your patience!

Mar 30 '22 16:03 sydholl

Hi all, thank you all for the feature request! Our engineering team has implemented the ability to sweep over values that are in nested configs and this will be available in release 0.12.17 when it is available.

Thank you, Nate

May 09 '22 12:05 nate-wandb

@nate-wandb @sydholl the PR has been closed w/o merging - the functionality is still not there in 0.12.17 - or even RCs of 0.13.

Would you happen to have an ETA for a solution?

Jun 08 '22 17:06 tomaszpietruszka-globality

This would be immensely useful. Currently this is one of several problems making WandB sweeps with DDP tricky. To get the whole thing working across multiple processes with a nested config, I store my replaced keys in dot notation e.g.

parameters:
  training.arch:
    values: [unet-scse, unetplusplus-scse, linknet, deeplabv3plus]

Then update the config with the sweep values, but being careful to stop the non-rank-zero processes from creating new WandB runs:

    wandb_logger = WandbLogger(project="HubMAP22",
                               config=cfg)

    # get wandb config for sweeps on non-rank0s, but don't create a new run
    if isinstance(wandb_logger.experiment.config, dict):
        cfg = update_config(wandb_logger.experiment.config)
    else:
        wandb.init(config=cfg, mode="disabled")
        cfg = update_config(wandb.config)

I can then pass cfg to my LightningModule, but do not run self.save_hyperparameters() in your Module's __init__(), or you'll end up with two entries for each variable in your WandB sweeps interface!

The above update_config() function replaces the entries in the config with dot notation:

def update_config(cfg: dict):
    hyper_params = {k: v for k, v in cfg.items() if '.' in k}
    hyper_params = unflatten_dot(hyper_params)
    update(cfg, hyper_params)
    return cfg


def unflatten_dot(dictionary):
    resultDict = dict()
    for key, value in dictionary.items():
        parts = key.split(".")
        d = resultDict
        for part in parts[:-1]:
            if part not in d:
                d[part] = dict()
            d = d[part]
        d[parts[-1]] = value
    return resultDict


def update(d, u):
    # normal python update method on a dict will overwritten nested siblings that aren't updated
    for k, v in u.items():
        if isinstance(v, collections.abc.Mapping):
            d[k] = update(d.get(k, {}), v)
        else:
            d[k] = v
    return d

Jun 27 '22 17:06 jphdotam

Hi @tomaszpietruszka-globality, sorry for the late follow up but this should be implemented now thanks to this PR. Let me know if you still see any issues with this or have any other questions.

Thank you, Nate

Aug 15 '22 05:08 nate-wandb

The implemented feature does not implement the dot syntax mentioned on the original issue. This is especially problematic as the sweep configs on the web UI (autogenerated from previous runs with nested configs) use the dot syntax, which fails silently until you print and manually check the config dicts.

train.py:

import wandb
default_config = {
    'nested': {
        'test': 2,
    },
    'unnested_test': 3,
}

run = wandb.init()
run.config.update(default_config)

print(run.config)

Autogenerated sweep config, at https://wandb.ai/entity/project/create-sweep:

method: bayes
metric:
  goal: minimize
  name: ""
parameters:
  unnested_test:
    max: 12
    min: 2
    distribution: int_uniform
  nested.test:
    max: 8
    min: 2
    distribution: int_uniform

Printed line: {'nested.test': 7, 'unnested_test': 12, 'nested': {'test': 2}}

Aug 27 '22 12:08 marifdemirtas

@marifdemirtas According to the testing code of the PR, the following seems to be the right way to set nested parameters.

method: bayes
metric:
  goal: minimize
  name: ""
parameters:
  nested:
    parameters:
     test:
      max: 8
      min: 2
      distribution: int_uniform

so you need to specify parameters containing test under the nested instead of nested.test. I could confirm this works with wandb==0.13.3 (without using the unflatten functions!).

@nate-wandb It would be nice if you (W&B official) could document this clearly because supporting nested parameters is highly useful for many cases 🙏

Sep 11 '22 00:09 kazukiosawa

WandB Internal User commented: kazukiosawa commented: @marifdemirtas According to the testing code of the PR, the following seems to be the right way to set nested parameters.

method: bayes
metric:
  goal: minimize
  name: ""
parameters:
  nested:
    parameters:
     test:
      max: 8
      min: 2
      distribution: int_uniform

so you need to specify parameters containing test under the nested instead of nested.test. I could confirm this works with wandb==0.13.3 (without using the unflatten functions!).

@nate-wandb It would be nice if you (W&B official) could document this clearly because supporting nested parameters is highly useful for many cases 🙏

Sep 28 '22 00:09 exalate-issue-sync[bot]

Hi @kazukiosawa, you are correct and thank you for the suggestion. We've since added this to our documentation here under the nested tab. Please let us know if there are any questions around this!

Thank you, Nate

Nov 09 '22 01:11 nate-wandb

I think with the nested expression we cannot access the child parameters with dot because the parent turns to a dictionary. Any workaround for this? I don't want to use [] expression as I access the child parameters with dot when I don't sweep.

parameters:
  nested:
    parameters:
     test:
      max: 8
      min: 2
      distribution: int_uniform

AttributeError("'dict' object has no attribute 'test'")

Dec 06 '23 05:12 SeanNobel

wandb wandb copied to clipboard

Issue with sweeping variables in nested dictionary in config

wandb
wandb copied to clipboard