
`hydrate`: mirroring modules with structured configs

rsokl opened this issue 2 years ago • 0 comments

Summary

hydra_zen.hydrate could be applied to a namespace/module and would return a corresponding Hydra node of configs that describes that namespace/module.

In effect, the entire hydra-torch project could be replaced with:

import torch
import hydra_zen

hydra_torch = hydra_zen.hydrate(torch) 

(well... not exactly, but that is the rough idea)

Motivation

When creating configs using hydra-zen, the following pattern is quite common:

from hydra_zen import builds

# import some module
import some_module as md

# convert all of its objects that you want to use to config classes
ConfA = builds(md.A, populate_full_signature=True)
ConfB = builds(md.B, populate_full_signature=True)
# etc..

# create instances of configs with particular details
conf_a = ConfA(x=2, y=3)
conf_b = ConfB("hello")
# etc.

Oftentimes steps 2 and 3 are combined: ConfA = builds(md.A, x=2, y=3, populate_full_signature=True). Regardless, it would be nice to be able to minimize this boilerplate code.
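To make the boilerplate concrete: builds amounts to capturing a target plus default arguments in a dataclass, to be called later. The following is a toy stdlib-only stand-in for this pattern (fake_builds and fake_instantiate are illustrative inventions, not hydra-zen's actual implementation):

```python
from dataclasses import field, make_dataclass

def fake_builds(target, **defaults):
    """Toy stand-in for hydra_zen.builds: returns a dataclass that
    records the target and one defaulted field per keyword argument."""
    cls = make_dataclass(
        f"Conf{target.__name__}",
        [(k, object, field(default=v)) for k, v in defaults.items()],
    )
    cls._target = staticmethod(target)
    return cls

def fake_instantiate(conf):
    # call the recorded target with the config's current field values
    kwargs = {name: getattr(conf, name) for name in conf.__dataclass_fields__}
    return conf._target(**kwargs)

def mul(x, y):
    return x * y

ConfMul = fake_builds(mul, x=2, y=3)   # steps 2 and 3 combined
```

Instantiating ConfMul() calls mul(x=2, y=3); overrides like ConfMul(y=5) behave as you would expect.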

Potential solution

There have been discussions about making this process more ergonomic. One idea is to create a hydrate function that will auto-apply builds (or variations of builds) to all public members of a module.

E.g. consider dummy_module.py:

# contents of dummy_module.py

__all__ = ["func", "A", "B"]

def func(x, y):
    ...

class A:
    ...

class B:
    ...

Then running

import dummy_module

dm = hydra_zen.hydrate(dummy_module)

will create the following structured config:

@dataclass
class dummy_module_configs:
    func: Type[Builds[func]] = builds(func, populate_full_signature=True)
    A: Type[Builds[Type[A]]] = builds(A, populate_full_signature=True)
    B: Type[Builds[Type[B]]] = builds(B, populate_full_signature=True)

dummy_module_configs can then be registered as a node in Hydra's config store, for easy/intuitive access to all of these configs from the overrides API / CLI.
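One way such a container dataclass could be generated at runtime is with dataclasses.make_dataclass. This sketch uses a hypothetical fake_builds stand-in rather than hydra-zen's real builds, just to show the shape of the generated class:

```python
from dataclasses import field, fields, make_dataclass

def fake_builds(target):
    # hypothetical stand-in for builds(target, populate_full_signature=True)
    return ("Builds", target)

def make_module_configs(module_name, members):
    """Generate a container dataclass with one field per public member,
    each defaulting to that member's auto-generated config."""
    return make_dataclass(
        f"{module_name}_configs",
        [(name, object, field(default=fake_builds(obj)))
         for name, obj in members.items()],
    )

dummy_configs = make_module_configs("dummy_module", {"func": len})
```

The real implementation would of course need the proper Type[Builds[...]] annotations for each field, which is where the typing difficulties discussed below come in.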

Furthermore, this leads to clean, readable config-creation code. Consider configuring a pipeline of torchvision transformations:

from torchvision import transforms

viz = hydrate(transforms)  

Cifar10TrainTransformsConf = viz.Compose(
    transforms=[
        viz.RandomCrop(size=32, padding=4),
        viz.RandomHorizontalFlip(),
        viz.ToTensor(),
    ],
)

This code has complete parity with how one would actually create this augmentation pipeline in their ML code.

Some Problems...

Here are some issues with hydrate that I can anticipate

Getting type checkers to understand what the heck is going on

Presently, I do not think that there is any way to tell static tooling that hydrate(module) returns a dataclass whose attributes' names match those of module, but whose values are all Type[Builds[...]].

Ultimately, we want users of hydrate(module) to benefit from auto-complete on both attribute names and object signatures. The only way I can think of delivering these things is by lying and annotating hydrate(module) as simply returning module... Thus hydrate(transforms) will look identical to transforms to static tools.

This will produce false positives for users: type-checkers will mark some code patterns as invalid because they do not realize that they are dealing with dataclasses.

This is a big blocker for hydrate. I will need to create discussions on the Python typing mailing list to see if maintainers of the various type checkers have any recommendations here. I do not want hydra-zen users to have static tooling marking a bunch of false positives throughout their code. I am willing to release this as an experimental feature, and only recommend its use in places where static tooling will not mark many false positives.

Not all configs should be produced by builds(<target>, populate_full_signature=True)

Although it is a sensible default to apply builds(<target>, populate_full_signature=True) to all objects in a module, this behavior is not always desirable. For instance, optimizers in torch.optim almost always need to be partial-configs, because the model parameters that they will optimize are never available at config/instantiation time. Thus hydrate needs to provide some control over how it creates configs.
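To make the optimizer case concrete: with zen_partial=True, instantiating the config yields something like a functools.partial of the target, to be called later once the parameters actually exist. A stdlib-only illustration (fake_adam is a made-up stand-in, not torch.optim.Adam):

```python
from functools import partial

def fake_adam(params, lr=0.001):
    # stand-in for an optimizer: requires the model parameters up front
    return {"params": list(params), "lr": lr}

# config/instantiation time: the model's parameters do not exist yet,
# so the config must resolve to a partial rather than a full call
opt_factory = partial(fake_adam, lr=0.1)

# training time: the model now exists, so its parameters can be supplied
optimizer = opt_factory([1.0, 2.0])
```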

As a potential solution, we might design hydrate as follows:

def hydrate(
    module,
    default_config_creation_fn=make_custom_builds_fn(populate_full_signature=True),
    class_specific_config_fns=None,
    excluded_names=None,   # exclude particular items from `__all__`
    target_names=None,     # if provided, takes precedence over `__all__`
):
    ...

where a user could specify:

import torch.optim
from torch.optim import Optimizer

from hydra_zen import hydrate, make_custom_builds_fn

hydrate(
    torch.optim,
    class_specific_config_fns={
        Optimizer: make_custom_builds_fn(
            populate_full_signature=True, zen_partial=True
        )
    },
)

This would tell hydrate to apply zen_partial=True when creating a config for Optimizer and for all subclasses of Optimizer.
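Under these assumptions, the per-member dispatch could look roughly like the following stdlib-only sketch (hydrate_sketch and its stand-in config functions are illustrative, not a proposed implementation):

```python
import inspect
from types import SimpleNamespace

def hydrate_sketch(
    module,
    default_config_creation_fn,
    class_specific_config_fns=None,
    excluded_names=None,
    target_names=None,
):
    """Pick a config-creation function for each public member of `module`,
    honoring subclass-aware overrides, and return a namespace of configs."""
    class_specific = class_specific_config_fns or {}
    names = target_names if target_names is not None else getattr(module, "__all__", [])
    excluded = set(excluded_names or ())
    configs = {}
    for name in names:
        if name in excluded:
            continue
        obj = getattr(module, name)
        fn = default_config_creation_fn
        # subclass-aware lookup: an Optimizer entry also covers its subclasses
        if inspect.isclass(obj):
            for base, base_fn in class_specific.items():
                if issubclass(obj, base):
                    fn = base_fn
                    break
        configs[name] = fn(obj)
    return SimpleNamespace(**configs)
```

The real version would return a proper dataclass (so it can be registered with Hydra) rather than a SimpleNamespace, but the selection logic would be the same.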

Feedback

Does hydrate seem useful? How would you use it? Did I fail to cover specific use cases here? Are there other pitfalls that I am missing?

rsokl avatar Apr 10 '22 16:04 rsokl