omegaconf
omegaconf copied to clipboard
Improve Tuple support
Currently tuple is treated as a list.
- We should preserve the immutability semantics of Tuple.
- When converting to a primitive container, we should get a tuple back, currently we get a list:
In [2]: OmegaConf.to_container(OmegaConf.create((1,2,3)))
Out[2]: [1, 2, 3]
In addition, it's converted to a mutable sequence:
In [6]: isinstance(OmegaConf.create((1,2,3)), MutableSequence)
Out[6]: True
In [7]: isinstance((1,2,3), MutableSequence)
Out[7]: False
Another thing this might enable is type safe tuple support:
from typing import Tuple, List
from omegaconf import OmegaConf
from dataclasses import dataclass
@dataclass
class Foo:
t1: Tuple[float, int] = (1.0)
t2: Tuple[int, ...] = (1, 2, 3)
cfg = OmegaConf.structured(Foo)
A cheap alternative to this is to treat tuple as a List and cut some corners with heterogeneous tuples (List[Any]?).
Finally, OmegaConf.to_container() should return a real tuple for input tuples and not a list.
Fixing it probably means the introduction of a TupleConfig class and is potentially a breaking change in some rare cases. This is probably a very large change and I am not convinced it's worth the investment. Please like if you feel this is important.
Probably too big of a change for 2.1 at this point.
Not sure if this is related, but seems likely.
Given a config, containing a custom resolver as_tuple
:
vec:
_target_: sklearn.feature_extraction.text.HashingVectorizer
ngram_range: ${as_tuple:2,3}
pipe:
_target_: sklearn.pipeline.Pipeline
steps:
- - vec
- ${vec}
- - clf
- _target_: sklearn.dummy.DummyClassifier
and an associated script:
import hydra
import hydra.utils as hu
from omegaconf import OmegaConf
def resolve_tuple(*args):
return tuple(args)
OmegaConf.register_new_resolver("as_tuple", resolve_tuple)
@hydra.main(config_name="conf", config_path="conf"
def main(cfg):
obj = hu.instantiate(cfg.pipe, _convert_='partial')
print(obj)
Running this script with the associated config raises the following:
$ python test.py
Error executing job with overrides: []
Traceback (most recent call last):
File "test.py", line 21, in <module>
main()
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/main.py", line 52, in decorated_main
config_name=config_name,
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 378, in _run_hydra
lambda: hydra.run(
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 381, in <lambda>
overrides=args.overrides,
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "test.py", line 16, in main
obj = hu.instantiate(cfg.pipe, _convert_='partial')
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 175, in instantiate
OmegaConf.resolve(config)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 792, in resolve
omegaconf._impl._resolve(cfg)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 40, in _resolve
_resolve_container_value(cfg, k)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 25, in _resolve_container_value
_resolve(node)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 44, in _resolve
_resolve_container_value(cfg, i)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 25, in _resolve_container_value
_resolve(node)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 44, in _resolve
_resolve_container_value(cfg, i)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 19, in _resolve_container_value
_resolve(resolved)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 40, in _resolve
_resolve_container_value(cfg, k)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 23, in _resolve_container_value
node._set_value(resolved._value())
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 44, in _set_value
self._val = self.validate_and_convert(value)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 57, in validate_and_convert
return self._validate_and_convert_impl(value)
File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 134, in _validate_and_convert_impl
f"Value '{t.__name__}' is not a supported primitive type"
omegaconf.errors.UnsupportedValueType: Value 'tuple' is not a supported primitive type
However, if you modify the python file to only instantiate the vectorizer.
...
def main(cfg):
hu.instantiate(cfg.vec, _convert_='partial')
print(obj)
...
The target instantiates successfully.
$ python test.py
HashingVectorizer(ngram_range=(2, 3))
Is this the intended behavior? It seems the allow_objects
flag for the config is somehow None
when instantiating the vec
object in the nested instantiation causing the error, but I am not sure how or what to do to mitigate that occuring.
Hello again @gphillips-ema :)
Thanks for sharing this example. I believe you're right that there's a bug here. The issue you are having is actually not tuple-specific; the same problem occurs with any other type that is not supported by OmegaConf (i.e. not a dict/list/primitive).
For your use-case, here is a workaround:
vec:
_target_: sklearn.feature_extraction.text.HashingVectorizer
ngram_range:
_target_: builtins.tuple
_args_:
- [2, 3]
pipe:
_target_: sklearn.pipeline.Pipeline
steps:
- - vec
- ${vec}
- - clf
- _target_: sklearn.dummy.DummyClassifier
I've opened an issue against the Hydra repo here: https://github.com/facebookresearch/hydra/issues/1865.
@Jasha10 thanks for the workaround. I had been doing the following:
vec:
_target_: sklearn.feature_extraction.text.HashingVectorizer
ngram_range:
- 2
- 3
This at least instantiates, but I am touch uncomfortable as its a list and not a tuple.
My real motivation for this is to simplify overrides for pipelines that can become complex.
Take this example:
pipe:
_target_: sklearn.pipeline.Pipeline
steps:
- - transformer
- _target_: sklearn.compose.ColumnTransformer
transformers:
- - text_pipeline
- _target_: sklearn.pipeline.Pipeline
steps:
- - vectorizer
- _target_: sklearn.feature_extraction.text.HashingVectorizer
n_features: ${pow:2,18}
ngram_range:
- 1
- 4
- Description
- - classifier
- _target_: sklearn.dummy.DummyClassifier
In this case a user can only override ngram_range min or max individually like so: pipe.steps.0.1.transformers.0.1.steps.0.1.ngram_range.1=5
This could get rather frustrating. But if you can interpolate the vectorizer as follows:
vectorizer:
- _target_: sklearn.feature_extraction.text.HashingVectorizer
n_features: ${pow:2,18}
ngram_range:
- 1
- 4
pipe:
_target_: sklearn.pipeline.Pipeline
steps:
- - transformer
- _target_: sklearn.compose.ColumnTransformer
transformers:
- - text_pipeline
- _target_: sklearn.pipeline.Pipeline
steps:
- - vectorizer
- ${vectorizer}
- Description
- - classifier
- _target_: sklearn.dummy.DummyClassifier
The override is now much simpler as the user just needs to add vectorizer.ngram_range.1=5
.
Being able to also use a tuple instead of single min or max values is just the icing on the cake here.
This pattern of needing an object to instantiate within a target does seem to pop up a fair amount in our pipelines.
Any update or workaround on this? I prefer to use tuples instead of lists in my configs for their immutability.
Any update or workaround on this? I prefer to use tuples instead of lists in my configs for their immutability.
Unlikely to be added any time soon unless someone feels like writing a big PR for it.
Potential workarounds:
- Use some
as_tuple
resolver as in https://github.com/omry/omegaconf/issues/392#issuecomment-948956968 - If you run into the above issue related to instantiate, use the above workardound, or write your own custom instantiation wrapper that converts lists into tuples
Thanks for the quick reply @odelalleau .
To clarify, I use structured configs (dataclasses) to pass the configs around for type checking.
I temporarily convert to a DictConfig via dict_config = OmegaConf.structured(config)
to merge with cli overwrites (via OmegaConf merge) and then convert back to dataclass via OmegaConf.to_container(dict_config, resolve=True, structured_config_mode=SCMode.INSTANTIATE)
.
With that workflow, the resolver seems not to do anything. Instead, the conversion to a dict_config = OmegaConf.structured(config)
converts the tuples to lists.
With that workflow, the resolver seems not to do anything. Instead, the conversion to a
dict_config = OmegaConf.structured(config)
converts the tuples to lists.
I see -- I'm afraid I'm not entirely sure if there's a way to preserve the tuples in your setup (and no time to look into it closely now).