omegaconf icon indicating copy to clipboard operation
omegaconf copied to clipboard

Improve Tuple support

Open omry opened this issue 3 years ago • 9 comments

Currently tuple is treated as a list.

  1. We should preserve the immutability semantics of Tuple.
  2. When converting to a primitive container, we should get a tuple back, currently we get a list:
In [2]: OmegaConf.to_container(OmegaConf.create((1,2,3)))
Out[2]: [1, 2, 3]

In addition, it's converted to a mutable sequence:

In [6]: isinstance(OmegaConf.create((1,2,3)), MutableSequence)
Out[6]: True

In [7]: isinstance((1,2,3), MutableSequence)
Out[7]: False

Another thing this might enable is type safe tuple support:

from typing import Tuple, List
from omegaconf import OmegaConf
from dataclasses import dataclass

@dataclass
class Foo:
    t1: Tuple[float, int] = (1.0)
    t2: Tuple[int, ...] = (1, 2, 3)

cfg = OmegaConf.structured(Foo)

A cheap alternative to this is to treat tuple as a List and cut some corners with heterogeneous tuples (List[Any]?).

Finally, OmegaConf.to_container() should return a real tuple for input tuples and not a list.

Fixing it probably means the introduction of a TupleConfig class and is potentially a breaking change in some rare cases. This is probably a very large change and I am not convinced it's worth the investment. Please like if you feel this is important.

omry avatar Sep 24 '20 17:09 omry

Probably too big of a change for 2.1 at this point.

omry avatar Feb 03 '21 00:02 omry

Not sure if this is related, but seems likely.

Given a config, containing a custom resolver as_tuple:

vec:
  _target_: sklearn.feature_extraction.text.HashingVectorizer
  ngram_range: ${as_tuple:2,3}
pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - vec
       - ${vec}
    - - clf
      - _target_: sklearn.dummy.DummyClassifier

and an associated script:

import hydra
import hydra.utils as hu

from omegaconf import OmegaConf


def resolve_tuple(*args):
    return tuple(args)


OmegaConf.register_new_resolver("as_tuple", resolve_tuple)


@hydra.main(config_name="conf", config_path="conf"
def main(cfg):
    obj = hu.instantiate(cfg.pipe, _convert_='partial')
    print(obj)

Running this script with the associated config raises the following:

$ python test.py
Error executing job with overrides: []
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    main()
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/main.py", line 52, in decorated_main
    config_name=config_name,
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 378, in _run_hydra

    lambda: hydra.run(
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 381, in <lambda>
    overrides=args.overrides,
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 111, in run
    _ = ret.return_value
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "test.py", line 16, in main
    obj = hu.instantiate(cfg.pipe, _convert_='partial')
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 175, in instantiate
    OmegaConf.resolve(config)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 792, in resolve
    omegaconf._impl._resolve(cfg)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 40, in _resolve
    _resolve_container_value(cfg, k)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 25, in _resolve_container_value
    _resolve(node)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 44, in _resolve
    _resolve_container_value(cfg, i)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 25, in _resolve_container_value
    _resolve(node)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 44, in _resolve
    _resolve_container_value(cfg, i)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 19, in _resolve_container_value
    _resolve(resolved)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 40, in _resolve
    _resolve_container_value(cfg, k)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 23, in _resolve_container_value
    node._set_value(resolved._value())
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 44, in _set_value
    self._val = self.validate_and_convert(value)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 57, in validate_and_convert
    return self._validate_and_convert_impl(value)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 134, in _validate_and_convert_impl
    f"Value '{t.__name__}' is not a supported primitive type"
omegaconf.errors.UnsupportedValueType: Value 'tuple' is not a supported primitive type

However, if you modify the python file to only instantiate the vectorizer.

...
def main(cfg):
    hu.instantiate(cfg.vec, _convert_='partial')
    print(obj)
...

The target instantiates successfully.

$ python test.py
HashingVectorizer(ngram_range=(2, 3))

Is this the intended behavior? It seems the allow_objects flag for the config is somehow None when instantiating the vec object in the nested instantiation causing the error, but I am not sure how or what to do to mitigate that occuring.

gphillips-ema avatar Oct 21 '21 19:10 gphillips-ema

Hello again @gphillips-ema :)

Thanks for sharing this example. I believe you're right that there's a bug here. The issue you are having is actually not tuple-specific; the same problem occurs with any other type that is not supported by OmegaConf (i.e. not a dict/list/primitive).

For your use-case, here is a workaround:

vec:
  _target_: sklearn.feature_extraction.text.HashingVectorizer
  ngram_range:
    _target_: builtins.tuple
    _args_:
      - [2, 3]
pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - vec
       - ${vec}
    - - clf
      - _target_: sklearn.dummy.DummyClassifier

I've opened an issue against the Hydra repo here: https://github.com/facebookresearch/hydra/issues/1865.

Jasha10 avatar Oct 22 '21 09:10 Jasha10

@Jasha10 thanks for the workaround. I had been doing the following:

vec:
  _target_: sklearn.feature_extraction.text.HashingVectorizer
  ngram_range:
    - 2
    - 3

This at least instantiates, but I am touch uncomfortable as its a list and not a tuple.

My real motivation for this is to simplify overrides for pipelines that can become complex.

Take this example:

pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - transformer
       - _target_: sklearn.compose.ColumnTransformer
         transformers:
           - - text_pipeline
             - _target_: sklearn.pipeline.Pipeline
               steps:
                 - - vectorizer
                   - _target_: sklearn.feature_extraction.text.HashingVectorizer
                     n_features: ${pow:2,18}
                     ngram_range:
                       - 1
                       - 4
             - Description
    - - classifier
      - _target_: sklearn.dummy.DummyClassifier

In this case a user can only override ngram_range min or max individually like so: pipe.steps.0.1.transformers.0.1.steps.0.1.ngram_range.1=5

This could get rather frustrating. But if you can interpolate the vectorizer as follows:

vectorizer:
  - _target_: sklearn.feature_extraction.text.HashingVectorizer
    n_features: ${pow:2,18}
    ngram_range:
      - 1
      - 4
pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - transformer
       - _target_: sklearn.compose.ColumnTransformer
         transformers:
           - - text_pipeline
             - _target_: sklearn.pipeline.Pipeline
               steps:
                 - - vectorizer
                   - ${vectorizer}
             - Description
    - - classifier
      - _target_: sklearn.dummy.DummyClassifier

The override is now much simpler as the user just needs to add vectorizer.ngram_range.1=5.

Being able to also use a tuple instead of single min or max values is just the icing on the cake here.

This pattern of needing an object to instantiate within a target does seem to pop up a fair amount in our pipelines.

gphillips-ema avatar Oct 22 '21 22:10 gphillips-ema

Any update or workaround on this? I prefer to use tuples instead of lists in my configs for their immutability.

vonHartz avatar Sep 22 '23 13:09 vonHartz

Any update or workaround on this? I prefer to use tuples instead of lists in my configs for their immutability.

Unlikely to be added any time soon unless someone feels like writing a big PR for it.

Potential workarounds:

  • Use some as_tuple resolver as in https://github.com/omry/omegaconf/issues/392#issuecomment-948956968
  • If you run into the above issue related to instantiate, use the above workardound, or write your own custom instantiation wrapper that converts lists into tuples

odelalleau avatar Sep 22 '23 14:09 odelalleau

Thanks for the quick reply @odelalleau .

To clarify, I use structured configs (dataclasses) to pass the configs around for type checking. I temporarily convert to a DictConfig via dict_config = OmegaConf.structured(config) to merge with cli overwrites (via OmegaConf merge) and then convert back to dataclass via OmegaConf.to_container(dict_config, resolve=True, structured_config_mode=SCMode.INSTANTIATE).

With that workflow, the resolver seems not to do anything. Instead, the conversion to a dict_config = OmegaConf.structured(config) converts the tuples to lists.

vonHartz avatar Sep 22 '23 15:09 vonHartz

With that workflow, the resolver seems not to do anything. Instead, the conversion to a dict_config = OmegaConf.structured(config) converts the tuples to lists.

I see -- I'm afraid I'm not entirely sure if there's a way to preserve the tuples in your setup (and no time to look into it closely now).

odelalleau avatar Sep 22 '23 15:09 odelalleau