omegaconf icon indicating copy to clipboard operation
omegaconf copied to clipboard

Consider supporting basic operators in interpolations of numbers

Open omry opened this issue 4 years ago • 28 comments

a: 1
b: 2.0
d: ${a} - ${b}
e: ${a} + ${b}
d: ${a} * ${b}
e: ${a} / ${b}

desired behavior:

conf = OmegaConf.load("file.yaml")
assert conf.d == -1.0
assert conf.d == 3.0
assert conf.d == 2.0
assert conf.d == 0.5

This should work only if both operands are numbers (int or float). result type should be int if both are ints or float otherwise.

EDIT (2022-12-20):

✨✨✨ See the OmegaConf docs on How to Perform Arithmetic Using eval as a Resolver ✨✨✨

omry avatar Dec 02 '19 16:12 omry

+1. Out of curiosity, why not use the default behavior of pythons operands for any types? I.e. don't check the type (let python return its errors if need be) and just call + , - , / , * . For example do string concatenation by default with +.

YannDubs avatar Dec 24 '19 10:12 YannDubs

String concatenation is already supported directly:

url: http://${host}:${port}:${path}

It might still be worth supporting it, but some other things does not make immediate sense, for example:

a:
  bb: 10
  cb: 20

b:
  xx: 30
  yy: 40

c: foo

d1: ${a} + ${b}
d2: ${a} + ${c}

While it's plausible to come with interpretations for both d1 and maybe even d2, I suspect the surface area from supporting those interpretation is very large. I would like to keep operator support well defined and limited to the most impactful use cases.

omry avatar Dec 24 '19 10:12 omry

Ok, I would personally just use python way of doing by returning

def interpolate_binary(a,b,op):
    return op(OmegaConf.to_container(a, resolve=True),
              OmegaConf.to_container(b, resolve=True))

In which case both d1 and d2 would raise a TypeError. That seems like the most intuitive and easiest way of maintaining things. But maybe it's naive, I didn't actually look through OmegaConf codebase much ...

YannDubs avatar Dec 24 '19 10:12 YannDubs

Supporting + for arbitrary values may be incompatible with existing strings:

foo: FOO
bar: BAR
key: ${foo}+${bar}

current interpretation is FOO+BAR, new interpretation will be FOOBAR. limiting to numerical types reduces the incompatibility impact.

There should be a way to prevent operators from executing for cases like this.

In any case, when I try to actually implement it I will think about it more seriously.

omry avatar Dec 24 '19 11:12 omry

@odelalleau, I was thinking about introducing an eval resolver later to do arithmetic operations. It could also be using a different syntax, like #{expression}.

two : #{1 * 2}   
five : #{1 + ${two} * 2} 

# or using a regular resolver:
eight : ${eval:4 + ${two} * 2} 

I kinda like #{} though (although I just realized it's a yaml comment so it will have unfortunate side effects. some other symbol? % or ^ ?).

omry avatar Feb 06 '21 00:02 omry

I was thinking about introducing an eval resolver later to do arithmetic operations.

Personally I would find it more intuitive to do it the way you initially wrote at the top of this issue, rather than introduce a new resolver or syntax. Did you have any major concern with your initial proposition? (I think it should remain limited to floats & integers, which should cover most common use cases, and people can still write custom resolvers for more advanced computations).

odelalleau avatar Feb 06 '21 14:02 odelalleau

The main concern is backward compatibility. Interpreting all configs as expressions means that you can't have a config value that is an expression:

x: 10 * 2

Quoting it will not help because at YAML level it's already a string. An interpolation/syntax allow you to differentiate the two cases.

# "10 * 2"
x: 10 * 2

# 20
y: ${eval:10 * 2}

It is possible to only process expressions if they contain interpolations, but feels like a hack. This is useful expression that will not be handled:

# 10MB
y: 10 * 1024 * 1024

I guess we could go with the new decode resolver to support the string form:

y: ${oc.decode:"10 * 1024 * 1024"}  # should be 10 * 1024 * 1024

omry avatar Feb 07 '21 08:02 omry

Follow-up thoughts on this (tl;dr: I think it can be implemented at the level of resolver arguments, but this would make it a bit more cumbersome to use arithmetic operators in strings especially on Hydra's CLI, so maybe a unique syntax like #{..} is better)

  • I agree this shouldn't happen in regular strings, for the backward compatibility reason you mentioned. And even if it was ok to break backward compatibility, it would probably make things too complex to be worth it (ex: is rating: 5* an incorrect arithmetic expression, or a string?)
  • I think it could be made to work within interpolations, using the same grammar that parses resolver arguments. The drawback is that it would still break compatibility, but it would be less widespread: we would need to forbid at least the /-+* characters that are currently allowed in unquoted strings, so for instance one should replace ${foo:a-b+c} with ${foo:"a-b+c"} to keep the same behavior as before.
  • I don't think the new oc.decode resolver is needed here. The way I've implemented this resolver is by evaluating its input string as if it was a resolver argument. If we actually add this feature to regular resolver arguments, then ${oc.decode:"10 * 1024 * 1024"} simply becomes equivalent to ${identity: 10 * 1024 * 1024}, where identity is a resolver that just returns its input.
  • So, if we do it this way one suggestion to keep the syntax short would be to use an empty resolver: ${:10 * 1024 * 1024}. It could also be a different syntax for convenience, like the #{..} you suggested. Also, I'm not sure what would be the timeline for this, but personally I won't have time to implement it in 2.1. What I can easily do, however, is trigger a deprecation warning for people who are using arithmetic operators in unquoted strings (/-+*, and also probably %), to tell them to use quoted strings instead. This will make it easier to introduce these operations in the grammar later.
  • The backward compatibility would still be a potential issue though, especially for Hydra command-line arguments, that are parsed using the same grammar as resolver arguments. So for instance language=c++ would need to be quoted, which is always ugly in the shell: language="'c++'". So maybe we really want a separate parsing logic, which would motivate a new syntax like #{..}.

odelalleau avatar Mar 02 '21 18:03 odelalleau

Makes sense. I don't think we need to deprecate anything at this time (since we are not sure about the final design yet). We can deprecate things in between 2.1 and 2.1 without a major version (e.g. 2.1.3). Also - this is also absolutely not getting into 2.1, so no worries there.

omry avatar Mar 02 '21 18:03 omry

A few thoughts here

  1. What about supporting inequality operators in addition to arithmetic?
  2. I like the ${eval:} or #{expression} syntax mentioned a lot. That seems to be very flexible, and by using the former I think we get nested interpolations for free, right?
  3. Could ${eval:} make use of something like ast.literal_eval() to support more complex expressions? Likely this should be modified to not allow any arbitrary expression since that'd be a security risk. See below for desired example
from dataclasses import dataclass
import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf


@dataclass
class Conf:
    foo: str = 'bar'
    baz: int = "${eval: 1 if ${foo} == 'bar' else 0}"  # Magic, yet-to-be-implemented resolver


cs = ConfigStore.instance()
cs.store('config', Conf)


@hydra.main(config_name='config', config_path=None)
def main(cfg):
    resolved_cfg = OmegaConf.to_container(cfg, resolve=True, enum_to_str=True)
    print(OmegaConf.to_yaml(resolved_cfg))
    
    
if __name__ == '__main__':
    
    main()

It'd be cool to have this output

python app.py
foo: bar
baz: 1

python app.py foo=other
foo: other
baz: 0

Currently to get the equivalent behavior, you need to introduce some auxiliary resolvers

def resolve_if(condition, true_value, false_value):
    return true_value if condition else false_value


def resolve_eq(first, second):
    return first == second


@dataclass
class Conf:
    foo: str = 'bar'
    baz: int = "${if: ${eq: ${foo}, 'bar'}, 1, 0}"


OmegaConf.register_new_resolver('if', resolve_if)
OmegaConf.register_new_resolver('eq', resolve_eq)

Personally I find the eval interpolation preferable since the desired logic is immediately clear to anyone who understands Python. In comparison, the workaround would require the user to check the source code registered to each custom resolver in order to clarify what the expected result would be. Lastly, the eval option doesn't require nested curly-braces

${if: ${eq: ${foo}, 'bar'}, 1, 0}
${eval: 1 if ${foo} == 'bar' else 0}

addisonklinke avatar Apr 20 '22 17:04 addisonklinke

and by using the former I think we get nested interpolations for free, right?

Hopefully xD

Jasha10 avatar Apr 22 '22 18:04 Jasha10

I'd like to remark that using python's builtins.eval already gets much of the job done:

from typing import Any
from omegaconf import OmegaConf

OmegaConf.register_new_resolver("eval", eval)

cfg = OmegaConf.create('a: "Computed ${eval: 2 * 3 * 5}"')
print(cfg.a)  # Prints 'Computed 30'

Using nested interpolations is a little bit tricky, but careful quoting makes it possible:

cfg = OmegaConf.create(
"""
foo: bar
baz: ${eval:'1 if "${foo}" == "bar" else 0'}
"""
)
print(cfg.baz)  # Prints '1'

The interpolation "${foo}" needs to be quoted so that eval will interpret its value as a string "bar" and not as a variable bar (which would cause a NameError).

Jasha10 avatar Apr 22 '22 18:04 Jasha10

@Jasha10 That's really cool! Ideally I would want there to be a syntax that works in all cases, but the spaces and quotes really make things quite tricky.

After importing/registering, I ran the following tests:

# Test A: Quote the entire RHS, with a space after `eval:`
def test_1a_eval_outer_with_space():
    cfg = OmegaConf.create("""
    a: '${eval: 2 * 3 * 5}'
    """)
    assert cfg.a == 30 # => pass!

def test_2a_nested_outer_with_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: '${eval: 1 if "${foo}" == "bar" else 0}'
    """) #        => GrammarParseError: token recognition error at: '='
    assert cfg.baz == 1

# --------------------------------------------------------------------
# Test B: Quote the insides only, with a space after `eval:`
def test_1b_eval_inner_with_space():
    cfg = OmegaConf.create("""
    a: ${eval: '2 * 3 * 5'}
    """) #   ^    => ScannerError: mapping values are not allowed here
    assert cfg.a == 30

def test_2b_nested_inner_with_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: ${eval: '1 if "${foo}" == "bar" else 0'}
    """) #     ^  => ScannerError: mapping values are not allowed here
    assert cfg.baz == 1

# --------------------------------------------------------------------
# Test C: Quote entire RHS, with no space after `eval:`
def test_1c_eval_outer_no_space():
    cfg = OmegaConf.create("""
    a: '${eval:2 * 3 * 5}'
    """)
    assert cfg.a == 30 # => pass!

def test_2c_nested_outer_no_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: '${eval:1 if "${foo}" == "bar" else 0}'
    """) #        => GrammarParseError: mismatched input '"' expecting BRACE_CLOSE
    assert cfg.baz == 1

# --------------------------------------------------------------------
# Test D: Quote the insides only, with no space after `eval:`
def test_1d_eval_inner_no_space():
    cfg = OmegaConf.create("""
    a: ${eval:'2 * 3 * 5'}
    """)
    assert cfg.a == 30 # => pass!

def test_2d_nested_inner_no_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: ${eval:'1 if "${foo}" == "bar" else 0'}
    """)
    assert cfg.baz == 1 # => pass!
# --------------------------------------------------------------------

The best option seems to be method D, immediately following eval: with a quoted string, without putting a space in between.

But I don't have a clue as to why...

camall3n avatar Sep 18 '22 00:09 camall3n

The best option seems to be method D, immediately following eval: with a quoted string, without putting a space in between.

But I don't have a clue as to why...

If you don't quote the string then you are restricted to only characters allowed in unquoted strings, so things will break as soon as you use another character in your expression. For instance, quotes are not allowed, which explains some of your errors.

Putting a space after eval actually works at the OmegaConf level, but it's failing for you because you're using YAML syntax and YAML doesn't like it.

Here are two ways to fix it (besides removing the space):

# Dropping YAML
def test_2b_nested_inner_with_space():
    cfg = OmegaConf.create(
        {
            "foo": "bar",
            "baz": """${eval: '1 if "${foo}" == "bar" else 0'}"""
        }
    )
    assert cfg.baz == 1
# Quoting the YAML string (which requires escaping the inner quotes for the YAML parser)
# (note that in a .yaml file you would use a single \, here it is doubled because it is in a Python string)
def test_2b_nested_inner_with_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: "${eval: '1 if \\"${foo}\\" == \\"bar\\" else 0'}"
    """)
    assert cfg.baz == 1

odelalleau avatar Sep 18 '22 01:09 odelalleau

Thanks, that's helpful. That also explains why the ScannerErrors are YAML errors, but the GrammarParseErrors are OmegaConf errors.

camall3n avatar Sep 18 '22 16:09 camall3n

But I have not understood how in this syntax how you can write things like:

a:
  aa: 64

b:
  bb: ${eval: '2 * ${a.aa}'}

The above should work but it doesn't in my case :(

ysig avatar Dec 19 '22 17:12 ysig

a:
  aa: 64

b:
  bb: ${eval: '2 * ${a.aa}'}

The above should work but it doesn't in my case :(

It's probably just a YAML parsing error, you can fix it by removing the space after eval: or enclosing bb's definition within double quotes.

odelalleau avatar Dec 19 '22 17:12 odelalleau

I did this and now I get this: omegaconf.errors.UnsupportedInterpolationType: Unsupported interpolation type eval

ysig avatar Dec 20 '22 11:12 ysig

Hi @ysig, see the OmegaConf docs on How to Perform Arithmetic Using eval as a Resolver.

Jasha10 avatar Dec 20 '22 12:12 Jasha10

I have seen them but I don't see where my error is. After all I just use multiplication.

On Tue, Dec 20, 2022, 14:15 Jasha Sommer-Simpson @.***> wrote:

Hi @ysig https://github.com/ysig, see the OmegaConf docs on How to Perform Arithmetic Using eval as a Resolver https://omegaconf.readthedocs.io/en/2.3_branch/how_to_guides.html#id1

— Reply to this email directly, view it on GitHub https://github.com/omry/omegaconf/issues/91#issuecomment-1359274045, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGY7H2OEOBIXVHNUU2UP25LWOGPPJANCNFSM4JTY26UA . You are receiving this because you were mentioned.Message ID: @.***>

ysig avatar Dec 20 '22 17:12 ysig

I have seen them but I don't see where my error is. After all I just use multiplication.

You are most likely forgetting to register the eval resolver:

OmegaConf.register_new_resolver("eval", eval)

odelalleau avatar Dec 20 '22 18:12 odelalleau

Am I the only one concerned about the fact that registering eval as a resolver opens a can of worms from a security perspective?

It opens the door to arbitrary code execution via configuration, so I find it hard to believe that it is the suggested official workaround! Moreover, such a suggestion is IMHO also inconsistent with other Hydra design choices. Take for instance the hydra.utils.instantiate API: its purpose is evidently to allow to instantiate classes and call functions from configuration without allowing arbitrary code execution. If an eval resolver was indeed considered a "valid choice" by Hydra documentation, then why should the hydra.utils.instantiate API even exist, given that its use case is already covered by this resolver? The same applies to other more specific resolvers as well: why would they be needed if an eval resolver exists? These questions are just to illustrate that suggesting an eval resolver in the official docs is inconsistent with the rest of the API (which takes safe design choices).

Please don't get me wrong: I am a happy user of Hydra and I am thankful to all the contributors that work on this framework, I use the hydra.utils.instantiate API and several specific resolvers, and I will welcome the feature described in this ticket, but I will definitely NOT include an eval resolver in my application: it's a security risk and there are several better workarounds. Please consider removing the eval suggestion from the docs.

teocasse avatar Feb 12 '23 21:02 teocasse

Please consider removing the eval suggestion from the docs.

I agree it'd be good to add a warning, but IMO it's a useful tip that's worth keeping in the docs (there are many situations where people don't care about such a security risk -- I know I'm a happy user of the eval resolver and I'm well aware that it allows arbitrary code execution through the config).

Regarding your comment on eval making instantiate useless, note that (i) eval is not available by default (and shouldn't be, for the exact reason you mention here), and (ii) it's not as convenient to use as the instantiate API to create complex objects.

odelalleau avatar Feb 13 '23 01:02 odelalleau

Opinions can differ of course, but IMO just adding a warning because it can be a useful tip is not enough given that we are talking about the official docs. For one person like you that is using it who is perfectly conscious of the risks, I bet there is another (and probably more) who will use it without thinking a moment about the risks because "it's in the docs". If I didn't know better, I would happily include it because it's so-damn-practical!

I don't consider eval an absolute bad practice: there are some scenarios where its use can be acceptable. But this is exactly THE scenario where using eval is a no-go unless you really, really know what you are doing and you are in complete control.

We are talking about a one-liner that makes the entire application unsecure, just its presence means that any parameter that is configurable by the end-user can be abused for code execution. Just imagine a multi-dev scenario where a developer added this resolver, and another one unaware of it exposes an innocuous parameter on the front-end...

Since Hydra is a configuration framework, I really think this is where the docs should draw a line in the sand and absolutely discourage this. IMO, putting it in the docs means there is some blessing from the framework on using this pattern, even with a warning... Unless you make the warning really huge, but then why did you even include it in the docs?

IMO such a suggestion does not belong to official docs, rather to a stackoverflow thread so that it can get up/downvoted and commented, with other safer alternatives discussed.

The choice is down to Hydra developers of course, but personally, I think that a better solution would be either

  • suggest the usage of specific resolvers for add, subtract etc (with a couple examples, it's going to be a couple of lines); or
  • simply say that resolvers for arithmetic operations are not supported yet, and point to this feature request.
Regarding your comment on eval making instantiate useless

My point was rather that this suggestion in the docs is inconsistent with the presence of other safer alternatives, not that instantiate becomes absolutely useless. Anyway, the main point is the security aspect.

teocasse avatar Feb 13 '23 07:02 teocasse

@teocasse what if there existed a oc.eval_num resolver that would call eval() under the hood, but would refuse to do so if there is any aphabetic character within the string? ([a-zA-Z_]) Off the top of my head I don't think you can abuse eval() without such a character (it would unfortunately prevent numbers like 10_000 unless we make the parsing smarter, but that's a small price to pay for safety).

odelalleau avatar Feb 14 '23 00:02 odelalleau

I cannot comment whether the specific implementation you suggest would be secure, but on a more general level I would not be against a "safer" eval implementation. With eval the common recommendation is to make sure that your input is sanitized before passing it, and your suggestion goes exactly in that direction. What was concerning me wasn't the usage of eval per-se, rather its "naïve" usage in the suggested resolver.

This post suggests a few alternatives to "plain eval" in order to parse mathematical expressions in a secure way: https://stackoverflow.com/questions/2371436/evaluating-a-mathematical-expression-in-a-string

teocasse avatar Feb 14 '23 06:02 teocasse