opa
opa copied to clipboard
String interpolation
Having string interpolation similar to Python f-strings in Rego would allow for more succint policy, while arguably improving readability by removing the ceremony associated with sprintf
.
Another issue potentially addressed by this would be the common mistake of not handling undefined when referencing input/data in sprintf
arguments. These will halt evaluation as any other undefined reference, but in the likely most common use case for sprintf
— building a "message" or "reason" string to populate a partial rule set, whether a value is undefined or not is likely not interesting enough that you'd want it to affect the actual outcome of the evalutation.
deny[reason] {
not "developer" in input.user.groups
# fails if input.user.name is undefined, and the anonymous user will not be denied
# this is likely not what the policy author intended
reason := sprintf("%v must have role 'developer'", [input.user.name])
}
# less ceremony, and always evaluates if conditions in the body holds
# interpolation step could handle undefined (similar to `print`)
deny["{input.user.name} must have role 'developer'"] {
not "developer" in input.user.groups
}
This would arguably improve string handling not just with regards to sprintf
, but improve things like simple concatenation, which is currently somewhat verbose:
name := concat(", ", [input.first_name, input.last_name])
# vs.
name := "{input.first_name}, {input.last_name}"
Some design decisions would obviously need to be made here, like how to differentiate an "interpolation string" from a regular one, whether (simple?) expressions should be allowed in the interpolated values or only direct references, and perhaps there could be an "interpolation form" where undefined does fail evaluation, etc..
I don't have all the answers, but here's at least a starting point :)
+1 on the benefits of both readability and having a form which tolerates undefined input!
Here are a couple considerations/opinions I'd like to write down which might help us in the design process (Anders probably thought through already).
- Since all built-in functions currently (I think) return undefined if any argument is undefined, it is a good idea to introduce this as a syntax which is not a function. So we can continue to make the sharp statement that all built-in functions behave tis way.
- For backwards compatibility, we would probably want a new syntax that is not the same as a regular string today.
Brainstorming syntaxes:
-
f"{input.user.name} must have role 'developer'"
(Python f-string) Con: A bit incongruous to suddenly have the concept of prefixed literal in Rego? Are there other areas we would like to use prefixed literal in Rego? -
'{input.user.name} must have role \'developer\''
(Use'
-enclosed literal) Con: Deviates from convention in other languages, where'
is either the same as"
or does LESS interpolation than"
. -
`{input.user.name} must have role 'developer'`
(Use`
-enclosed literal) Con: complex expressions or even deep references can make the string hard to read) -
"%s must have role %s" % (input.user.name, role)
(Python % syntax) Con: does not allow the expression in the string itself. -
"{{input.user.name}} must have role {0} or {1}" % (role1, role2)
(Extension of python % syntax) Con: Requires the introduction of a()
-tuple in Rego which may complicate other things. The overloading of the%
operator may make implementation of compiler/fmt/linter/syntax highlighting more complex? -
"{{input.user.name}} must have role {role1} or {role2}" % {"role1": role1, "role2": role2}
(Extension of python % syntax) Con: Does not allow the use of unnamed{}
or{0}
or%v
.
Update: reference to syntax from other languages: https://petamind.com/string-interpolation-in-different-programming-languages/
Also PHP: "${year}-${month}-${day}"
Also Ruby: "#{year}-#{month}-#{day}"
Awesome feedback, @ekcs! I had not given too much thought to syntax before, so your brainstorming was much appreciated. Some spontaneous thoughts on the above:
For backwards compatibility, we would probably want a new syntax that is not the same as a regular string today.
Or we could simply require import future.interpolation
(or whatever we'll call it)? We'd probably not want all strings to be interpolated (i.e none of those provided in the input
by the user), but maybe any string literal / constant defined in policy? If not, I think I'm leaning towards the first option, i.e. a prefix to distinguish these strings from others. I'd definitely prefer if the values are embedded directly in the string. That to me makes it feel more natural that undefined isn't going to halt evaluation, since technically "{input.foo}"
is still just a string, while input.foo
is a direct reference.
Are there other areas we would like to use prefixed literal in Rego?
I could imagine one option to be how undefined should be handled in interpolation, i.e. if it should be printed or halt evaluation.
As for the backticks option, that actually exists in Rego already :) Used for multiline strings and regex.
We'd probably not want all strings to be interpolated (i.e none of those provided in the input by the user), but maybe any string literal / constant defined in policy?
💯 Definitely only string literals.
My preferred way here would be to not introduce new string delimiters, but go with some import future
thing like @anderseknert proposed. But I also doubt that anyone has been using {foo} is {bar}
or ${foo} is ${bar}
in polices right now, without interpolation; so the path forward should not be breaking many users. (Meaning maybe we can even not do this via future import.) WDYT?
Yeah, I agree that it's very unlikely. But maybe better to play it safe? I don't have a strong opinion on that. One concern with having this enabled for all string literals is if this would have any noticeable impact on performance? I doubt the scan is going to be expensive, but maybe there are some policies with a lot of strings defined? For a large JSON string, there's going to be a lot of {
in there, but I guess the interpolation would just stop at the next character anyway, since it's going to be a "
(+/- whitespace). Probably negligible.
🤔 Well this might be a good point for using something that's not valid JSON.
I.e. if we used foo is ${input.foo}
, we'd not have to worry about large JSON strings.
I'm mostly thinking that at some point, I'm confident we'll want to put more than plain refs into that... print("is foo in xs: ${"foo" in xs}")
(oh nested "
... eek...)
Thinking more about this, maybe we should pick a common implementation (JS? Ruby? Python?), and mimic that closely. For one thing, much thought has probably already been put into this; for another thing, we'll be able to say "it just like JS" 🙃
Sounds like a good way forward! And yeah, I can definitely see how people would want to have at least some simple logic included in interpolation. I think resolving refs would be a very good first step though! Thinking more about it, it's probably not very common to have large JSON strings in literals 🤔 The only use case I could think of is tests, but even there there aren't many reasons I can think of to do that instead of providing deserialized objects directly.
Hah! Had not thought of nested "
:D I suppose having to escape those would be OK, or simply assign "foo" to a var outside of the string and reference that.
Backtick-strings could also remain "uninterpolated", so that there's an alternative to use if someone really wants to have {input.foo}
interpreted literally.
Thinking more about it, it's probably not very common to have large JSON strings in literals 🤔
I think JWKS are the exception here. I think some built-ins expect their JWK in string-of-json format 😬
That's true, but I wonder how common it is to have those hard coded in the policy, as opposed to provided either via http.send
or somewhere in data
?
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
Bumping for 1.0 @ashutosh-narkar @johanfylling :)
Most of the concerns raised about this have either been about backwards compatibility (i.e. can't use existing string syntax) or about introducing new syntax. Neither should be a problem for a 1.0 release.
My vote is on using existing string syntax for this, and something like "foo is {foo}"
to reference a var.
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.