Proposal: core builtin extensions
Originally opened by @mpvl in https://github.com/cuelang/cue/issues/943
Definitions
Before we introduce some of the proposed builtins, we formally introduce some as-yet undocumented language features.
Functions
We propose that CUE support functions with named arguments, as well as calling structs as a shorthand for the common macro pattern (e.g. (s & {_, a: x}).out).
A function argument is now defined as:
Argument = [ identifier ":" ] Expression .
Any named argument must be followed only by other named arguments; that is, positional arguments must precede named ones.
The expression s(a: x, b: y), where s is a struct, is now a shorthand for s & {_, a: x, b: y}.
Validator
A validator is a special builtin that is evaluated by unifying it with other values, whereby the result is one of three outcomes:
- pass: returns _ if the validation is successful and making the value with which it was unified more specific does not change this result (or it is a final evaluation).
- incomplete error: the validation failed, but making the value with which it was unified more specific could change this result.
- fatal error: the validation failed and making the value more specific cannot change this result.
A validator must be run at the last stage of evaluating a node, after a fixed point has been reached for all non-validator values; at that stage, any error is considered fatal. A validator may also be run at earlier stages of the evaluation of a node, in which case an incomplete error signifies that the decision on validity must be postponed.
An example of a language-level validator is <10. struct.MinFields and struct.MaxFields are examples of validators of builtin packages.
Validators can be thought of as Go functions that return an error.
Inferred validators
Optional: Builtin functions that have the signature foo(x1, x2, …, xn) bool may be implicitly interpreted as validators of the signature foo(x2, …, xn) error.
The CUE function notation
We define the following signature format for cue functions:
FunctionDecl = identifier Arguments "::" Expression .
Arguments = "(" [ Argument { "," Argument } [ "," ] ] ")" .
Argument = [ identifier ":" ] Expression .
Either all or none of the arguments should be named.
The following rules apply for calling functions with this signature:
- An argument with a default value in its expression may be omitted in a call. All other arguments must be present in a call.
- A call must either have all named or all unnamed arguments.
These rules could be relaxed later.
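A Go sketch of these calling rules, checking a call site against a declaration. Param, Arg and CheckCall are hypothetical names; the sketch implements the strict all-named-or-all-unnamed rule stated here and assumes parameters are declared with names.

```go
package main

import (
	"errors"
	"fmt"
)

// Param is one declared argument of a hypothetical CUE function.
type Param struct {
	Name       string
	HasDefault bool // e.g. by: int | *1
}

// Arg is one argument at a call site; Name is empty for a positional argument.
type Arg struct {
	Name string
}

// CheckCall reports whether args form a valid call: all named or all
// unnamed, and only arguments with a default may be omitted.
func CheckCall(params []Param, args []Arg) error {
	named := 0
	for _, a := range args {
		if a.Name != "" {
			named++
		}
	}
	if named != 0 && named != len(args) {
		return errors.New("call mixes named and unnamed arguments")
	}
	if len(args) > len(params) {
		return errors.New("too many arguments")
	}
	given := make([]bool, len(params))
	if named == 0 {
		// Positional arguments fill parameters in declaration order.
		for i := range args {
			given[i] = true
		}
	} else {
		index := make(map[string]int, len(params))
		for i, p := range params {
			index[p.Name] = i
		}
		for _, a := range args {
			i, ok := index[a.Name]
			if !ok {
				return fmt.Errorf("unknown argument %q", a.Name)
			}
			if given[i] {
				return fmt.Errorf("duplicate argument %q", a.Name)
			}
			given[i] = true
		}
	}
	for i, p := range params {
		if !given[i] && !p.HasDefault {
			return fmt.Errorf("missing argument %q", p.Name)
		}
	}
	return nil
}

func main() {
	// range(from: int, to: int, by: int | *1)
	rangeParams := []Param{{Name: "from"}, {Name: "to"}, {Name: "by", HasDefault: true}}
	fmt.Println(CheckCall(rangeParams, []Arg{{Name: "from"}, {Name: "to"}})) // <nil>
	fmt.Println(CheckCall(rangeParams, []Arg{{Name: "from"}, {}}))           // mixed
	fmt.Println(CheckCall(rangeParams, []Arg{{Name: "from"}, {Name: "by"}})) // missing "to"
}
```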
Proposed builtins
builtins to replace _|_ (bottom)
Although _|_ is part of the standard CUE idiom, it has several issues:
- no ability to associate user-defined message to bottom
- meaning of comparison against bottom is unclear
- the symbol looks offensive to some
We intend to deprecate the bottom symbol (keeping it around for backwards compatibility) and replace it with builtins that more clearly convey the intent of its usage.
Comparison against bottom is (arguably) not supported by the spec, but it is a crucial piece of functionality for many CUE configurations. Its meaning, however, is unclear. In many cases, it is used to check whether a reference exists. In other cases, the intended meaning is to check that a value is valid. In reality, CUE implements semantics somewhere in between the two: it checks the validity of a value, but not recursively.
Note that if any of these builtins return false, they may still be satisfied at a later point in time. Evaluation should take this into account, as usual.
_|_ replacement: error(msg: string | *null) :: _|_
The use of error(msg) replaces the common use of _|_ with the added ability to associate a user message with an error. When used within a disjunction, the error will get eliminated as usual, but upon failure of the disjunction, the user-supplied error is used as an alternative error message.
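A rough Go model of this disjunction behavior: failing alternatives are eliminated, and only when every alternative fails is the message supplied through error(msg) reported. Alt, Error, IsInt and Disjoin are hypothetical names, and the generic fallback text "empty disjunction" is an assumption rather than CUE's actual message.

```go
package main

import (
	"errors"
	"fmt"
)

// Alt is one alternative of a disjunction, already applied to its input.
type Alt func() (any, error)

// userError carries a message supplied through error(msg).
type userError struct{ msg string }

func (e userError) Error() string { return e.msg }

// Error models error(msg): an alternative that always fails with msg.
func Error(msg string) Alt {
	return func() (any, error) { return nil, userError{msg} }
}

// IsInt models the alternative int applied to the value x.
func IsInt(x any) Alt {
	return func() (any, error) {
		if _, ok := x.(int); ok {
			return x, nil
		}
		return nil, fmt.Errorf("%v is not an int", x)
	}
}

// Disjoin drops failing alternatives and returns the survivors. If none
// survive, the first user-supplied message is used as the error.
func Disjoin(alts ...Alt) ([]any, error) {
	var ok []any
	var user error
	for _, alt := range alts {
		v, err := alt()
		if err == nil {
			ok = append(ok, v)
			continue
		}
		var ue userError
		if user == nil && errors.As(err, &ue) {
			user = ue
		}
	}
	if len(ok) > 0 {
		return ok, nil
	}
	if user != nil {
		return nil, user
	}
	return nil, errors.New("empty disjunction")
}

func main() {
	// x: int | error("x must be an int")
	fmt.Println(Disjoin(IsInt(3), Error("x must be an int")))       // [3] <nil>
	fmt.Println(Disjoin(IsInt("three"), Error("x must be an int"))) // [] x must be an int
}
```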
Comparison to bottom
Uses of comparison against bottom will need to be replaced with one of the following builtins.
isconcrete(expr) :: bool
isconcrete reports whether expr resolves to a concrete value, returning true if it does and false otherwise. It is a fatal error if an expression can never evaluate to true.
Example:
a: {}
b: int
c: isconcrete(a) // true
d: isconcrete(b) // false
e: isconcrete(a.b) // false (b could still be defined)
f: isconcrete(b.c) // fatal error (b.c can never be satisfied)
Purpose: replaces uses of if a.foo != _|_ { where checking whether a.foo exists is really meant to determine whether it is a concrete value.
exists(expr) :: bool (optional)
exists reports whether expr resolves to any value.
Example:
a: {}
b: int
c: exists(a) // true
d: exists(b) // true
e: exists(a.b) // false (b could still be defined)
f: exists(b.c) // fatal error (b.c can never be satisfied)
opt?: int
ref: exists(opt) // false (considered to be non-existing)
req!: int
ref: exists(req) // false
Purpose: replaces if a.foo != _|_ {, where it is checked whether a.foo exists regardless of concreteness.
validator builtins
must(expr: _, msg: string | *null) :: _
must(expr) passes if expr evaluates to true and fails otherwise.
must can be used to turn arbitrary expressions into constraints. For instance, a: <10 can be written as a: must(a < 10). See Issue #575 for details.
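A Go sketch of must as a validator factory for int values, where a Go predicate stands in for the CUE expression and an empty msg stands in for the default null. Must is a hypothetical name, and the fallback message text is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Must models must(expr, msg) for an int field.
func Must(pred func(int) bool, msg string) func(int) error {
	return func(v int) error {
		if pred(v) {
			return nil
		}
		if msg != "" {
			return errors.New(msg) // user-supplied message
		}
		return fmt.Errorf("must: constraint failed for %d", v)
	}
}

func main() {
	// a: must(a < 10), intended to be equivalent to a: <10
	lt10 := Must(func(a int) bool { return a < 10 }, "a must be less than 10")
	fmt.Println(lt10(5))  // <nil>
	fmt.Println(lt10(12)) // a must be less than 10
}
```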
not(expr) :: _
not(expr) passes if unified with a value x for which expr & x fails, and fails otherwise.
See #571 for details.
Examples:
a: not(string) // number | bytes | {...} | [...] | bool | null
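At the level of kinds, not(expr) amounts to a set complement. A Go sketch with kinds as a bit set; Kind, Top and Not are hypothetical, and constraints finer than a kind (such as not(<10)) are outside this model.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is a bit set of CUE value kinds.
type Kind uint

const (
	Null Kind = 1 << iota
	Bool
	Number
	String
	Bytes
	Struct
	List
)

// Top is the set of all kinds, i.e. _ at the kind level.
const Top = Null | Bool | Number | String | Bytes | Struct | List

// Not models not(k): every kind for which unifying with k would fail.
func Not(k Kind) Kind { return Top &^ k }

var kindNames = []struct {
	k Kind
	s string
}{
	{Null, "null"}, {Bool, "bool"}, {Number, "number"}, {String, "string"},
	{Bytes, "bytes"}, {Struct, "{...}"}, {List, "[...]"},
}

// String renders the set as a CUE-style disjunction; the empty set is _|_.
func (k Kind) String() string {
	var parts []string
	for _, n := range kindNames {
		if k&n.k != 0 {
			parts = append(parts, n.s)
		}
	}
	if len(parts) == 0 {
		return "_|_"
	}
	return strings.Join(parts, " | ")
}

func main() {
	fmt.Println(Not(String)) // null | bool | number | bytes | {...} | [...]
}
```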
numexist(count, ...expr) :: _
numexist(count, ...expr) passes if the number of expressions for which exists(x) evaluates to true unifies with count.
The main purpose of numexist is to indicate mutual exclusivity of fields.
#X: {
// either foo or bar may be specified by the user
numexist(<=1, foo, bar)
foo?: int
bar?: int
}
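A Go sketch of the counting logic behind numexist, with a struct represented as a Go map in which a field exists if it has an entry. NumExist and AtMost are hypothetical; AtMost(1) stands in for the CUE constraint <=1.

```go
package main

import "fmt"

// NumExist passes if the number of existing fields among names satisfies count.
func NumExist(count func(n int) bool, fields map[string]any, names ...string) error {
	n := 0
	for _, name := range names {
		if _, ok := fields[name]; ok {
			n++
		}
	}
	if count(n) {
		return nil
	}
	return fmt.Errorf("numexist: %d of %v exist, constraint not satisfied", n, names)
}

// AtMost models the CUE constraint <=k.
func AtMost(k int) func(int) bool { return func(n int) bool { return n <= k } }

func main() {
	// #X: { numexist(<=1, foo, bar), foo?: int, bar?: int }
	fmt.Println(NumExist(AtMost(1), map[string]any{"foo": 1}, "foo", "bar"))           // <nil>
	fmt.Println(NumExist(AtMost(1), map[string]any{"foo": 1, "bar": 2}, "foo", "bar")) // error
}
```

This model always gives a final answer. During real evaluation, a count that is too low might still be fixed by adding fields (an incomplete error), whereas a count that already exceeds <=1 cannot be (a fatal error).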
numconcrete(count, ...expr) :: _ (optional)
numconcrete(count, ...expr) passes if the number of expressions for which isconcrete(x) evaluates to true unifies with count.
numvalid(count, ...expr) :: _ (optional)
numvalid(count, ...expr) passes if the number of expressions for which isvalid(x) evaluates to true unifies with count.
Builtins related to concrete values
Purpose: combine the schemas of different instances of the same package that would otherwise fail to unify because of conflicting definitions.
manifest(x) :: _
manifest evaluates x, stripping it of any optional fields and definitions, and disambiguates disjunctions after their removal.
Use cases:
- combine instances that only differ in templates.
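A Go sketch of the stripping part of manifest over a simplified field tree. Field and Manifest are hypothetical; definitions are recognized by a # or _# prefix, and the disjunction disambiguation step is not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a hypothetical struct field; Value is either a scalar or a
// []Field for a nested struct.
type Field struct {
	Name     string
	Optional bool
	Value    any
}

// Manifest recursively drops optional fields and definitions.
func Manifest(fs []Field) []Field {
	var out []Field
	for _, f := range fs {
		if f.Optional || strings.HasPrefix(f.Name, "#") || strings.HasPrefix(f.Name, "_#") {
			continue
		}
		if sub, ok := f.Value.([]Field); ok {
			f.Value = Manifest(sub)
		}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []Field{
		{Name: "#Schema", Value: "int"},
		{Name: "port", Value: 8080},
		{Name: "tls", Value: []Field{
			{Name: "cert", Optional: true, Value: "string"},
			{Name: "enabled", Value: true},
		}},
	}
	fmt.Printf("%+v\n", Manifest(in))
}
```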
Defining ranges
Looking around at other languages, defining numeric ranges clearly is a hard problem: it is often not clear from the syntax, or even the wording, whether a range is inclusive.
CUE’s unary comparators provide a possible solution to this issue, as <10 and <=10 state explicitly whether the bound is included.
range(from: int, to: int, by: int | *1) :: [...int]
Builtin range returns a list of values, starting from from (which must be concrete) and repeatedly adding by (default 1) for as long as the result unifies with to. It is an error to define a range that never terminates.
Examples:
range(from: 1, to: <10) // [1, ..., 9]
range(from: 1, to: >=0.5, by: -0.1) // [1, 0.9, ..., 0.5]
range(from: 1, to: <1) // []
range(from: 1, to: >=1) // error("infinite range")
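A Go sketch of these range semantics, with to reduced to a single unary comparator. Bound, Holds and Range are hypothetical names. The termination check relies on the observation that a range only ends if by moves the value toward the limit of the bound, so a zero step, or a step in the other direction from a satisfied start, is reported as an infinite range up front.

```go
package main

import (
	"errors"
	"fmt"
)

// Bound stands in for the to argument: a unary comparator such as <10.
type Bound struct {
	Op string // "<", "<=", ">" or ">="
	V  float64
}

// Holds reports whether x unifies with the bound.
func (b Bound) Holds(x float64) bool {
	switch b.Op {
	case "<":
		return x < b.V
	case "<=":
		return x <= b.V
	case ">":
		return x > b.V
	case ">=":
		return x >= b.V
	}
	return false // unknown operator: treated as never satisfied
}

// Range models range(from:, to:, by:), rejecting non-terminating ranges.
func Range(from float64, to Bound, by float64) ([]float64, error) {
	out := []float64{}
	if !to.Holds(from) {
		return out, nil // e.g. range(from: 1, to: <1) is empty
	}
	upper := to.Op == "<" || to.Op == "<="
	if by == 0 || (upper && by < 0) || (!upper && by > 0) {
		return nil, errors.New("infinite range")
	}
	for x := from; to.Holds(x); x += by {
		out = append(out, x)
	}
	return out, nil
}

func main() {
	fmt.Println(Range(1, Bound{Op: "<", V: 10}, 1)) // [1 2 3 4 5 6 7 8 9] <nil>
	fmt.Println(Range(1, Bound{Op: "<", V: 1}, 1))  // [] <nil>
	fmt.Println(Range(1, Bound{Op: ">=", V: 1}, 1)) // [] infinite range
	fmt.Println(Range(3, Bound{Op: ">", V: 0}, -1)) // [3 2 1] <nil>
}
```

Repeatedly adding a step such as -0.1 in float64 accumulates rounding error (0.1 has no exact binary representation), so a real implementation would likely use the arbitrary-precision decimal arithmetic CUE already uses for numbers.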
Switching
CUE’s if is not paired with an else. This is partly because if really is a comprehension. Another reason is that else quickly leads to nested conditions; a switch statement is generally more conducive to readability in such cases.
A switch statement can be simulated in CUE using lists:
choice: [
if a { x },
if b { y },
z,
][0]
is equivalent to the hypothetical
choice: if a { x } else { if b { y } else { z } }
The issue is that the [0] tucked away at the end of the list impairs readability.
head
A head builtin could make the above more readable. It would do nothing more than select the first element of a list, but it would signal that intention more clearly, at the start of the list.
choice: head([
if a { x },
if b { y },
z, // default
])
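A Go sketch of the same idea: the comprehension-style list is built from the conditions that hold, and Head picks its first element. Head and choice are hypothetical names; the empty-list error plays the role of CUE's index-out-of-range error when no alternative (and no default) applies.

```go
package main

import (
	"errors"
	"fmt"
)

// Head returns the first element of xs, or an error if xs is empty.
func Head[T any](xs []T) (T, error) {
	var zero T
	if len(xs) == 0 {
		return zero, errors.New("head: empty list (no alternative matched)")
	}
	return xs[0], nil
}

// choice models: choice: head([ if a { x }, if b { y }, z ])
func choice(a, b bool) string {
	var opts []string
	if a {
		opts = append(opts, "x")
	}
	if b {
		opts = append(opts, "y")
	}
	opts = append(opts, "z") // default
	c, _ := Head(opts)       // cannot fail: the default guarantees an element
	return c
}

func main() {
	fmt.Println(choice(false, true)) // y
}
```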
Package std
We’re considering making all core builtins available under the package std, so that they can be referenced unambiguously and more clearly than using the __ prefix.
import "std"
a: std.range(from: 3, by: -1, to: >0) // [3, 2, 1]
Original reply by @seh in https://github.com/cuelang/cue/issues/943#issuecomment-832200043
This is so good to see.
One problem to consider with the "Switch" section: You write, more or less, if a {} else if b {} ..., but quite frequently b is !a or not a, which requires restating a. Could let help here to define the result of a once, and express it being both true for the consequent branch and its negation for the alternate branch?
Original reply by @seh in https://github.com/cuelang/cue/issues/943#issuecomment-832203882
Also, while head is evocative, it does so little that it barely justifies its inclusion. I thought of coalesce as a good name for picking the first suitable item in a sequence that can accommodate "null" or disqualified values. Against that, though, in your "Switch" example, I suppose the list should never wind up with more than one value, as opposed to it being prefixed by any number of "null" values.
Original reply by @mpvl in https://github.com/cuelang/cue/issues/943#issuecomment-832213293
@seh: yes, let could be used here that way, though outside the list. We could perhaps consider allowing let in lists.
Also, one could mimic this behavior with head([if a {}, {}]), where the second element is the "default" and thus corresponds to !a.
Regarding head: I agree its utility is a bit meager. We did consider a select builtin, which I think is close to what you're proposing: it would pick the first valid entry. The main problem with this pattern seems to be that it would be too easy to ignore potential errors, so it may be a less safe approach. Having said that, it reads quite nicely and we have seen configurations where it would have merit, so it is something to consider. It just seemed prudent to first see how far one would get with this seemingly safer approach.
I'm not sure I understand the point with the null values, but maybe this answers your question.
Do you think adding head is not warranted and using a [...][0] pattern is sufficient?
Original reply by @seh in https://github.com/cuelang/cue/issues/943#issuecomment-832217894
I was not sure that CUE has the same notion of "null" values that SQL, HCL, Jsonnet, and other languages have, so the semantics of a hypothetical coalesce function might not apply.
I don't think head is warranted without tail (or rest), and perhaps nth. My Lisp is showing. I haven't yet reached for any functions like that, though. I'd rather spend those tokens on set manipulation functions for lists.
Would it be possible to write a CUE "function" that encapsulates your [if a {consequent}, {alternate}][0] technique? It would require at least two inputs; the alternate could be optional. It's not much compression, but might cut down on the "syntactic noise" with those brackets. Yes, I confess that I'm still looking for else.
Original reply by @mpvl in https://github.com/cuelang/cue/issues/943#issuecomment-835751945
@seh: you can do else with the switch approach and I’m not in favor of a dedicated if-else construct, as it encourages bad patterns.
But I see your points otherwise. I guess you could indeed express this as cue macros neatly if we had the call shorthand. head would then be defined as:
head: { #0[0], #0: […] }
One problem is that the first element cannot have a conflicting definition of #0.
But maybe, for now, it is enough to point out the pattern and suggest that people comment the construct:
aSwitch: [ // select first match
if a { … },
if b { … },
c // default
][0]
anIfElse: [ // if then else
if a { … },
c // else
][0]
This would not require any additions to the language and we can get some experience to see what works. The query addition may also provide useful patterns that obviate the need for this.
Original reply by @mpvl in https://github.com/cuelang/cue/issues/943#issuecomment-835762509
@seh in CUE, bottom (incomplete errors, to be more specific) is a bit like null in those languages. null can mean various things, often not compatible with the notion of null here. So it seemed impossible to assign any specific meaning to it.
Noting that one use case of comparison against _|_ we should explicitly document (I'm not totally clear it is actually covered above) is that of type assertion, as discussed in https://github.com/cue-lang/cue/issues/1161.
Perhaps this warrants a new discussion or feature issue, but one thing I've found lacking, in cases where I've wanted something like the list-as-choice pattern above, comes from the pattern matching of FP languages: language support for guaranteeing that the options are exhaustive.
Reflecting here that the list comprehension version of this provides that in a roundabout (and at-runtime) way: if all the alternatives fail, the list is empty and the index will be out of bounds. So there's some safety railing there.
But: the user is going to get an "index out of bounds" error (which is confusing when the cause is that an alternative was overlooked), and it'll be the user of the CUE program, not its author, who gets the error.
It would be fantastic to have a language level match operator that could, at parse time, emit something like no alternative matches <16, >20 or something. There may be a correspondingly fantastic level of effort to provide that feature, but it sure would be nice.
The default seems to prevent the out of bounds issue, assuming it is always required. One could use error in the case the default should fail the config and provide a more meaningful message.
It may be useful to know that something like <16 | >20 cannot be validated at parse time and requires the evaluator to do its thing ("runtime" in your message, though I'm not sure that is the most accurate term)
It might be also worth considering that, in many ways, CUE comes from Go and there is value in minimizing language features and syntax.
What about an operator for subsumes?
like if subsumes(a, b) { "a subsumes b" }
I'm trying something like
t: int
result: [
if (t & int) == _|_ { "int" },
if (t & int64) == _|_ { "int64" },
if (t & int32) == _|_ { "int32" },
if (t & int8) == _|_ { "int8" },
"unknown",
][0]
which won't work. I think something like this might:
t: int
result: [
if subsumes(t, int) { "int" },
if subsumes(t, int64) { "int64" },
if subsumes(t, int32) { "int32" },
if subsumes(t, int8) { "int8" },
"unknown",
][0]
The goal of the example is to turn CUE types into a string; maybe there could be a builtin or stdlib package that helps with that in a more targeted way. A subsumes builtin might still be useful more generally.
Strong +1 to that - a native subsumption operator is a key roadmap item for thema (née scuemata). For now, the necessary enforcement of a subsumption relation has to be done in Go. (Though that doesn't work either because of a panic that I need to post an issue for, once I have a clear reproduction.)
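To illustrate the intended semantics, here is a Go sketch that models CUE's integer types as value ranges (int64, int32 and int8 bounded by the corresponding Go limits, int unbounded) and defines subsumption as range containment. IntType, Subsumes and TypeName are hypothetical; real CUE subsumption covers far more than integer bounds.

```go
package main

import (
	"fmt"
	"math"
)

// IntType models a CUE integer type as a range of values.
type IntType struct {
	Name      string
	Unbounded bool // true for int, which has no bounds
	Lo, Hi    int64
}

var (
	Int   = IntType{Name: "int", Unbounded: true}
	Int64 = IntType{Name: "int64", Lo: math.MinInt64, Hi: math.MaxInt64}
	Int32 = IntType{Name: "int32", Lo: math.MinInt32, Hi: math.MaxInt32}
	Int8  = IntType{Name: "int8", Lo: math.MinInt8, Hi: math.MaxInt8}
)

// Subsumes reports whether every value of b is also a value of a.
func Subsumes(a, b IntType) bool {
	if a.Unbounded {
		return true
	}
	if b.Unbounded {
		return false
	}
	return a.Lo <= b.Lo && b.Hi <= a.Hi
}

// TypeName mirrors the list-based switch: the first candidate, from widest
// to narrowest, that t subsumes wins.
func TypeName(t IntType) string {
	for _, c := range []IntType{Int, Int64, Int32, Int8} {
		if Subsumes(t, c) {
			return c.Name
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(TypeName(Int), TypeName(Int32), TypeName(Int8)) // int int32 int8
}
```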
What's the general status on the extensions discussed here? I'm particularly interested in functions.
Noting that we should also consider downcasts https://github.com/cue-lang/cue/issues/454 in scope of new builtins.
Noting what I think is a tricky edge case here:
#X: {
// at least one of foo or bar must be specified by the user
numexist(>0, foo, bar)
foo?: int
bar?: int
}
The definition #X itself will be in error in this case.
I came here from https://cuetorials.com/patterns/functions and am especially interested in the functions syntax, but I have yet to have a use case for any of the other proposals.
Should some of these use cases be split out into different issues? I feel like there's a lot being proposed here. It might make it clearer which features are priorities to users if these were separate issues.
I have just created #3165 for further discussion regarding the encoding of oneofs in CUE.
Building somewhat on #3289 (cue: Value needs a method to finalize a value), and motivated by the discussion in #3674 and details covered in https://github.com/cue-lang/cue/issues/3296, I think we also need to consider two further builtins:
v: _
b1: finalize(v)
b2: concrete(v)
finalize() would be the builtin analog of #3289. Defaults would be selected, templates "removed" (see #3674) amongst other things.
concrete() would further require that the value of its argument be fully concrete, and return a recursively closed value.
FWIW I would not define it to return a recursively closed value: I think such a primitive would probably be better off returning just data, exactly as if it had arrived from JSON, for example. It should be easy to close it if needed.
For those following along, https://github.com/cue-lang/cue/releases/tag/v0.14.0-alpha.1 was just released with a new error builtin :) Please give it a try and let us know how it works for you.
I want to comment on the function proposal. I currently believe that this is a CUE anti-pattern that pushes CUE away from its design as a logic language toward something more procedural. There are ways to achieve the same outcome as functions without them, and without approximating a function using some kind of In:/.Out struct. Rather than functions, I think it is better to write reinforcing relationships between the "inputs" and "outputs" of a function and treat both as canonical in the configuration. A fully connected subgraph of references between the "inputs" and "outputs" achieves the same thing, but in a way that is congruent with the language. There are some niggles with this approach, such as float unification, which I believe should be addressed before reconsidering the stance on functions.
I'd personally like the following:
strip(a, b), or something like it: recursively "strips" all RHS from all values in a and returns just a skeleton of keys with their values set to b.
a: {
path: in: a: to: leaf1: #mydef
path: in: a: to: leaf2: #myotherdef
}
b: null
c: std.strip(a, b) // or struct.Strip()
c: {
path: in: a: to: leaf1: null
path: in: a: to: leaf2: null
}
compress(a, b, d) or struct.FlattenN(...): turns a LHS skeleton into a flat map whose values are b and whose keys are the full CUE paths to the corresponding lattice locations in a. d is an int: a 1-indexed depth marker starting at the root of a.
compressed: std.compress(a, b, 3)
compressed: {
"path.in.a": to: leaf1: null
"path.in.a": to: leaf2: null
}
This can be written with a Value.Walk(), I imagine, or later with the query proposal, but it would be awesome to just have it. I gather this exposes the 'brittle barrier' or templated selectors with tasks at the end, but alas it seems the barrier must be crossed at some point. I imagine it would be easier for the owner of the memory allocator to just cut the lattice and flush all the RHS memory at once, though. That would leave a very sparse memory region, I assume. Maybe there's some magic that can be done internally with packed structs of arrays to make this better.
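A minimal Go sketch of the suggested strip and compress operations on plain nested maps, roughly what a Value.Walk-based implementation would traverse. Strip and Compress are hypothetical names for illustration, not an existing struct.Strip or struct.FlattenN, and the sketch ignores CUE-specific concerns such as definitions, optional fields and dots inside keys.

```go
package main

import (
	"fmt"
	"strings"
)

// Strip keeps the key skeleton of a nested map and replaces every leaf with leaf.
func Strip(m map[string]any, leaf any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		if sub, ok := v.(map[string]any); ok {
			out[k] = Strip(sub, leaf)
		} else {
			out[k] = leaf
		}
	}
	return out
}

// Compress joins the first d levels of keys with "." and keeps anything
// deeper nested. Leaves and empty maps above depth d get their full path.
// A d below 1 never matches, so the map is flattened completely.
func Compress(m map[string]any, d int) map[string]any {
	out := map[string]any{}
	var walk func(path []string, v any)
	walk = func(path []string, v any) {
		sub, ok := v.(map[string]any)
		if !ok || len(sub) == 0 || len(path) == d {
			out[strings.Join(path, ".")] = v
			return
		}
		for k, child := range sub {
			// The full slice expression forces a copy, so siblings never share storage.
			walk(append(path[:len(path):len(path)], k), child)
		}
	}
	for k, v := range m {
		walk([]string{k}, v)
	}
	return out
}

func main() {
	a := map[string]any{
		"path": map[string]any{"in": map[string]any{"a": map[string]any{
			"to": map[string]any{"leaf1": "#mydef", "leaf2": "#myotherdef"},
		}}},
	}
	fmt.Println(Compress(Strip(a, nil), 3))
}
```

Here the strings "#mydef" and "#myotherdef" merely stand in for the definitions in the CUE example, and nil plays the role of b: null.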
@the-nurk - Thanks for the suggestion. Functions like strip or compress haven't been proposed before, to my knowledge.
Since this is a new area for us, we'd need to see more details to evaluate the proposal. If you are able to explore a possible implementation and share your findings, that would be the best way to clarify the specific use case and technical approach. I'd suggest we pick up this discussion in a new feature request issue for one/both functions.