plutus
plutus copied to clipboard
Syntax of builtins
Describe the feature you'd like
Once #4738 gets merged, the classic syntax of builtins will look as follows:
- built-in
Integer
:
(con integer)
- built-in
[Integer]
:
(con (list integer))
-
5
of built-inInteger
:
(con integer 5)
-
[1, 2, 3]
of built-in[Integer]
:
(con (list integer) [1, 2, 3])
Describe alternatives you've considered
There have been two proposals to improve the syntax:
- render
Integer
as(con (integer))
instead of(con integer)
and render5
as(con (integer) 5)
instead of(con integer 5)
. This is for consistency with rendering of constants of polymorphic built-in types - make a more radical change and render
[Integer]
as(con list(integer))
and(Integer, Bool)
as(con pair(integer,bool))
Kenneth explains:
What we're looking for is a language for writing down built-in types. We want to write down
(con <type> <value>)
and we want<type>
to be unambiguous. To keep the parsing simple I think
(a) we should have one and only one way of writing down each type (b) white space should not be part of the syntax.
I don't like
list integer
because then we need to writelist (list integer)
and then maybe we want to allowlist (integer)
as well and we have two ways of writing down the same type. This seems like trying to re-use Haskell syntax, but that's making things a bit too complicated.
My proposal (and coincidentally what I use in the specification) is that a type is either an atomic type, which is just a string like
integer
orstring
, or a type application,op(t₁, ..., tₙ)
, where thetᵢ
are types. This is completely unambiguous and very easy to parse.
We could also allow curried type applications like
pair integer string
, but I object to that on the grounds that then you once again need to introduce parentheses if the arguments are complex:pair (list integer) string
.
Another disadvantage of this is that it's more difficult to tell where the type ends. You have to work your way along parsing types separated by white space and possibly also involving parentheses, and when you see
t a b c
you have to parset
and then notice that there's something else after it, and probably end up with a nested application like(((t a) b) c)
which you have to disentangle later, checking thatt
is a type operator and that you have the right number of nested applications.
If everything's fully parenthesised then this doesn't arise: you see
t
, then you see(
and you know that you've got a list of types coming up, so you parse those and stop when you see the corresponding)
. You've got the operator and a list of argument types, and it's easy to check that you've got the right number of arguments.
My comments:
if we make it (con (list integer))
, then it's equally easy to parse, it's just space-separated rather than comma-separated and it's the same "you see (
and parse until the corresponding )
" logic. I'm not sure if whitespace here is any more part of syntax then in, say, [f a b c]
which is an iterated application that we use with regular type applications.
However I wouldn't mind pair(integer, bool)
.
My selfish reason for liking pair(integer, bool)
is that it's standard algebraic notation. I use it in the specification, where everything else looks like that too, and using something sexp-like for types everywhere instead would really mess things up. Whatever we do, I think we should keep the f(x,y,z)
notation in the specification; if we do use some other notation in the concrete syntax it wouldn't be a disaster, but we'd have to specify some translation to the parenthesised notation used everywhere else, which would be a pain.
Previous discussion from #4414 (closed as duplicate even though it's older. Sorry @jmchapman! I didn't realize there was a ticket already when I created this one).
@jmchapman:
Describe the feature you'd like
The parser and pretty printer do not properly support lists, pairs and data.
I would like to use the syntax
con (list bool) [True, False, True]
andcon (list (pair integer bool)) [(1, True), (500000, False), (0, True)]
for lists and pairs.@SchmErik proposes the following syntax for
data
which I think is nice:(program 0.0.0 (con data { Map [ (ByteString #0123, Integer 12345), (ByteString #456789, Integer 789453), (ByteString #0ABCDE, Integer 12364689486)] }))
Note that the concrete syntax is not used on chain, is not guaranteed to be production quality, and is subject to change. For testing production code, the flat representation should be used. Concrete syntax is useful for pedagogical and exploratory purposes.
Describe alternatives you've considered
I suppose one could write something more json like inside the curly brackets for data.
@michaelpj:
I'm fine with the syntax for pairs and lists. Although it occurs to me that if we really wanted to stick to our sexp guns we could have
(pair x y)
and(list x y z)
as term syntax (punning with the type syntax).I don't have very good suggestions for
data
...
Proposed syntax needs to deal with nested examples when it comes to polymorphic data. e.g.
(con (list (list (list integer))) [[[1], []], [[], [2,3]]])
We have reached a decision regarding syntax for constant terms. That is, terms of the form:
M \Coloneqq \dots ~ \mathtt{(con} ~ T ~ c_T\mathtt{)}~ \dots
Type tags are as follows:
T \Coloneqq \mathtt{integer} \mid \mathtt{bool} \mid \mathtt{bytestring} \mid \mathtt{string} \mid \mathtt{unit} \mid \mathtt{data} \mid \mathtt{(list} ~ T\mathtt{)} \mid \mathtt{(pair} ~ T ~ T\mathtt{)}
We define a syntactic class of constants for each type tag.
\begin{align}
c_{\mathtt{integer}} & \Coloneqq \mathtt{-}?[\mathtt{0-9}]* \\
c_{\mathtt{bytestring}} & \Coloneqq \mathtt{\#}([\mathtt{0-9A-Fa-f}][\mathtt{0-9A-Fa-f}])* \\
c_{\mathtt{string}} & \Coloneqq \mathtt{''} U* \mathtt{''} \\
c_{\mathtt{bool}} & \Coloneqq \mathtt{True} \mid \mathtt{False} \\
c_{\mathtt{unit}} & \Coloneqq \mathtt{()} \\
c_{\mathtt{data}} & \Coloneqq \mathtt{(Constr} ~ c_{\mathtt{integer}} ~ c_\mathtt{(list~data)} \mathtt{)} \\
& \mid \mathtt{(Map} ~ c_\mathtt{(list ~(pair ~data ~data))}\mathtt{)} \\
& \mid \mathtt{(List} ~ c_\mathtt{(list~ data)}\mathtt{)} \\
& \mid \mathtt{(I} ~ c_\mathtt{integer}\mathtt{)} \\
& \mid \mathtt{(B} ~ c_\mathtt{bytestring}\mathtt{)} \\
c_{\mathtt{(pair} ~\alpha~\beta\mathtt{)}} & ::= \mathtt{(} c_\alpha \mathtt{,} c_\beta \mathtt{)} \\
c_{\mathtt{(list} ~ \alpha\mathtt{)}} & ::= \mathtt{[} (c_\alpha (\mathtt{ ,} c_\alpha)*)?\mathtt{ ]}
\end{align}
where $U$ is a unicode character. Note that for lists and pairs we are actually defining schemas ($\alpha$ and $\beta$ are meta-variables).
Some example constant terms are:
(con (list integer) [ 0 , 1 , 2 ])
(con (pair bool bytestring) (True, #012345))
(con (pair unit string) (() , "hello universe"))
(con (list (list (list integer))) [[[-1], []], [[], [2,3]]])
(con data (Constr 1 [I 2])
(con data (Map [(I 0, B #00), (I 1, B #0F)]))
(con data (List [I 0, I 1]))
(con data (I 2))
(con data (B #001A))
The syntax for pair and lists is the one supported by the current parser. The syntax for data is not implemented yet.
@mjaskelioff fantastic, thanks a lot! Since this ticket is just for settling on the syntax, I'm closing it as done. For the implementation we have #5064.
@mjaskelioff
I'm working on porting the same format to Aiken.
Is whitespace left unspecified by your spec? If not, I can't figure out the rule for spaces in lists from your examples, considering:
(con (list integer) [ 0 , 1 , 2 ])
This example wraps each item in a space on both sides, but this example:
(con (list (list (list integer))) [[[-1], []], [[], [2,3]]])
has a space before some items in lists, (, []
and , [2,3]
) but not others (,3
)
Is this just an artifact of providing examples by hand, or is there some rule governing this?
Hi @Quantumplation, whitespace can be ignored, so any of the following should work:
-
(con (list integer) [1,2,3])
-
(con (list integer) [ 1 , 2 , 3])
-
(con (list integer) [1, 2, 3])
Can the spec be updated with this?
Yes, it should be updated. I'll do it, but I do not know when yet. Hopefully beginning of next year.
@mjaskelioff The spec seems kind of contradicting the examples you provided. The spec would require the following syntax:
(con data (Constr 1 [(I 2)])
(con data (Map [((I 0), (B #00)), ((I 1), (B #0F))]))
(con data (List [(I 0), (I 1)]))
(con data (I 2))
(con data (B #001A))
But your examples show the parentheses omitted in nested data (which is good for readability). However this leaves us with an ambiguous edge case. What about builtin list or pair of data? I interpret it that "data has parentheses around its outermost occurrence" which would result in
(con (list data) [(I 0), (B #)])
But interpretations can also allow
(con (list data) [I 0, B #])
Which is ambiguous - does this denote a list of data or a plutus data list? (if you ignore the type string)
I'm sorry but I don't understand the ambiguity. This syntax is for plutus core. What do you mean by a "plutus data list"?
The examples omit parentheses around data constructors. Parenthesis around pairs cannot be omitted so you cannot interpret [I 0, B #]
as a singleton list of pairs.
Yes, it should be updated. I'll do it, but I do not know when yet. Hopefully beginning of next year.
@mjaskelioff This is PLT-378, which I'm planning to do next week. It kind of got forgotten about but I dug it out recently.
I'm sorry but I don't understand the ambiguity. This syntax is for plutus core. What do you mean by a "plutus data list"?
Sorry I could have been a bit clearer here.
Short rant about the syntax
The whole constant is not ambiguous, its just the part of the right. Its not clear if the right part (`[I 0, B #]`) describes a Plutus List (`data`) or a Builtin List of Plutus Data (`list data`). Non-ambiguity is not necessary here either since we have the type description. I see the point that pure data would be `List [I 0, B #]`.The thing is just that according the syntax the parentheses should not be omittable neither in nested Plutus Data nor in Builtin Lists/Pairs containing Plutus Data. Now tools like @aiken-lang do enforce omitting the parentheses for Plutus Data in Builtin Lists while the @opshin uplc implementation requires the parentheses around the outermost Plutus Data occurrence. Should it be optional in any case or always omitted in nested data structures or ...? The spec could benefit from examples that cover Builtin Pair/List containing Data or details regarding the use and omitting of parentheses.
In summary the question I have is: which of the following notations is valid for nested builtin containing Plutus Data? A, B or both?
A: (con (list data) [I 0, B #])
B: (con (list data) [(I 0), (B #)])
A: (con (pair data data) (I 0, B #))
B: (con (pair data data) ((I 0), (B #)))
The whole constant is not ambiguous, its just the part of the right. Its not clear if the right part (
[I 0, B #]
) describes a Plutus List (data
) or a Builtin List of Plutus Data (list data
).
The part of the right is not ambiguous. A data
list has a List
in front while list
doesn't. So [I 0, B #]
is a list data
, while List [I 0, B #]
is a data
list.
Okay. I also just realized that there is a uplc distributable as part of the releases now. The binary clarifies that A
from above is the only valid notation (B
is rejected and A
is accepted by the uplc binary).
The whole constant is not ambiguous, its just the part of the right. Its not clear if the right part (
[I 0, B #]
) describes a Plutus List (data
) or a Builtin List of Plutus Data (list data
).The part of the right is not ambiguous. A
data
list has aList
in front whilelist
doesn't. So[I 0, B #]
is alist data
, whileList [I 0, B #]
is adata
list.
I think we may change this in the parser soon. PR #5692 updates the specification to include the concrete syntax for lists, pairs, and data
, and when doing this we realised that there is in fact some inconsistency there. When you write a data
object as part of a constant it has to be enclosed in parentheses, like (con data (I 5))
. When you write one as part of a built-in list the parser requires you to leave them out, so you have to write (con (list data) [I 5, B #A5F3])
. The specification now says that data
items always have to be enclosed in parentheses, so you'll have to write (con (list data) [(I 5), (B #A5F3)])
, which is a bit annoying. We'll come back to this after the holidays and either tighten up the parser to require the parentheses or change the specification to be a bit more relaxed about where parentheses are required.
Sorry for the confusion! If we'd added the syntax to the specification earlier we might have noticed this and fixed it a while ago.
Has the "lets make a sexpr compliant spec" ship sailed? :pleading_face: