Yet another proposal to fix binding signatures

Open hinshun opened this issue 4 years ago • 15 comments

Written assuming the "call-parens" proposal, with parens optional for no-argument functions; neither is part of this proposal, so please consider them off-topic.

Ideas:

  • Declarations currently all start with a keyword (import, export); this makes function declarations consistent with them.
  • Additional binds currently use the (key value, key2 value2) syntax; this makes them more natural.
  • Binds syntax is currently optional and not part of function signatures; this enforces it.
  • Solves the common case where only an output return register is desired, so there is no need for an unused function name.

cc @coryb @slushie @aaronlehmann @copperlight

fn simple() fs {
	image("alpine")
}

fn named_return() (fs ret) {
	image("alpine")
}

# By default, first return register returns value from stack.
fn additional_returns() (fs ret, fs output) {
	image("alpine")
	run("echo foo > /out/msg") with {
		mount(scratch, "/out") as output
	}
}

# Can be overridden to only produce side effect, otherwise `output` returns
# value from stack.
fn overridden_return() (fs output) {
	image("alpine")
	run("echo foo > /out/msg") with {
		mount(scratch, "/out") as output
	}
}

# Going back to example `additional_returns`, `ret` and `output` are registers
# defined in the namespace of `additional_returns`.
#
# If multiple return registers are defined, calling without the operator (::)
# returns the first register.

fn output_of_additional_returns() {
	additional_returns::output
}

fn output_of_overridden_returns() {
	overridden_return
}

# Using the more appropriate `<>` for subtypes

fn metatron() option<run> {
	mount(fs {
		local("${localEnv("HOME")}/.metatron")
	}, "/run/metatron")
}

hinshun avatar Jul 05 '21 02:07 hinshun

fn additional_returns() (fs ret, output fs) (

I'm having a little trouble parsing fs ret, output fs. Is fs a type in both cases? And is ret a special keyword that returns after the type?

aaronlehmann avatar Jul 05 '21 16:07 aaronlehmann

Sorry, that was a mistake. I was confusing golang style identifier type with the more common order that HLB uses: type identifier.

Revised to: fn additional_returns() (fs ret, fs output) {

hinshun avatar Jul 05 '21 16:07 hinshun

I really like this multi-valued return syntax for binds, because it feels "naturally" like side-effects.

It might make sense to adjust the syntax for named vs unnamed return values, because fn simple() fs { doesn't read as cleanly as fn named_return() (fs ret) {. Perhaps fn fs myfunc() ... and fn (fs ret) myfunc() ...? The prefix syntax also makes the implicit stack parameter feel a bit more natural, imho. But maybe we could eventually introduce an explicit stack param in the declaration syntax, eg fn myfunc() (fs [stack] in_out, string etc) or some such.

But generally, this syntax feels very natural to me :+1: And the fn keyword makes me imagine a future val keyword that provides constants 😉 ❤️

I also really like the typed qualified names, eg additional_returns::output. But of course, it makes me curious:

  • Would qualified names replace the as syntax for user-defined function calls? If so, how do we handle extracting multiple side effects? If not, would as be available in the same syntax? eg, additional_returns::output as (ret otherReturnValue)
  • I really like this syntax for changing types... eg fn pushDigest(fs input) string { dockerPush::digest input } feels very natural and straightforward, especially compared to the current idiom fs _pushDigest() { dockerPush as (digest pushDigest) }
  • This syntax also introduces scope control into the language. Function scopes exist today, but user code has no control over them, since all binds are in the global scope. Could we also use :: to operate on the global scope? eg, fn overridden_return() (fs output) { run "foo > /out" with mount(::output, "/out") } to reference a global output, and an unqualified ... mount(someFs, "/out") as output for the local scope (and maybe even ... as overridden_return::output to explicitly access the local scope).

On the subject of scope, what are the semantics of additional return values within the function scope? Can they be referenced before and/or after being set? What happens if some remain unset? What happens when an as clause refers to a name not listed in the return values? What happens when they're set more than once? Is the first return value still passed in from the calling function as the implicit stack parameter?

Finally, just wanted to note that I don't have strong feelings on the <> operator for subtypes. I think it's probably fine to have option::run coexist with myfunc::someValue because they don't overlap (ie, it's illegal to define a user-defined function named option or any other type name, so there is no ambiguity between the two). But <> is also heavily associated with type operations, so it might provide more readability for new users. 🤷

slushie avatar Jul 05 '21 17:07 slushie

Thanks for the detailed comments!

The prefix syntax also makes the implicit stack parameter feel a bit more natural, imho.

I didn't want to make the stack parameter special because I wanted a way to express functions where the stack parameter is not wanted (only want mount side effect).

Would qualified names replace the as syntax for user-defined function calls? If so, how do we handle extracting multiple side effects? If not, would as be available in the same syntax?

Here's how I imagine multiple side effects working, does this clarify things?

fn echo() (fs out1, fs out2) {
	image("alpine")
	run("echo foo > /out1/msg && echo bar > /out2/msg") with option {
		mount(scratch, "/out1") as out1
		mount(scratch, "/out2") as out2
	}
}

fn foo() fs {
	echo::out1()
}

fn bar() fs {
	echo::out2()
}

---

fn mfst() (string dgst, string cfg) {
	manifest("alpine") as (digest dgst, config cfg)
}

fn files() fs {
	mkfile("digest", 0o644, mfst::dgst())
	mkfile("config", 0o644, mfst::cfg())
	download(".")
}

I really like this syntax for changing types.

I didn't even realize you could do dockerPush::digest now, that makes a lot of sense!

Could we also use :: to operate on the global scope?

My current thought is that I'd like to avoid globals because that brings mutable state which I think HLB doesn't have atm. For example, I prefer const over val for the same reason.

hinshun avatar Jul 05 '21 18:07 hinshun

I like the idea of having named outputs as part of the signatures, but there are a few parts of this proposal that don't feel right. Maybe instead of having some implicit magic around default values on the stack, we keep the return type mandatory but also add optional named results, something like:

fs foo() (fs output) {
    image("alpine")
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as output
    }
    run("apk add -U curl")
}
# foo() is alpine + curl
# foo().output is mount result

Next, as you noted it is frustrating to create named funcs to capture a mount when the func name is never really used. One issue with foo above is that we have to use foo().output all the time when that might be the only intended use-case for foo. For this purpose I think we should add a return keyword. It could be used like:

fs foo() {
    image("alpine")
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as return
    }
}

In this case return expresses exactly what to return on the stack (the mount) and it is no longer required to create an arbitrary name for it. You could combine both to access various parts of the graph:

fs foo() (fs base, fs curl) {
    image("alpine") as base
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as return
    }
    run("apk add -U curl") as curl
}
# foo() returns mount
# foo().base returns alpine
# foo().curl returns alpine + curl 

The above assumes we (once again) allow for aliases on any fs, rather than just mounts as is currently required. I am not sure it makes practical sense as we could accomplish similar results with:

fs base() {
    image("alpine")
}
fs curl() {
    base
    run("apk add -U curl")
}
fs foo() {
    base()
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as return
    }
}

However, the concept of having scoped fs's defined within targets appeals to me. We could definitely punt on this part, just seems potentially compelling for code/structure organization.

These explicit signature outputs should also have the benefit of allowing us to build up a mount without having a bunch of global intermediate steps:

fs foo() {
    image("alpine")
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as step1
    }
    run("echo hi again > /out/result") with {
        mount(step1, "/out") as return
    }
}
# foo() is mount result of echo hi + echo hi again
# step1 not available outside of foo

For my examples, I chose the . separator, somewhat arbitrarily. I like it because it has parallels with mymod.myfunc for imported modules. I personally would prefer to keep the :: for type association rather than access scope, and I'm also not a fan of <> since that has a strong generics/C++-template association. Maybe silly, but I am liking @ as a potential scope separator, it reads well:

foo()@myMount

Which reads: "foo at myMount", which pretty accurately describes what we are doing.

coryb avatar Jul 10 '21 21:07 coryb

I think your ideas make sense with the as return special keyword. It does feel better than the magic first return value. Keeping :: type association as-is works for me as well.

In #227, @copperlight also suggested # as a separator like so. I'm open to all of the ., @, # choices there.

foo#myMount()

Not sure how I feel about the argument list coming before, if you have lots of arguments it may be hard to see that we're not using the default return value.

hinshun avatar Jul 13 '21 00:07 hinshun

Second the as return keyword - this usage feels natural for overriding the default return value.

The idea behind having something like a fn declaration is to make it easy to grep for the definition of a function or to gather a list of all known function signatures. It keeps the function names to the left and up-front, which improves visual scanning. However, given that parameter definitions are type name pairs, it feels inconsistent to drop the type definition from the start of the line (and optionally move it to the start of the binds list). If we keep the type definition at the front of the signature, we should think about having a void or unit type, for functions where we only care about their side-effects.

For additional return types, keeping them on the right side bind list makes sense - they can be of arbitrary length, and we don't want to bury the function name beneath too much text.

Both foo#myMount() and foo@myMount() seem viable to me.

copperlight avatar Jul 13 '21 16:07 copperlight

If we want a function marker, we could shift the unnamed return type after the param list and still allow for optional named outputs, something like:

fun myImage(string img) fs {
    image(img)
}

fun foo(string img) fs (fs output) {
    myImage(img)
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as output
    }
    run("apk add -U curl")
}
# myImage("alpine") is alpine
# foo.output("alpine") is mount result
# foo("alpine") is alpine + curl

Those syntaxes seem reasonable to me. Moving the named accessor to before param list seems reasonable also. For function markers I am not a fan of fn, it is too close to fs. These seem like reasonable options:

  • fun, func, function
  • def, define
  • proc, procedure
  • dec, decl, declare
  • sub, subroutine

For void or unit I am not opposed to the concept, but we will need some use-cases and have to think through some of the edge cases. Buildkit will "optimize" your graph, so I am afraid a void func will translate to a no-op in the graph and be pruned. We would likely need to add artificial outputs so the vertices are not pruned, but I suppose we can hide that in HLB.

For the accessor syntax, I would vote for:

  • foo.myMount()
  • foo@myMount()

I also conceptually like foo#myMount() but worry it will forever be misidentified in my editor as a postfix comment foo # myMount(), it seems like a problem best to avoid.

coryb avatar Jul 14 '21 18:07 coryb

For the accessor syntax, I would vote for:

  • foo.myMount()
  • foo@myMount()

Curious why () comes after myMount here. The other way around seems more natural to me, since we are not invoking myMount.

I also conceptually like foo#myMount() but worry it will forever be misidentified in my editor as a postfix comment foo # myMount(), it seems like a problem best to avoid.

Agreed.

aaronlehmann avatar Jul 14 '21 19:07 aaronlehmann

If we want a function marker, we could shift the unnamed return type after the param list

This seems viable.

For void or unit ... we will need some use-cases

Good point - did not realize there was a risk of dropping tasks from the graph due to this. It should only be valid to declare the primary return void when secondary outputs are defined. The intention behind this notation is to make clear to the reader that the main return from the function has no further use. Although, with a mechanism to change the default return from a function with the as return keyword, it may not be necessary to add this feature - it makes it easy to specify exactly what outputs from your function matter.
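
To make the comparison concrete, here is a rough sketch of the two alternatives. It assumes a hypothetical void type (not in HLB today) and the proposed as return keyword and @ accessor; the function names buildVoid and buildReturn are made up for illustration.

```
# Hypothetical: a void primary return, only valid because a
# secondary output is declared in the binds list.
void buildVoid() (fs output) {
    image("alpine")
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as output
    }
}

# Same intent using the proposed `as return` keyword: the mount
# becomes the default return, so no void type is needed.
fs buildReturn() {
    image("alpine")
    run("echo hi > /out/result") with {
        mount(scratch, "/out") as return
    }
}
```

Under these assumptions, callers would write buildVoid()@output in the first case and plain buildReturn() in the second, which is why the void type may be redundant.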

For function markers I am not a fan of fn, it is too close to fs.

I am on-board with that. In no particular order, I like def (Python, Scala), defn (Clojure), and func (Golang).

I also conceptually like foo#myMount() but worry it will forever be misidentified in my editor as a postfix comment foo # myMount(), it seems like a problem best to avoid.

Also agree.

Curious why () comes after myMount here. The other way around seems more natural to me, since we are not invoking myMount.

Ah, good question. I flipped it around because foo()@myMount looked weird to me when I wrote it, although it is quite common in Scala to do things like foo().myMount. In practice, they aren't really that different, though. Looking at it through the lens of a long parameter list, we would have:

foo(i, have, a, long, parameter, list, that, goes, on, forever)@myMount
foo@myMount(i, have, a, long, parameter, list, that, goes, on, forever)

Even with the long parameter list, I think keeping the parens to the function name makes the most sense - it emphasizes the function name and the parameters being sent to it.

I am a little hesitant to back the . notation for secondary returns, because there was some discussion on the other ticket about how that overloads the concept of referring to imported functions. If there is a way to handle this gracefully, such that it's acceptable for both uses, then maybe it works?

import foo from "./foo.hlb"

fs default() {
    foo.build()@myMount
               ^ at syntax to access secondary return
       ^ dot syntax to access imported functions
}

copperlight avatar Jul 14 '21 21:07 copperlight

If we want a function marker, we could shift the unnamed return type after the param list [...] fun foo(string img) fs (fs output) {

This grammar looks so inelegant, though :( The dangling, bare fs conveys very little semantic value IMHO. How about just decorating with return fs eg, fun foo(string img) return fs (fs output) to match the as return syntax? I suppose it's more verbose, but I think it works nicely with the fun (or some such) keyword and makes the optional (fs output) more clear.

I like def (Python, Scala) [...]

+1 for def, I agree with the sentiment behind fs != fn, and I also like fun. But I'm curious what @hinshun prefers here.

[...] I think keeping the parens to the function name makes the most sense - it emphasizes the function name and the parameters being sent to it.

It also helps to clarify how we're calling the function, and then capturing a side-effect.

I am a little hesitant to back the . notation for secondary returns, because there was some discussion on the other ticket about how that overloads the concept of referring to imported functions.

Certainly . is confusing wrt imports. Worse IMHO is that it looks like field access on some structured type returned by the function, which this explicitly is not. I think I hated @ at first, but it's quickly grown on me, and its uniqueness helps it feel more like a function call rather than simply a value in a struct.

slushie avatar Jul 14 '21 23:07 slushie

[...] fun foo(string img) fs (fs output) {

This grammar looks so inelegant, though :( The dangling, bare fs conveys very little semantic value IMHO. How about just decorating with return fs eg, fun foo(string img) return fs (fs output) to match the as return syntax? I suppose it's more verbose, but I think it works nicely with the fun (or some such) keyword and makes the optional (fs output) more clear.

Hah, I thought it quite elegant 😄. I don't think the return in the signature tells me anything I didn't already know, also feels inconsistent:

fun foo(string img) return fs (fs output)

Why is it return fs, then fs output? I would want symmetry. If return was a magical named value then maybe

fun foo(string img) (fs return, fs output)

but that feels weird and unnecessary also. I would also expect this to be valid usage foo("alpine")@return which seems silly to me. If we have a func that only returns one thing, would that be:

fun foo(string img) fs return

or

fun foo(string img) (fs return)

All of these return uses don't provide any more semantic value for me.

coryb avatar Jul 15 '21 00:07 coryb

If return was a magical named value then maybe

But it's not a named value at all, it's a syntactical keyword with a type name as its parameter. That's why foo()@return is such nonsense, and why mount ... as return is different from defining an fs named "return." I think that's more syntactic consistency, notwithstanding the lack of grammatical consistency.

I don't think the return in the signature tells me anything I didn't already know [...] All of these return uses don't provide any more semantic value for me.

I agree, and of all of those others, fun foo() fs is the best option 😆 I do think it's better to have fun foo() fs (fs output) than fun foo() (fs a, fs b) because it sets the state apart from the side effects.

slushie avatar Jul 15 '21 00:07 slushie

I think out of the options presented, I like def the most, and it's also most clearly distinguished from the fs type. I'm also in the camp of foo()@output now! It has grown on me.

Here's an updated proposal:

def foo(string img) fs {
    image(img)
}

def echo() fs {
    image("alpine")
    run("echo foo > /out/msg") with option {
        mount(scratch, "/out") as return
    }
}

def npm(fs src) fs (fs pkgLock, fs nodeModules) {
    image("node:alpine")
    run("npm install") with option {
        dir("/in")
        mount(src, "/in") as pkgLock
        mount(scratch, "/in/node_modules") as nodeModules
    }
}

def src() fs {
    local(".") with includePatterns("package.json", "package-lock.json")
}

def npmInstall() fs {
    npm(src)@nodeModules
    download("./node_modules")
}
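
To spell out what each accessor on npm would yield under this proposal, here is a small sketch. The names npmImage and lockfile are illustrative only, and the semantics in the comments are my reading of the proposal rather than implemented behavior.

```
# Primary return: no `as return` is used inside npm, so calling it
# without an accessor yields the node:alpine filesystem after
# `npm install` has run.
def npmImage() fs {
    npm(src)
}

# Secondary return: the src mount as left by the run, which should
# include whatever npm wrote back into /in.
def lockfile() fs {
    npm(src)@pkgLock
}
```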

hinshun avatar Jul 15 '21 03:07 hinshun

looks good to me. def is my least favorite due to ptsd from python support, but creating new positive associations with it will be good for me 😄

coryb avatar Jul 15 '21 04:07 coryb