starlark
starlark copied to clipboard
spec: remove the restriction on if/for at toplevel
The Starlark-in-Go implementation performs a static check that a program doesn't contain if or for statements at top level (outside any function), and I added language to this effect to spec.md, from which the current Skylark spec doc is derived. It occurs to me that this check is unnecessary, and I propose we get rid of it. Removing it would not change any fundamental invariants of the interpreter.
Of course, Bazel can (and must) continue to implement this check on the syntax of BUILD and .bzl files prior to executing them; the necessary logic is simple and efficient. In effect, we would be moving the check from core Starlark to the Bazel build language dialect, similar to the way that the build language disallows 'def' at top-level in a BUILD file.
Not every application that embeds Starlark wants the restriction, and I have encountered several for which it is undesirable.
Yes please! I very often write out my script and then realize I have to dump it in a function for "no reason" because it has branching logic at the top level.
Thanks for the feedback, it's true that this check could be moved outside the core. In general, I think it's good to implement and experiment with a change, before putting it in the specification. This is why I encouraged you to first put this change behind a flag (https://github.com/google/starlark-go/pull/103#issuecomment-452810849).
Like for any change, we should be careful and see how it interacts with the rest.
- With this change, we don't know statically the set of symbols that are exported
- How does that interact with the "single assignment at top-level" rule?
For example:
if f():
var1 = 1
for x in g():
var2 = x
Will var1 and var2 be exported? Will var2 be modified if the loops has multiple iterations?
More general comment We can see some friction here. You want to use Starlark as a scripting language, but some design choices were made with a slightly different point of view (restrictions can help maintain a large code base, and make the language more tool-friendly). A solution is to give more power to the embedder. But then, it might cause fragmentation in the ecosystem. If each embedder is too different, it will be more difficult to share the tools.
Hi Laurent, we have already done the experiment: the -globalreassign flag in the Go implementation has existed for a while, and one of its two effects is to permit if/for/while at toplevel, and I think literally every client is setting that flag. Even the bazel-like tool I mentioned, which implements the if/for-at-toplevel check itself, sets that flag because it wants the flag's other effect.
With this change, we would still know statically the set of global names by inspecting the set of binding uses, but dynamically, due to loops, it's possible for some of those binding operations to be executed zero times, or more than once. This could lead to a global variable having no value, but that's something we have already been having to deal with for a long time: if you say print(x); x=1, then the first use of x has no value and you get an error. Also, in the Go REPL, x=1; fail(); y=2 causes the global y not to have a value, so its result is not copied into the REPL environment.
So I think this change specifically is very safe, and if there was a flag that simply restricted if/for/while at toplevel (without the other effect of -globalreassign), not one client that I know of would set it.
To your more general point about friction: yes, you're right, though a better description might be a tension between two slightly different goals. I want the Starlark core spec to support everything one needs to build high-quality, high-performance bazel-like tools, but I also want it to support other uses. Starlark-Go is probably the highest-quality existing implementation of an interpreted language capable of being embedded into a Go application, and Starlark's conservative but powerful feature set makes it a very attractive target for a wide range of embedded uses. That's the reason for the proliferation of options in the Go implementation: I'm trying to increase its usefulness without compromising its ability to run in a fully Bazel-compatible mode.
When (in the other thread) I said the spec was wrong, I meant that I had made a mistake when writing the spec by including the if/for restriction, as this need not have been a core feature; moreover, since the restriction both adds complexity and removes utility, without enabling any new uses, it should never have been a core feature. I think we should remove it, and instead augment the part of the Bazel manual that describes the existing extra syntax checks that Bazel applies to files in the build language dialect.
Any comments on my comment?
@laurentlb Can we document Bazel's static check for top-level for/if statements in the correct place, which is next to where we document its check for **kwargs and def statements in BUILD files, and remove the unnecessary complexity from the spec and implementation?
It's still unclear to me how this interacts with the "single assignment at top-level" rule.
for i in f():
var = i
In this example, i and var are still global variables? So they are exported? But they're also reassigned during the evaluation?
Do we allow this?
for i in f():
var = i
var = var + 1
I propose that if and for-loops be allowed at top-level, and that the rejection of these features be moved into Bazel proper alongside its rejection of def and f(**kwargs)in BUILD files.
Regarding multiple assignments at top-level: we need to be careful to distinguish static ("binding") operations from dynamic ("variable update") operations. In your first example, the variables i and var are bound exactly once each, but are assigned zero or more times. In your second example, var is bound twice. Clearly we need to permit repeated dynamic assignments because that is what if and for mean. If Bazel wants to reject top-level rebinding, we can make that the default behavior, though for most users, and especially users of top-level loops, top-level rebinding will be desirable, so I think it should remain a dialect option, as today in the Go implementation. I suspect few or no customers beside Bazel would want to disallow rebinding. (As a matter of fact, even Bazel doesn't seem to want to: Bazel-as-documented may reject rebinding, but Bazel-as-implemented has always permitted it.)