elvish

Support dynamically scoped variables

Open: xiaq opened this issue 4 years ago • 4 comments

Intro

Dynamically scoped variables (or "dynamic variables" for short) are kind of like global variables. However, their values can be changed temporarily ("dynamic binding"), and such a change is visible only during the lifetime of a function call and of all the functions called from it (hence "dynamically scoped": the extent of the change is determined by what happens at runtime, not by what appears lexically).

Dynamic binding can share the syntax of temporary assignment. For example, supposing dynamic variables are declared with dyn:

dyn x = foo
fn f { echo $x }
x=bar f # writes "bar"
echo $x # writes "foo"

Superficially this looks identical to temporary assignment. The key difference is that a task running in parallel that accesses $x after its definition will always see its original value foo, even while x=bar f is running.
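To make the contrast concrete, here is a minimal Python model under stated assumptions: temporary assignment is modeled as mutating one shared variable table and restoring it afterwards, while dynamic binding is modeled as a chain of per-call frames. `Frame`, `bind`, `lookup` and `with_temp_assignment` are hypothetical names, not Elvish internals, and the "parallel task" is simulated by a function that reads from the frame it was started with.

```python
class Frame:
    """Hypothetical call frame: dynamic bindings plus a link to the caller's frame."""

    def __init__(self, bindings=None, parent=None):
        self.bindings = dict(bindings or {})
        self.parent = parent

    def lookup(self, name):
        # Walk outward through the frames of the active calls.
        frame = self
        while frame is not None:
            if name in frame.bindings:
                return frame.bindings[name]
            frame = frame.parent
        raise NameError(name)

    def bind(self, **overrides):
        # A dynamic binding creates a child frame; the parent is never mutated.
        return Frame(overrides, parent=self)


def with_temp_assignment(table, name, value, fn):
    """Temporary assignment: mutate a shared table, run fn, then restore."""
    old = table[name]
    table[name] = value  # visible to everyone sharing `table` until restored
    try:
        return fn()
    finally:
        table[name] = old


# Temporary assignment: f and a parallel task share one variable table.
shared = {"x": "foo"}

def parallel_task_temp():
    return shared["x"]

def f_temp():
    # While f runs, a parallel task reading x sees the temporary value.
    return shared["x"], parallel_task_temp()

print(with_temp_assignment(shared, "x", "bar", f_temp))  # ('bar', 'bar')

# Dynamic binding: `dyn x = foo`, then `x=bar f`.
root = Frame({"x": "foo"})

def f(frame):
    return frame.lookup("x")  # echo $x

def parallel_task(frame):
    return frame.lookup("x")  # this task was started with the root frame

call = root.bind(x="bar")
print(f(call), parallel_task(root))  # bar foo
print(f(root))                       # foo
```

Because the binding lives in the callee's frame rather than in a shared table, nothing needs to be restored when the call returns, and no other task can observe it.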

While this seems to be a nice-to-have feature in general, it actually has some interesting applications in Elvish.


Use case 1: FD table

Elvish already has an implicit dynamic variable: the FD table. Redirections follow exactly the same rule as dynamic scoping: a redirection affects a function call and all the functions called from it, but nothing else. If Elvish supports dynamic variables, the FD table can be reified as a variable, which allows writing redirections with the dynamic binding syntax:

fds[1]=(fopen output) cmd # same as: cmd > output

This example is merely more verbose than the redirection syntax, but the binding form is clearer when redirecting from other FDs, for example:

fds[1]=$fds[2] cmd # same as: cmd >&2
fds[1 2]=$fds[2 1] cmd # swap stdout and stderr

The last example is currently quite tricky to express, although there is a proposal to make cmd {1,2}>&{2,1} do that (#733). Still, the dynamic binding version reads much cleaner.
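A sketch of the FD table as an ordinary value carried by each call, assuming that a binding like fds[1 2]=$fds[2 1] builds a new table for the call instead of mutating the caller's. `bind_fds` and `cmd` are hypothetical, and StringIO objects stand in for real file descriptors.

```python
import io


def bind_fds(fds, updates):
    """Return the FD table for one call; `updates` maps FD number -> file object.

    The caller's table is copied, never mutated, so the change lasts only
    for the call that receives the new table.
    """
    new = list(fds)
    for fd, target in updates.items():
        new[fd] = target
    return new


def cmd(fds):
    """Stand-in for a command: writes through FDs 1 and 2 of its own frame."""
    print("out", file=fds[1])
    print("err", file=fds[2])


stdout, stderr = io.StringIO(), io.StringIO()
fds = [io.StringIO(), stdout, stderr]  # FDs 0, 1, 2 of the caller's frame

# fds[1 2]=$fds[2 1] cmd   (swap stdout and stderr)
# The dict is built from the caller's table before any slot is replaced,
# so the swap behaves like a simultaneous assignment.
cmd(bind_fds(fds, {1: fds[2], 2: fds[1]}))

# fds[1]=$fds[2] cmd       (same as: cmd >&2)
cmd(bind_fds(fds, {1: fds[2]}))

print(repr(stdout.getvalue()))  # 'err\n'
print(repr(stderr.getvalue()))  # 'out\nout\nerr\n'
```

The swap needs no temporary FD because the right-hand side is read from the caller's table, which the binding never touches.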


Use case 2: $pwd

It is idiomatic in Elvish to use temporary assignment to pwd to run a function in a directory without permanently changing into it, for example:

pwd=/usr/local ls

However, this suffers from the problem of temporary assignment mentioned above: if another function is running in parallel, it will observe the temporary assignment. If $pwd were instead a dynamic variable, this would not be a problem, and the idiom would truly affect only the function being invoked.

There is a catch though. The $pwd variable is inherently a singleton global variable, since it's part of the state of the running Elvish process. Changing it to a dynamic variable requires first virtualizing the working directory, which involves the following:

  • Adding the working directory as a field of the call frame structure.

  • Never relying on the process's actual working directory when the Elvish interpreter accesses the filesystem; instead, resolving all relative filenames against the working directory from the call frame.

This is in fact very similar to how the FD table is virtualized: whenever the Elvish interpreter accesses the standard IO streams, it uses the ports from the frame rather than the actual stdin, stdout or stderr of the process.
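A minimal sketch of that virtualization, assuming a frame that carries its own pwd and resolves every relative filename against it. `Frame` and `resolve` are hypothetical names; POSIX paths are used so the result does not depend on the host OS or on the process's real working directory.

```python
import posixpath
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Hypothetical call frame carrying a virtual working directory."""
    pwd: str

    def resolve(self, path: str) -> str:
        # Resolve against the frame's pwd, never the process's actual cwd.
        # Absolute paths pass through unchanged.
        return posixpath.normpath(posixpath.join(self.pwd, path))


root = Frame(pwd="/home/user")

# pwd=/usr/local ls  -> the call gets its own frame; root is untouched.
call_frame = replace(root, pwd="/usr/local")

print(call_frame.resolve("bin"))         # /usr/local/bin
print(call_frame.resolve("../lib"))      # /usr/lib
print(call_frame.resolve("/etc/hosts"))  # /etc/hosts
print(root.resolve("notes.txt"))         # /home/user/notes.txt
```

One design question this surfaces: normpath collapses ".." lexically, which can differ from what the kernel does when a path component is a symlink, so a real implementation would have to choose which behavior it wants.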


Notes about other languages

While most languages today default to lexical scoping (and many do not support dynamically scoped variables as a first-class feature at all), early Lisps had only dynamic scoping. (Emacs Lisp comes from that era and is often criticized for it.) Newer Lisps, like other modern programming languages, usually default to lexical scoping but support dynamic binding as an optional feature (Clojure, Racket).

So-called local variables in Bash are in fact dynamically scoped; try this at home:

f() { echo $x; }
g() { local x=1; f }
g # prints 1

So are local variables in Perl:

sub f { print $x; }
sub g { local $x = 1; f; }
g; # prints 1

xiaq, Apr 28 '20 22:04

I feel like the question of supporting vs. not supporting dynamic variables is one of those cases where the attempt to make in-shell functions and built-in objects behave like external commands becomes all but unsustainable. When running an external command you are constructing the environment for the new process, so it's natural to push variable changes into that new environment. But within the shell, particularly given the heavy emphasis on lexical scoping (hooray for that, BTW), I don't know that it fits.

I also wonder, if dynamic variables are supported, whether that distinction should be made at the level where the variable is created or at the level where the variable is referenced. If we have this:

# Create variable, establish it as one that is dynamically scoped
dyn x
# Create function which references x: In capturing (x) it will recognize that it is dynamic,
# and leave open the possibility for it to be overridden
fn f { do_something_with $x }

This changes the behavior of f, essentially creating a back door for passing in parameters that would otherwise be closed off. That concerns me because a function could be written with the expectation that its variable references are lexically scoped, but then (due to relocating the function within the script, etc.) that could suddenly, perhaps accidentally, change. For this reason I'm a bit inclined to think that dynamic scoping of a variable in a function should be expressed as a property of the function (or of the variable reference within the function), rather than a property of the variable.

But that can make things more cumbersome - so honestly I'm not sure I have the answer here.

zakukai, Apr 30 '20 15:04

Another idea is to make dynamic variables lexically distinct. Common Lisp offers a precedent: although the language itself makes no lexical distinction, there is a lexical convention that dynamic variables (called special variables in CL) have names beginning and ending with an asterisk. So what if we allow a variable name to begin with an asterisk (with no asterisks allowed elsewhere in the name)?

*x = a
fn f { put $*x }
fn g { *x=b f }
g # → b
f # → a

The advantage is that lexically distinct dynamic variables are easy to spot without any need to analyse the code. The main disadvantage is that it complicates the syntax a little. But as far as I can tell, this proposal does not conflict with any currently legal syntax, so it should at least be backward compatible.

Edited to fix a typo in the code example.

Edit the second: The above example is not a good example, because it works the same way now, if you just remove the asterisks. Let me think for a while, and get back.
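Because the marker is purely lexical, whether a name refers to a dynamic variable can be decided from the name alone. A hypothetical sketch of the naming rule proposed above (a single leading asterisk, none elsewhere); `classify_var_name` is an illustrative name, not part of Elvish's parser.

```python
def classify_var_name(name):
    """Classify a variable name under the proposed asterisk convention.

    A single leading '*' marks a dynamic variable. An asterisk anywhere
    else, or a name consisting of '*' alone, is rejected.
    """
    if name.startswith("*"):
        rest = name[1:]
        if not rest or "*" in rest:
            raise ValueError(f"invalid dynamic variable name: {name!r}")
        return "dynamic"
    if "*" in name:
        raise ValueError(f"'*' is only allowed at the start: {name!r}")
    return "lexical"


for n in ["x", "*x", "**x", "x*y", "*"]:
    try:
        print(n, "->", classify_var_name(n))
    except ValueError as e:
        print(n, "->", e)
```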

hanche, Apr 30 '20 20:04

I think I now understand what had me confused: The terminology used is slightly different from what I am used to. I come from a Common Lisp mindset, and there

  • scope refers to the places in the program text a certain binding is (or could be) visible, whereas
  • extent refers to the times in the program execution the binding is in force.

If I got it correctly, the current proposal is all about extent, not about scope. So if we return to the original example slightly modified:

dyn x = foo
fn f y { echo $x $y }
run-parallel { x=bar f 1 } { x=zip f 2 }
# should print these lines in either order:
#   bar 1
#   zip 2
# but never "zip 1" or "bar 2"

There are three bindings for x here: the initial one, foo, and the two temporary bindings bar and zip. The scope of all three bindings is the same and includes all the program text shown. But the extent of each of the two temporary bindings is restricted to one of the two parallel threads.

At least, that is how I think of it, using terminology from Common Lisp.

(I still think my proposal of an initial asterisk as a lexical marker is reasonable, even though the example is not.)
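Python's contextvars happen to implement this "binding with a restricted extent" model, so they make a convenient runnable analogy for the run-parallel example (an analogy swapped in for illustration, not how Elvish would implement it). `with_binding`, `bound_call` and `run_parallel` are hypothetical helpers; the barrier only forces the two temporary extents to overlap in time.

```python
import contextvars
import threading

# `dyn x = foo`: the default value plays the role of the initial binding.
x = contextvars.ContextVar("x", default="foo")


def f(y):
    # fn f y { echo $x $y }
    return f"{x.get()} {y}"


def with_binding(var, value, fn, *args):
    """Bind var to value for the dynamic extent of fn(*args), then undo it."""
    token = var.set(value)
    try:
        return fn(*args)
    finally:
        var.reset(token)


# Each task waits here until both are inside their own binding.
both_bound = threading.Barrier(2)


def bound_call(value, y):
    def body():
        both_bound.wait()
        return f(y)
    return lambda: with_binding(x, value, body)


def run_parallel(*thunks):
    """Run each thunk in its own thread, in a copy of the caller's context."""
    results = [None] * len(thunks)

    def worker(i, thunk, ctx):
        results[i] = ctx.run(thunk)

    threads = [
        threading.Thread(target=worker, args=(i, t, contextvars.copy_context()))
        for i, t in enumerate(thunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# run-parallel { x=bar f 1 } { x=zip f 2 }
results = run_parallel(bound_call("bar", 1), bound_call("zip", 2))
print(results)  # ['bar 1', 'zip 2']
print(x.get())  # foo
```

Even though both bindings are in force at the same moment, each thread sees only its own, and the caller still sees foo afterwards.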

hanche, May 01 '20 20:05

@hanche you're right, this proposal is about extents. I shouldn't have used "dynamic scoping"; that muddies the water. The variables themselves are still lexically scoped, it's their values that can be overridden for a dynamic extent.

xiaq, Dec 21 '21 01:12