Lightweight observable sharing with linear types
Naively, observable sharing means that we either need to put sharing in the hands of users (by giving them the choice to share or not), or give them a verbose API that doesn't really resemble maths. We might be able to resolve this if we eliminate the possibility of reusing a value by making the value linear.
Consider this API
(+) : (1 _ : Tensor s t) -> (1 _ : Tensor s t) -> Tensor s t
Then if we create a value e.g. via
let x = abs y + 1 in ...
then we can only use x once. We can't write
let x = abs y + 1 in x + x
We'd have to write
let x = abs y + 1
x' = abs y + 1
in x + x'
which makes the duplicate calculation explicit. The question would be then: how do we reuse a value? Let's look at the tensor graph object, with Let and Var. We'll use the complete expression let x = 4; x' = 5 in x + x'
Let 0 (Lit 4) (Let 1 (Lit 5) (Var 0 + Var 1))
Notice that we don't reuse the Lits, each is only used once. However, the reference to each Lit is separate - we can use that as many times as we want.
So
share : (1 _ : Tensor s t) -> Graph $ Tensor s t -- will this type signature make the argument reusable?
do x <- share 1
pure (x + x)
is ok because each x is the reference to 1, not the 1 itself. In terms of the AST, the Lit 1 only exists once, but the reference to that Lit can be used as many times as we want. This suggests an API like
data AST where
Let : Nat -> (1 _ : AST) -> AST -> AST
Questions:
- Would
Tensorhave to be strictly linear, with the APItensor : (1 _ : (1 _ : Tensor s t -> a)) -> aAPI? How would we make that practical with literals? - How do we implement
share? Does the result ofsharebeing non-linear break the safety? - Can we implement this with a mutable array or
SortedMap Nat AST, rather thanLetandVar? I imagine so, they seem equivalent
Note, given the state we're in now, can we just make tag use linearity, which will mean most computations won't require linearity, and it will drop the Tag monad entirely.
can we make everything linear, but overload fromDouble etc. as an unrestricted tensor, on the basis that calculations on scalars aren't likely to be a bottleneck? Users wouldn't be able to create arrays without using linearity (though we could make a non-linear version of tensor with a clear enough name to communicate the dangers).