unseemly
unseemly copied to clipboard
Place `dotdotdot` (the quasiquotation construct) once per `Star` node, not once per nonterminal
dotdotdot should be able to go anywhere a repetition can happen. For example, it should be possible to write
'[Expr | .[ ...[,a, >> ,[a], : Int ]... . <something> ].]'
The only place that core_qq_forms::dotdotdot_form uses its nt argument is in the grammar: (named "body", (call_by_name nt)); that can be replaced by whatever syntax is inside the Star. But there's nothing to call "body"; the Star could potentially contain, say, three Named nodes. What do we do then?
This is a splicing-related can of worms that needs to be opened; the current behavior doesn't make a ton of sense.
But there's nothing to call "body"; the
Starcould potentially contain, say, threeNamednodes. What do we do then?
I think this isn't a big deal. Let's try just wrapping all of the components into one big Shape. The Named nodes will still get found by their Scope and everything will work out, I think. This might not be as hard as I've been imagining.
Wait, this is actually a big deal. Currently, "dotdotdot" is, grammatically, a Scope whose grammar contains a bunch of complexity, including other Scopes it doesn't need to peek inside (which is weird, but it works!).
The proposed change would insert it inside existing form grammars. This is totally unhygenic. For example, "apply" would have a Scope between it and its "rand"s, and its implementation wouldn't be able to find them through the intervening Scope.
Can we have a form that doesn't come with a Scope? The trouble is that "dotdotdot" isn't quite transparent from a typechecking point of view; it needs to munge the environment.
"Munge the environment"? Maybe dotdotdot wants to be a kind of beta!?!? (I do love solving problems by adding new things to beta.) That sounds great... except that betas would have to be able to peel :::[ ]::: off a type (and explode **[ ]**, for that matter), and beta shouldn't know anything about the actual forms of Unseemly.
This does feel like (at type-time) an environmental operation. Maybe there's some way to make it work (I have vaguely contemplated whether a reflective beta operation ("perform arbitrary computation on names") would be valid). Even so, it might be fighting with the fact that, (at quasiquotation-time) dotdotdot is definitely not an environmental operation.
Ideas:
- Some new kind of pre-match, but in positive modes this time (note: nooo, we want to get rid of pre-match, somehow).
- When rewriting form grammars, also replace the implementation of the form to first seek out and resolve repetitions.
- Try to resolve repetitions from the start of the quasiquotation... maybe we're guaranteed that this is valid for (handwave) phase reasons? Or quasiquote-is-boring reasons?
(Note that you'd need to legitimately do an ast_walk-style walk when you're seeking out repetitions to resolve them. Unless the third bullet pans out, in which case never mind. Need to work an example, I suppose.)
Okay, I think that resolving ...[ ]... at '[ ]'-time should work. The quasiquotation means that all bindings are shifted by a phase, so the dotdotdots it contains (operating, as they do, one phase offset) can be resolved at that time, if desired.
Sadly, this doesn't provide any guidance on how to handle :::[ ]::: types. You can (and often do) write types like ∀A. :::[ A >> ... ]:::. OTOH, :::[ ]::: is a legitimate form, not some weird interloper, so maybe the weird stuff that we're doing for it is unnecessary, and we can just handle it the same way we handle ∀.
The other problem is how to represent dotdotdots. We can't use a Scope because we don't want to mess up form bindings (well, we could unhygenically add it and unhygenically remove it, but that, at a minimum, completely breaks freshening). I think the simplest thing to do is to use an S-expression: make a Shape whose first element is (vr "dotdotdot") (but gensym for uniqueness), whose second element is the body, and all whose subsequent elements are the driving names. I need to think about whether it needs a protect-binding also (however, that has the problem that it would need to have a form-level name for the driving names, and, while I think I can just gensym one up, it feels very ad-hoc to do that).
Oh, however we do this, this is going to expose a soundness problem (actually, it's already been visible): when we typecheck a repetition of unknown length, we just treat it as though it had a single repetition. However, sometimes the number of repetitions matters: for example '[Expr | (two_arg_function x ...[,y, >> ,[y],]...) ]' is valid only when y is a tuple of size 1. So we need to impose length bounds on tuples, and we need type-level numbers, and we probably need to actually implement kind checking to make sense of it all. But that's a separate issue (#44).
Maybe we can do some ad-hoc gensymming in a way that makes this work more cleanly everywhere else.
We find-and-replace all Stares/Pluses with an import from an adjacent dotdotdot form (instead of wrapping them with the dotdotdot form, like we did before) to get the order right (we use gensym to safely create names for them). The type rule for dotdotdot then works the environment magic (unwrapping repetition types) ... oh, but what if we have an explicit tuple? Then we need to actually repeat the typechecking. We'll need to rewrite the term, which is something I want to stop doing, since it means emitting error messages in terms of code the user didn't write (!).
So maybe adding some sort of repetition form to Ast is the way to go. It really seems like get_rep_res (and march_parts, etc.) needs to be aware of repetition.
Okay, I tried adding a repetition form to Ast, and I gotta say, alpha.rs doesn't like having an intrusion on the middle of its EnvMBEs. I think it could be done, but I don't want to make alpha.rs harder to understand.
I'm reluctant to go back down this road, since I still haven't removed the evidence (#18) of the last abortive attempt to make EnvMBE aware of dotdotdots, but I wonder if we need to make EnvMBE aware of dotdotdots.
So the last time, the mistake was the idea that EnvMBE would handle it implicitly during matching. This time will be different, maybe even better: EnvMBE::{get_rep_res,march} will return something other than Vec (ExampleVec), roughly equivalent to Vec<Example<T>>, where
enum Example<T> {
One(T),
Rep(T, Vec<Name>)
}
Oops, didn't mean to close this.
Okay, this rabbit hole is deep. I'm partway through rewriting EnvMBE (#45), and it's, uh, not as simple as I expected. I still think that EnvMBE should be rewritten (I bet there's a big performance gain to be had), but I want to get this issue resolved before that.
So, let's walk through our alternative options.
I think arbitrary computation inside betas is probably safe, but definitely a pain: there are a lot of pub functions in beta.rs, even though you'd expect there to be, like, one. But I think we can sidestep that problem by having a "sibling" dotdotdot form that does an export. Maybe there will be some weirdness, but it sounds like it would work.
But we still need to handle the repetition-count thing with get_rep_res and march. And I think we can maybe add more stuff to WalkMode to solve this problem in a per-mode way: (a) a function to identify the "signature" of dotdotdots (presumably a shape whose first subterm is a particular gensymed name, because we remember our Lisp roots), and (b) an indication of what to do for them in positive and negative walks (which gets called inside get_rep_res and march).
I think it'll work? I'll go to bed.
Exported names aren't the product of arbitrary computation; they use betas too! But this should be easy to solve: we just introduce a new type form (Unrepeat<T>?) that generates the type we want. So, the Ast representation of a dotdotdot is something like this:
Shape(ddd_ident, ExtendEnv(actual_contents, beta![driver_1 : d1_unrep driver_2 : d2_unrep]))
...where ddd_ident is a global gensymmed name, and d?_unrep are locally gensymmed part names that get bound to Unrepeat<driver_?>.
Okay, now that I'm writing this down, I'm pretty sure we can't do this with the parser yet. We'd need to put a Named inside a Named (untested, but maybe already works), and generate gensymmed names on the fly (ick!), and maybe have something like a parameterized version of Anyways() that wraps syntax up.
Hm. Maybe there's a way to simplify that. Call that problem 1.
Problem 2: What do get_rep_res and march_all actually do? There'll be some per-mode function that identifies Shape(ddd_ident, ...) and figures out what to do with it. For evaluation, that means exploding the repetition (and setting up some ExtendEnvs to adjust the environment). For typing, that means exploding the repetition if at least one type is concretely a tuple, and constraining all the types that aren't (can we express the appropriate type constraints? I think that's what splicing map is for). Does all this exploding and ExtendEnv-adding count as modifying the user's code? I think not, but I'm not sure I can justify it.
Problem 3: When we put this together, does it work? This honestly feels pretty flimsy. Usually, the interface of Ast ensures that names are handled safely. This doesn't seem safe like that.
A while back, I tried adding a Rep() form to Ast; I discovered that this required a bunch of surgery on alpha.rs. But it seems like this is heading back in that direction.
I think I can get rid of the make-the-parser-gensym-new-names and the Named-inside-Named issues by adding a new kind of beta that contains a literal Ast as the RHS (type) side. Haven't thought it through yet, but that seems like a good deal, and might be useful for future weird things.
This is still a depth charge of complexity, affecting nearly every major part of the language. But I think we gotta make dotdotdot work.
Okay, the last remaining really weird thing is this Unrepeat<T> type. Its purpose is to unpack a tuple, okay, but it can't be used on an explicit tuple: Unrepeat<**[Int String]**> is a completely incoherent thing to say! It only makes sense if it's being interpreted "the same time as" a dotdotdot, so that it means Int in one place and String in another place.
... [long train of thought omitted] ...
WAIT A SECOND, it makes perfect sense to represent repetition as a special Ast node. The problem that I ran into when I was trying it last was that repetitions occur "in the middle of" an EnvMBE, and representing that with a form that "resumes" the EnvMBE was a mess. But using Shape doesn't solve that problem; it just prevents us from representing repetitions that aren't top-level (or bottom-level; not sure which).
So, if I believe that interrupting the structure of EnvMBEs is indeed a nightmare to implement in alpha.rs, and I don't want to finish implementing Sky/Asterism right now, the only remaining option is to add repetition to EnvMBE. I think it'll look like this:
- Implement
VecR(or whatever name I come up with), in which each element might be a repetition (with an associated set of driving names). Have.to_vec()and.to_vec_or_panic()(and later come up with a way to march two of them together, and also a corresponding iterator-like type, actually, probably just an iterator over an enum) - In
EnvMBE, changerepeats: Vec<Rc<Vec<EnvMBE<T>>>>torepeats: Vec<Rc<VecR<EnvMBE<T>>>>, and push the interface change out to.get_rep()and.march(). (Eventually,Asterismwould beVecR-backed instead ofVec-backed; very little change there. - Push through to
.march()and.get_rep_res()in ast_walk.rs. Once supported inLiteralLikewalks, it can be then added on a per-form basis to make things work.
Doesn't sound too hard, ... he said in the twelfth message talking to himself in this thread..
Here's what an n-ary language implementation ought to look like (I haven't kept this up-to-date with build_a_language.unseemly, thought)