unison
unison copied to clipboard
Fix `unique` type update churn
A unique type definition not deleted from a scratch file will always be treated as an update (as opposed to an existing/known definition) after an add
. See #2195.
@pchiusano Could you take a look at what it would take to achieve this?
Option 1
The simplest fix for this: the pretty-printer just adds the GUID to the declaration it generates when a definition is dumped to a file.
unique[someguid] type Direction = North | South | East | West
This is accurate (the declaration has this info available) but ugly. I don't know that we want to expose GUIDs like this as part of the programmer interface.
Option 2
If we don't want to do that, doing a lookup of "the structure matches" would require that we have some index of decl structure to decl. Then in theory we could lookup the decl structure, and if there's a unique match, we replace that definition with its canonical version.
What if there are multiple unique
types that share a structure though? (This is likely, for instance, there are many newtypes around Text
) Then I suppose you can use some heuristic, checking for names that happen to be the same? There's also the question of where we store this index. Seems like it ought to be the codebase.
Overall, this one seems like a can of worms.
Option 3
Another option that is growing on me: update
doesn't update unique
types: unique types cannot be updated via punning on the names coinciding. Instead, the user has to create a fresh type, with a different fully qualified name, and use the replace
command to explicitly replace the old type with the new type.
This option seems pretty clean. Rather than doing something silly, or taking a guess at the user's intent, just force the user to be explicit. It might be kind of an annoying DX, but perhaps it could be nicely incorporated into a revamped update workflow. https://github.com/unisonweb/unison/issues/2248
Wrt. Option 2 — we can call this 2b — I meant:
When we do an update of unique type Direction
, we look up the existing decl Direction
(no new index is needed, just the existing index by name), and from there compare the structure of the existing data decl named Direction
in the namespace with the structure of the updated data decl called Direction
in the scratch file. (All update
s are by definition already predicated on a names coinciding, so there's nothing particularly new here, other than how we compare the old and new decls for equality.)
Does this make more sense? Are there still worms?
For Option 1, I agree it's unpalatable to expose the GUIDs, even if it is the simplest solution. For Option 3, I think the DX will be way too painful.
When we do an update of unique type Direction, we look up the existing decl Direction (no new index is needed, just the existing index by name), and from there compare the structure of the existing data decl named Direction in the namespace with the structure of the updated data decl called Direction in the scratch file. (All updates are by definition already predicated on a names coinciding, so there's nothing particularly new here, other than how we compare the old and new decls for equality.) Does this make more sense? Are there still worms?
Ah, cool. That makes sense. It's good that this doesn't need any new index. I'm sold that this is the nicest DX so let's try to make it work.
I think it might be kind of tricky updating the file parsing / hashing code, but it seems doable?
Just thinking about it..
Currently, we compute hashes for all the types in the file in one go, taking into account cycles. That will need tweaking.
What if instead:
- For each decl in the file, look up the decl with the corresponding FQN. (This requires having access to the codebase within the parsing / hashing code, or some preprocessing pass to figure out what you'll need to load up front. Possible subtlety - is that FQN already shadowed? Should it look up in old names?)
- If a
unique
decl has a FQN the corresponds to a pre-existingunique
decl, we should only use that decl's GUID if the structure is the same. How do we determine that, in the presence of cycles and constructors being in different orders? - Maybe do one hashing pass that is "structure only", converting all types (including the pre-existing same-named
unique
types) tostructural
. Now you have a structural hash for all decls involved, and you can figure out if two structures are the same. If they are, and the FQNs match, then you can inherit the GUID from the pre-existing definition.
I may be misunderstanding or delirious, but I think it can be even simpler than what you described.
WDYT:
In HandleInput.hs
-> UpdateI
, parsing/hashing has already occurred (i.e. nothing new is needed there), we're passed a TypecheckedUnisonFile
. We currently have a pure function toSlurpResult
which we call here, and which is the bulk of brains of update
. This can be the piece that becomes monadic and loads Decls from the codebase, and I don't think it needs any pre-processing.
- If a unique decl has a FQN the corresponds to a pre-existing unique decl, we should only use that decl's GUID if the structure is the same. How do we determine that, in the presence of cycles and constructors being in different orders?
Won't the cycles and constructors be in the same order? Particularly if it's type-checked source that we just pretty-printed out of the codebase? Aren't the cycles numbering and constructor orders just dependent on the structure of the Decl? or are they dependent on the GUID too?
In HandleInput.hs -> UpdateI, parsing/hashing has already occurred (i.e. nothing new is needed there), we're passed a TypecheckedUnisonFile. We currently have a pure function toSlurpResult which we call here, and which is the bulk of brains of update. This can be the piece that becomes monadic and loads Decls from the codebase, and I don't think it needs any pre-processing.
Okay yeah that sounds good! I didn't think of making the change in that spot, was thinking it had to be done earlier, when the hashes in the file are being computed. But you are saying you can leave that alone and just detect it later, before building the update.
I was still thinking of using hashing for determining if two structures are the same, rather than trying to like, align the constructors, which seems easy to get wrong. The hashing will correctly deal with the situation where, like, a unique
type is part of a cycle and one of the cycle elements has a different structure (so therefore, all cycle elements have a different structure). And maybe other corner cases I haven't thought of.
Aren't the cycles numbering and constructor orders just dependent on the structure of the Decl? or are they dependent on the GUID too?
If I have:
unique type Foo = Blah | Blah2 Nat
unique ability Woot where
x : Foo -> Foo -> Nat
y : Nat
Then Woot
's order of constructors depends on the GUID for Foo
(it will canonicalize Woot
's constructor order based on the hashes of the constructor types - and those hashes incorporate the GUID for Foo
).
The constructor orders have been canonicalized by the time the file is typechecked.
So in general, the constructor orders are not going to match between the old and new version of the type. You need to rehash just the structures or using the same GUIDs to get things in the same canonical order.
Hmm, I see; bummer.
Going back to the drawing board, I guess the problem is that ucm sort of needs to know in advance of type-checking why the various definitions are in the scratch file. Is the definition there because the user wants to use and check and add it? Or is it there because the user wanted to have a copy visible for reference in the scratch file? Or is it there because the user hasn't gotten around to deleting it yet?
With a GUI, I could imagine keeping this info around as hidden client-side state (maybe a variation on Option 1 would work), but all we have at the moment for client-side state is the contents of the scratch file, and the whatever is in HandleInput.LoopState
.
I was going to suggest an Option 3b where update
doesn't update unique
types unless the user specifies them explicitly, as in: .tmp> update Direction
. But if we have other definitions in the scratch file that depend on Direction
, what GUID will they be using? Now we have a mess. I forgot to consider this for Option 2b too; we do need to know our GUIDs before type-checking dependents of unique types.
🤔
Definitely typing nonsense at this point — will take another look again later.
I think your idea still works, it's just that the "determine if these two structures are the same" calculation needs to use the algorithm I suggested - unhash the decls in the file, rehash them as "structure only", and also do the same for definitions in the codebase whose names coincide with definitions in the file. Now you have a "structure hash" which has the same meaning for both old and new, which you can use to skip updates for unique
types whose structure hasn't changed from the same-named definition in the codebase.
Tangentially related: today we saw a confusing thing here:
```unison
unique ability Foo where
Op0 : Nat -> ()
```
```ucm
.temp> add
```
```unison
unique ability Foo where
Op1 : Nat -> ()
```
```ucm
.temp> update
```
Op0
disappears; possibly expected after <IOU ticket reference>.
@runarorama suggested the following fix, which requires no codebase format changes.
When typechecking a file, we get a list of all the unique types. We then check the namespace for any types with the same names.
We load these types from the codebase, and hash them as structural types. We compare this structural hash to the structural hash of the type in the file. If it's the same, we use the same GUID for the types in the file.
Boom. This achieves the property that if the type has the same structure and the same name, it gets the same guid and hash.
@aryairani
Update: this doesn't quite work because guid affects constructor order which in turn affects the final hash. But what you can do is just try copying the guid from the types in the namespace to the types in the scratch file. Then rehash the type in the scratch file and see if it matches. This has the same effect.
To anyone else who keeps getting bitten by this, here is an awk script that comments out all of the unique type declarations in a scratch file.
Restating the suggestion:
- When typechecking, we get a list of all the unique types in the file. Call one of these
tf
. - For each
tf
, check the namespace for a type with the same names. Call thistn
. - Load
tn
from the codebase for itsunique
guidgtn
.. - Build
tf' = { tf | guid = gtn }
. - If hash
tf'
= hashtn
, then replacetf
withtf'
in theUnisonFile
; otherwise leavetf
alone.
What if there's more than one tn
for a tf
? Putting aside "update considered harmful", let's:
If there's any tn
that matches, just use that corresponding tf'
for typechecking. Maybe in the future we could ask the user, or use some other heuristic to decide what to do.
What if there's more than one tn for a tf? Putting aside "update considered harmful", let's: If there's any tn that matches, just use that corresponding tf' for typechecking. Maybe in the future we could ask the user, or use some other heuristic to decide what to do.
Do you mean pick an arbitrary tn
? That is, say the user already has in their namespace
name | structural hash | guid |
---|---|---|
Foo | #abc | 123 |
Foo | #abc | 456 |
Foo | #def | 789 |
and they save a .u
file with a unique type Foo
defined that looks like
name | structural hash | guid |
---|---|---|
Foo | #abc |
then we have two different-but-equally-valid GUIDs to reuse, so we pick one, and proceed to typecheck?
name | structural hash | guid |
---|---|---|
Foo | #abc | arbitrarily pick 123 or 456 |
Yeah, I think that makes as much sense as anything. One of them will look stable and the other one will look like churn, but we'll be reducing churn, and in most cases eliminating it.
In the future we may want to do something special when update
ing when there's a conflict, and that something special will be applicable here too.