unison
standardize representation for hashing (decouple from codebase representation)
We have hashes computed in different places: sometimes in the codebase implementation and sometimes outside of it. We should pick one place (IMO, passed in) with its own independent representation that's locked to a "hashing" version. (related to #2140)
We discussed this and decided the following:
- We're going to amend the codebase interface so the codebase is not responsible for doing ANY hashing. Currently, the codebase does some type hashing in `putTerm` to populate the type search index; it should not be responsible for that. Not sure exactly what the new signature for `putTerm` should be, but it seems doable. Suggestion: `putTermImpl` is the codebase interface function, and `Codebase.putTerm` is a helper function that calls `putTermImpl` with the type search hashing info.
- Any implementation of hashing can just be removed from the codebase package.
- The hashes will be different after pulling out the cycle length, which is fine. We'll just use the old hashes as IDs until someone changes a definition's implementation.
  - This may result in some spurious updates even if a definition is unchanged, but that should be pretty rare (if you are `edit`ing a definition, it's probably to actually make changes to it).
- We should bump the hash version number used for storing new hashes in the database.
- We have this other planned hash update https://github.com/unisonweb/unison/issues/2276#issuecomment-889317543, so we should roll both of these hash updates into the same release.
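The `putTermImpl`/`putTerm` split suggested in the first point can be sketched in Haskell roughly as follows. This is a minimal illustration under stated assumptions: `TermId`, `Term`, `Type`, `TypeHash`, `typeIndex`, `hashType`, and `inMemoryCodebase` are all hypothetical, heavily simplified stand-ins rather than Unison's actual definitions, and the toy FNV-1a hash is a placeholder, not Unison's hashing scheme. The point is only that the interface function stores whatever it receives, while the helper computes the type-search hash outside the codebase and passes it in.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical, heavily simplified stand-ins for the real codebase types.
newtype TermId = TermId String deriving (Eq, Show)
data Type = TyCon String | TyArrow Type Type deriving (Eq, Show)
data Term = Term String String deriving (Eq, Show)
newtype TypeHash = TypeHash Word64 deriving (Eq, Show)

-- Would live in a separate hashing module, not in the codebase package.
-- Toy FNV-1a over a rendered type: a placeholder, not Unison's scheme.
hashType :: Type -> TypeHash
hashType = TypeHash . fnv1a . render
  where
    render :: Type -> String
    render (TyCon n) = "con:" ++ n ++ ";"
    render (TyArrow a b) = "arr(" ++ render a ++ render b ++ ")"
    fnv1a :: String -> Word64
    fnv1a = foldl' (\h c -> (h `xor` fromIntegral (ord c)) * 1099511628211) 14695981039346656037

-- The codebase interface: stores what it is given and never hashes.
data Codebase m = Codebase
  { putTermImpl :: TermId -> Term -> Type -> TypeHash -> m ()
  , typeIndex :: m [(TypeHash, TermId)]
  }

-- The helper callers use: compute the type-search hash, then delegate.
putTerm :: Codebase m -> TermId -> Term -> Type -> m ()
putTerm cb tid tm ty = putTermImpl cb tid tm ty (hashType ty)

-- Toy in-memory implementation that only records type-index entries.
inMemoryCodebase :: IO (Codebase IO)
inMemoryCodebase = do
  ref <- newIORef []
  pure Codebase
    { putTermImpl = \tid _ _ h -> modifyIORef' ref ((h, tid) :)
    , typeIndex = reverse <$> readIORef ref
    }
```

With this shape, a codebase backend (SQLite or otherwise) only has to implement `putTermImpl`; swapping the hashing code never requires touching the backend.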
> The hashes will be different after pulling out the cycle length
I think that for this to work without the constructor problems we discussed on Friday, and without rehashing the whole codebase (which I don't think is necessary or desirable yet), the hashes won't (and can't) be different.
The hashing code can just convert definitions to a v1 `Hashable` data type (identical to the one we've been using thus far) and produce an unchanged result. To avoid footguns, we'll be very selective about which types get `Hashable` instances, and we'll never alter those types or instances except in a provably backwards-compatible way.
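A minimal Haskell sketch of that conversion step, assuming a hypothetical token type `TokenV1`, a class `HashableV1`, a codebase-side `StoredTerm` with a made-up `storageRow` field, and an FNV-1a placeholder where a real implementation would use a cryptographic hash. None of these are Unison's actual `Hashable` definitions; the sketch only shows that hashing consumes a frozen `TermV1` view, so codebase-only fields cannot affect the result.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64, Word8)

-- Hypothetical frozen "v1" hashing representation. This type and its byte
-- encoding would never change except in provably backwards-compatible ways.
data TokenV1 = Tag Word8 | Text String | Nat Word64
  deriving (Eq, Show)

-- Only carefully chosen types get an instance of this class.
class HashableV1 a where
  tokensV1 :: a -> [TokenV1]

-- Codebase-side representation that is free to evolve; storageRow is
-- made-up bookkeeping that must not influence the hash.
data StoredTerm = StoredTerm
  { storedName :: String
  , storedArity :: Word64
  , storageRow :: Int
  }

-- The frozen v1 view of a term that the hashing code consumes.
data TermV1 = TermV1 String Word64

instance HashableV1 TermV1 where
  tokensV1 (TermV1 name arity) = [Tag 0, Text name, Nat arity]

-- Conversion from the codebase representation into the hashing one.
toV1 :: StoredTerm -> TermV1
toV1 t = TermV1 (storedName t) (storedArity t)

-- Placeholder accumulator: FNV-1a over a byte encoding of the tokens.
hashV1 :: HashableV1 a => a -> Word64
hashV1 = foldl' step 14695981039346656037 . concatMap encode . tokensV1
  where
    step :: Word64 -> Word8 -> Word64
    step h b = (h `xor` fromIntegral b) * 1099511628211
    encode :: TokenV1 -> [Word8]
    encode (Tag w) = [0, w]
    encode (Text s) = 1 : map (fromIntegral . ord) s ++ [0]
    encode (Nat n) = 2 : [fromIntegral (n `div` 256 ^ i) | i <- [0 .. 7 :: Int]]
```

Because `toV1` drops `storageRow`, two stored terms that differ only in bookkeeping hash identically, which is the backwards-compatibility property the comment is after.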
The hash scheme only has to change if the v1 `Hashable` data type doesn't capture all of the data we want in the hash (e.g. #2276), and that change doesn't have to be coupled to this issue or to #2140.
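If the v1 type does turn out to be insufficient, one hedged way to introduce a new scheme without editing v1 is to tag each stored hash with a version and give each version its own frozen encoding. This is a sketch under assumptions: `HashVersion`, `StoredHash`, `Def`, its made-up `defExtra` field (standing in for whatever data v1 never covered), `hashWith`, and `storeNew` are all hypothetical, and FNV-1a is again a non-cryptographic placeholder.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical version tag stored alongside each hash in the database.
data HashVersion = HashV1 | HashV2
  deriving (Eq, Ord, Show, Enum, Bounded)

data StoredHash = StoredHash HashVersion Word64
  deriving (Eq, Show)

-- A definition carrying a (made-up) piece of data that v1 never hashed.
data Def = Def {defName :: String, defExtra :: String}

-- Placeholder string hash (FNV-1a); not a cryptographic hash.
fnv1a :: String -> Word64
fnv1a = foldl' (\h c -> (h `xor` fromIntegral (ord c)) * 1099511628211) 14695981039346656037

-- Each version has its own frozen encoding; the v1 case is never edited.
hashWith :: HashVersion -> Def -> StoredHash
hashWith HashV1 d = StoredHash HashV1 (fnv1a ("v1;" ++ defName d))
hashWith HashV2 d = StoredHash HashV2 (fnv1a ("v2;" ++ defName d ++ ";" ++ defExtra d))

-- Newly stored definitions use the latest version; hashes stored earlier
-- keep their original version and stay valid as IDs.
currentVersion :: HashVersion
currentVersion = maxBound

storeNew :: Def -> StoredHash
storeNew = hashWith currentVersion
```

Keeping the version next to the hash means old v1 hashes never need to be recomputed, which matches the goal of not rehashing the whole codebase.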