unison standardize representation for hashing (decouple from codebase representation)

Open aryairani opened this issue 4 years ago • 2 comments

We have hashes computed in different places: sometimes in the codebase implementation and sometimes outside of it. We should pick one. IMO hashes should be computed outside the codebase and passed in, using their own independent representation that's locked to a "hashing" version. (Related to #2140.)

Sep 01 '21 01:09 aryairani

Discussed this, decided the following:

We're going to amend the codebase interface so the codebase is not responsible for doing ANY hashing. Currently, the codebase is doing some type hashing in putTerm to populate the type search index. It should not be responsible for that. Not sure what exactly the new signature for putTerm should be but it seems doable. Suggestion: putTermImpl is the codebase interface function, and then Codebase.putTerm is a helper function that calls putTermImpl with the type search hashing info.
- Any implementation of hashing can just be removed from the codebase package.
The hashes will be different after pulling out the cycle length, which is fine. Will just use the old hashes as IDs until someone changes their implementation.
- May result in some spurious updates even if a definition is unchanged, but this is pretty rare (if you are editing a definition it's probably to... actually make changes to it)
- We should bump the hash version number used for storing new hashes in the database.
- We have this other planned hash update https://github.com/unisonweb/unison/issues/2276#issuecomment-889317543 so we should roll both these hash updates into the same release
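
A rough sketch of the putTermImpl / Codebase.putTerm split suggested above, just to make the shape concrete. Everything here other than the names putTerm and putTermImpl is a hypothetical stand-in (the Hash, Term, Type and TypeIndexHashes types, hashTypeForIndex, and the in-memory codebase), and the "hash" is a placeholder string, not a real hash. The actual signature is still undecided.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-ins for the real codebase types.
newtype Hash = Hash String deriving (Eq, Show)
newtype Term = Term String deriving (Eq, Show)
newtype Type = Type String deriving (Eq, Show)

-- Hash info for the type search index, computed by the caller (assumed shape).
newtype TypeIndexHashes = TypeIndexHashes [Hash] deriving (Eq, Show)

-- The codebase interface: it stores what it is given and hashes nothing.
newtype Codebase = Codebase
  { putTermImpl :: Hash -> Term -> Type -> TypeIndexHashes -> IO ()
  }

-- Placeholder for hashing code that lives outside the codebase package.
-- This is NOT a real hash; it only tags the type's text.
hashTypeForIndex :: Type -> TypeIndexHashes
hashTypeForIndex (Type t) = TypeIndexHashes [Hash ("type-mention:" ++ t)]

-- Helper that callers use: computes the type search hashing info,
-- then hands everything to the interface function.
putTerm :: Codebase -> Hash -> Term -> Type -> IO ()
putTerm cb h tm ty = putTermImpl cb h tm ty (hashTypeForIndex ty)

-- A toy in-memory codebase, only for demonstration.
type Stored = (Hash, Term, Type, TypeIndexHashes)

inMemoryCodebase :: IORef [Stored] -> Codebase
inMemoryCodebase ref =
  Codebase {putTermImpl = \h tm ty idx -> modifyIORef' ref ((h, tm, ty, idx) :)}

-- Stores one term through the helper and returns what the codebase saw.
demo :: IO [Stored]
demo = do
  ref <- newIORef []
  putTerm (inMemoryCodebase ref) (Hash "#abc") (Term "x -> x") (Type "a -> a")
  readIORef ref
```

The point of the split is that inMemoryCodebase (or any other implementation) never calls a hashing function itself; it only receives hashes as arguments.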

Sep 02 '21 14:09 pchiusano

> The hashes will be different after pulling out the cycle length

I think for this to work without the constructor problems we discussed on Friday, and without rehashing the whole codebase (I don't think it's necessary or desirable yet), the hashes won't(/can't) be different.

The hashing code can just convert definitions to a v1 Hashable data type (identical to the one we've been using thus far), and produce an unchanged result. I.e., to avoid footguns, we'll be very selective about what types get Hashable instances, and never alter those types or instances except in a provably backwards-compatible way.
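
To illustrate the idea, here is a minimal sketch, assuming a toy term language. CodebaseTerm, TermV1, HashableV1, encodeV1, toV1, digest and hashTerm are all hypothetical names, and the digest is a toy FNV-1a-style fold over character codes, not the hash function Unison actually uses. The codebase-side type can grow new constructors, while the frozen v1 type and its single instance stay untouched, so existing hashes don't move.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical codebase-side term. This type is free to evolve.
data CodebaseTerm
  = CInt Int
  | CText String
  | CApp CodebaseTerm CodebaseTerm
  | CCached Int CodebaseTerm -- storage-only detail that must not affect hashes

-- Hypothetical frozen hashing-side term ("v1"). Never altered once released.
data TermV1
  = IntV1 Int
  | TextV1 String
  | AppV1 TermV1 TermV1

-- The only class that feeds the digest; instances are handed out sparingly.
class HashableV1 a where
  encodeV1 :: a -> String

instance HashableV1 TermV1 where
  encodeV1 (IntV1 n) = "I" ++ show n ++ ";"
  encodeV1 (TextV1 s) = "T" ++ show (length s) ++ ":" ++ s
  encodeV1 (AppV1 f x) = "A" ++ encodeV1 f ++ encodeV1 x

-- Conversion from the codebase representation to the frozen one.
-- New codebase constructors are mapped here without touching TermV1.
toV1 :: CodebaseTerm -> TermV1
toV1 (CInt n) = IntV1 n
toV1 (CText s) = TextV1 s
toV1 (CApp f x) = AppV1 (toV1 f) (toV1 x)
toV1 (CCached _ t) = toV1 t

-- Toy FNV-1a-style digest over character codes (placeholder only).
digest :: String -> Word64
digest = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Hashing always goes through the v1 representation.
hashTerm :: CodebaseTerm -> Word64
hashTerm = digest . encodeV1 . toV1
```

With this shape, hashTerm (CCached 7 (CInt 1)) and hashTerm (CInt 1) agree, because the storage-only wrapper is dropped before anything reaches the HashableV1 instance.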

The hash scheme only has to change if the v1 Hashable data type doesn't capture all the data we want in the hash (e.g. #2276), which doesn't have to be coupled to this or #2140.

Sep 06 '21 14:09 aryairani
