dhall-lang icon indicating copy to clipboard operation
dhall-lang copied to clipboard

Have a hash built-in function?

Open arianvp opened this issue 6 years ago • 23 comments

I sometimes find myself needing the hash of a dhall expression in Kubernetes land. It would be nice to have this as a built in. Would this be possible? Given we already know how to calculate the hash of an expression for integrity checking / pinning it seems like a small addition.

arianvp avatar Sep 11 '19 09:09 arianvp

Usecase:

In kubernetes you have Deployments that refer to ConfigMaps by name. But it's convenient to have the hash of the ConfigMap as an annotation on the Deployment because then changing the Deployment triggers a redeploy of the Deployment

Another solution to this would be if dhall-to-yaml had access to the integrity hash and could for example write it to a field when processing the dhall file

arianvp avatar Sep 11 '19 10:09 arianvp

I think this is possible, but with certain constraints:

  • It would need to be a keyword rather than a built-in
  • It would need to be resolved at import resolution time
  • The expression we are hashing would need to have the same restrictions as an imported expression (i.e. type-checked with an empty context, so no free variables)
  • The expression would need to be αβ-normalized before hashing (just like a normal integrity check)

In other words, it's basically like an integrity check except that we read what the hash is instead of asserting what it should be.

The main rationale for these restrictions is to ensure that we don't violate parametricity.

Gabriella439 avatar Sep 11 '19 15:09 Gabriella439

Cool, howabout would I go implementing this? Do I need to update the dhall-lang semantics first, or can I mess around in dhall-haskell first?

arianvp avatar Sep 12 '19 12:09 arianvp

@Gabriel439 @arianvp how about making this another behaviour of the import system?

Like ./some/import as Hash

f-f avatar Sep 12 '19 13:09 f-f

@f-f: Yeah, I think that would be a great idea!

Gabriella439 avatar Sep 13 '19 03:09 Gabriella439

Cool! Then @arianvp you might want to take a look at #585 for how to standardise this - in that case I implemented it in the Haskell side before changing the standard

f-f avatar Sep 13 '19 21:09 f-f

~I'm going to implement this in dhall-haskell (and standardize too, I guess).~

neongreen avatar Nov 16 '19 10:11 neongreen

Oh, looks like I misinterpreted the proposal. I want to be able to hash the text contents of a file, and it looks like I won't be able to do anything like (import ../Foo.txt as Text) as Hash with this approach.

neongreen avatar Nov 16 '19 10:11 neongreen

@neongreen it should work if you have the imports in different files? I.e. ./a.dhall would be import ./Foo.txt as Text and ./b.dhall would be import ./a.dhall as Hash

f-f avatar Nov 16 '19 14:11 f-f

@f-f Eh. It works, but isn't intuitive and requires an extra file per Foo.txt.

I'm not sure what is the problem with a hash built-in that we need to use as Hash instead. I guess the only major problem is that hash will become a keyword.

neongreen avatar Nov 16 '19 19:11 neongreen

@neongreen: It doesn't necessary need to be an import modifier (i.e. as Hash) and could just be an ordinary hash keyword, but it probably would still need to satisfy these constraints:

https://github.com/dhall-lang/dhall-lang/issues/739#issuecomment-530443513

Gabriella439 avatar Nov 16 '19 20:11 Gabriella439

"It would need to be resolved at import resolution time" – oh, I guess that allowing hash for arbitrary expressions (e.g. lambdas in particular) would indeed break parametricity unless you made sure you couldn't do something like \(s : X) -> hash s.

In which case, alright, let's go with as Hash. Seems less surprising this way.

neongreen avatar Nov 16 '19 21:11 neongreen

@neongreen, @Gabriel439 ... when I was looking to solve ~same problem (adding hash to my configs,so that Kubernetes knows when things change), I assumed that the proposed ./some/import as Hash would just resolve as a string of semantic-hash of the import in question.

Is that the case? Is somebody working on that?

I have to admit I am a bit lost in the keyword/import-modifier discussion.

AdamSaleh avatar Jan 17 '20 14:01 AdamSaleh

@AdamSaleh: I am not currently working on this. My current priority is https://github.com/dhall-lang/dhall-lang/issues/754

Gabriella439 avatar Jan 17 '20 15:01 Gabriella439

It would need to be a keyword rather than a built-in It would need to be resolved at import resolution time The expression we are hashing would need to have the same restrictions as an imported expression (i.e. type-checked with an empty context, so no free variables) The expression would need to be αβ-normalized before hashing (just like a normal integrity check)

These constraints sound pretty limiting. What would be nice is to be able to use a hash without having to use an import. i.e.:

let k8s =
      https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/49fabcc9a18b65aba2aeb8acee6a97f40f1a3fd3/1.14/package.dhall

let makeApp
    : Text → List k8s.Resource
    = λ(name : Text) →
        let configMap =
              k8s.Resource.ConfigMap
                k8s.ConfigMap::{
                , metadata = k8s.ObjectMeta::{ name = Some name }
                , data = Some (toMap { some = "${name} data" })
                }

        let deploy =
              k8s.Resource.Deployment
                k8s.Deployment::{
                , metadata = k8s.ObjectMeta::{
                  , name = Some name
                  , annotations = Some (toMap { configHash = hash configMap })
                  }
                }

        in  [ deploy, configMap ]

in  makeApp

rather than using a semantic hash, would it make more sense to rely on a potential toJSON keyword and a md5 function like md5 (toJSON configMap)? This would be similar to how it might be done in jsonnet: https://github.com/grafana/jsonnet-libs/blob/0f9b4bbce38addc30035e748b50b5fe41cac468c/ksonnet-util/kausal.libsonnet#L343

muff1nman avatar Aug 14 '20 19:08 muff1nman

@muff1nman: Yeah, factoring things through a JSON representation would work well! See also: #1027 and #336.

Gabriella439 avatar Aug 14 '20 21:08 Gabriella439

We are looking at doing something very similar to the use case you were proposing here: https://github.com/dhall-lang/dhall-lang/issues/739#issuecomment-674246482 @muff1nman Did you find any workaround to achieve that until we can leverage a potential hashing function? Thanks.

damdo avatar Nov 03 '20 17:11 damdo

Unfortunately no. I still think this would be an extremely useful feature to have. Unfortunately didnt have the time to work on a spec for the standard yet.

I still think we should definitely add this. As it is one of the few boxes dhall-kubernetes doesn't tick yet that kustomize does tick

arianvp avatar Nov 08 '20 10:11 arianvp

How can Dhall support hashes if it doesn't support binary data? Does Dhall not also need to support hexToBase32, base32ToBase64 Text/toUpper, Text/toLower? Also, Md5 is broken and should never be mentioned anywhere.

ehmry avatar Nov 12 '20 10:11 ehmry

we're talking about semantic hashes; which dhall already has. For each expression dhall calculates a semantic hash of that expresion over its CBOR representation. dhall freeze exposes this. This ticket is about exposing that semantic hashing to the language itself.

arianvp avatar Nov 12 '20 11:11 arianvp

The rough sketch I had in mind for this was:

  • Standardize https://github.com/dhall-lang/dhall-lang/issues/1027 (language support for a JSON type)

    This would necessarily include standardizing how to encode values of that type

  • Optional: Standardize https://github.com/dhall-lang/dhall-lang/issues/336 (toJSON keyword)

  • Standardize language support for Bytes

    This type would not be for computing the hash, but rather to store the digest. We don't want the digest to be Text since we don't know in advance what base system the user might want (e.g. base 16, base 64, etc.)

  • Standardize a JSON/hash built-in of type JSON → Bytes

  • Standardize a Bytes/toBase16 : Bytes → Text built-in (Related: https://github.com/dhall-lang/dhall-lang/issues/1092)

So then in most cases you would hash a Dhall expression by doing something like:

Bytes/toBase16 (JSON/hash (toJSON expression))

… assuming that expression is convertable to JSON

Gabriella439 avatar Nov 12 '20 16:11 Gabriella439

In some situations, having the hash match the dhall cache might be desirable (as per #1280). Could a faster, and more useful, path to this include something like toCbor?

Also, purely of my own ignorance, would someone be able to explain how \(x : T) -> hash x (T : Kind -> Bytes) breaks parametricity, where \(x : Json) -> JSON/hash x (Json -> Bytes) does not?

jcdickinson avatar Feb 25 '22 13:02 jcdickinson

@jcdickinson: Yeah, toCBOR would also work for the purpose of ensuring exact compatibility with the semantic hash

Any function of type ∀(a : Type) → a → Text or ∀(a : Type) → a → Bytes would break parametricity, especially if we permitted Text equality.

For example, normally there are only two functions that inhabit the type ∀(a : Type) → a → a → a, which are:

λ(a : Type) → λ(x : a) → λ(y : a) → x

λ(a : Type) → λ(x : a) → λ(y : a) → y

However, if you were to add a hash : ∀(a : Type) → a → Text built-in and a Text/equals : Text → Text → Bool built-in, then you could implement that same type in different ways like:

λ(a : Type) → λ(x : a) → λ(y : a) → if Text/equals (hash a x) "… hash of the expression 1 …" then x else y

… and now you have a function that can test to see if a polymorphic value is 1, even though the polymorphism should forbid the function from knowing anything about the input expression's type or value. In general, the purpose behind polymorphism is to ensure that a polymorphic input is treated as a black box in order for the type to provide certain stronger guarantees.

Gabriella439 avatar Feb 25 '22 15:02 Gabriella439