Have a hash built-in function?
I sometimes find myself needing the hash of a dhall expression in Kubernetes land. It would be nice to have this as a built in. Would this be possible? Given we already know how to calculate the hash of an expression for integrity checking / pinning it seems like a small addition.
Usecase:
In kubernetes you have Deployments that refer to ConfigMaps by name. But it's convenient to have the hash of the ConfigMap as an annotation on the Deployment because then changing the Deployment triggers a redeploy of the Deployment
Another solution to this would be if dhall-to-yaml had access to the integrity hash and could for example write it to a field when processing the dhall file
I think this is possible, but with certain constraints:
- It would need to be a keyword rather than a built-in
- It would need to be resolved at import resolution time
- The expression we are hashing would need to have the same restrictions as an imported expression (i.e. type-checked with an empty context, so no free variables)
- The expression would need to be αβ-normalized before hashing (just like a normal integrity check)
In other words, it's basically like an integrity check except that we read what the hash is instead of asserting what it should be.
The main rationale for these restrictions is to ensure that we don't violate parametricity.
Cool, howabout would I go implementing this? Do I need to update the dhall-lang semantics first, or can I mess around in dhall-haskell first?
@Gabriel439 @arianvp how about making this another behaviour of the import system?
Like ./some/import as Hash
@f-f: Yeah, I think that would be a great idea!
Cool! Then @arianvp you might want to take a look at #585 for how to standardise this - in that case I implemented it in the Haskell side before changing the standard
~I'm going to implement this in dhall-haskell (and standardize too, I guess).~
Oh, looks like I misinterpreted the proposal. I want to be able to hash the text contents of a file, and it looks like I won't be able to do anything like (import ../Foo.txt as Text) as Hash with this approach.
@neongreen it should work if you have the imports in different files? I.e. ./a.dhall would be import ./Foo.txt as Text and ./b.dhall would be import ./a.dhall as Hash
@f-f Eh. It works, but isn't intuitive and requires an extra file per Foo.txt.
I'm not sure what is the problem with a hash built-in that we need to use as Hash instead. I guess the only major problem is that hash will become a keyword.
@neongreen: It doesn't necessary need to be an import modifier (i.e. as Hash) and could just be an ordinary hash keyword, but it probably would still need to satisfy these constraints:
https://github.com/dhall-lang/dhall-lang/issues/739#issuecomment-530443513
"It would need to be resolved at import resolution time" – oh, I guess that allowing hash for arbitrary expressions (e.g. lambdas in particular) would indeed break parametricity unless you made sure you couldn't do something like \(s : X) -> hash s.
In which case, alright, let's go with as Hash. Seems less surprising this way.
@neongreen, @Gabriel439 ... when I was looking to solve ~same problem (adding hash to my configs,so that Kubernetes knows when things change), I assumed that the proposed ./some/import as Hash would just resolve as a string of semantic-hash of the import in question.
Is that the case? Is somebody working on that?
I have to admit I am a bit lost in the keyword/import-modifier discussion.
@AdamSaleh: I am not currently working on this. My current priority is https://github.com/dhall-lang/dhall-lang/issues/754
It would need to be a keyword rather than a built-in It would need to be resolved at import resolution time The expression we are hashing would need to have the same restrictions as an imported expression (i.e. type-checked with an empty context, so no free variables) The expression would need to be αβ-normalized before hashing (just like a normal integrity check)
These constraints sound pretty limiting. What would be nice is to be able to use a hash without having to use an import. i.e.:
let k8s =
https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/49fabcc9a18b65aba2aeb8acee6a97f40f1a3fd3/1.14/package.dhall
let makeApp
: Text → List k8s.Resource
= λ(name : Text) →
let configMap =
k8s.Resource.ConfigMap
k8s.ConfigMap::{
, metadata = k8s.ObjectMeta::{ name = Some name }
, data = Some (toMap { some = "${name} data" })
}
let deploy =
k8s.Resource.Deployment
k8s.Deployment::{
, metadata = k8s.ObjectMeta::{
, name = Some name
, annotations = Some (toMap { configHash = hash configMap })
}
}
in [ deploy, configMap ]
in makeApp
rather than using a semantic hash, would it make more sense to rely on a potential toJSON keyword and a md5 function like md5 (toJSON configMap)? This would be similar to how it might be done in jsonnet: https://github.com/grafana/jsonnet-libs/blob/0f9b4bbce38addc30035e748b50b5fe41cac468c/ksonnet-util/kausal.libsonnet#L343
@muff1nman: Yeah, factoring things through a JSON representation would work well! See also: #1027 and #336.
We are looking at doing something very similar to the use case you were proposing here: https://github.com/dhall-lang/dhall-lang/issues/739#issuecomment-674246482 @muff1nman Did you find any workaround to achieve that until we can leverage a potential hashing function? Thanks.
Unfortunately no. I still think this would be an extremely useful feature to have. Unfortunately didnt have the time to work on a spec for the standard yet.
I still think we should definitely add this. As it is one of the few boxes dhall-kubernetes doesn't tick yet that kustomize does tick
How can Dhall support hashes if it doesn't support binary data? Does Dhall not also need to support hexToBase32, base32ToBase64 Text/toUpper, Text/toLower? Also, Md5 is broken and should never be mentioned anywhere.
we're talking about semantic hashes; which dhall already has. For each expression dhall calculates a semantic hash of that expresion over its CBOR representation. dhall freeze exposes this. This ticket is about exposing that semantic hashing to the language itself.
The rough sketch I had in mind for this was:
-
Standardize https://github.com/dhall-lang/dhall-lang/issues/1027 (language support for a
JSONtype)This would necessarily include standardizing how to encode values of that type
-
Optional: Standardize https://github.com/dhall-lang/dhall-lang/issues/336 (
toJSONkeyword) -
Standardize language support for
BytesThis type would not be for computing the hash, but rather to store the digest. We don't want the digest to be
Textsince we don't know in advance what base system the user might want (e.g. base 16, base 64, etc.) -
Standardize a
JSON/hashbuilt-in of typeJSON → Bytes -
Standardize a
Bytes/toBase16 : Bytes → Textbuilt-in (Related: https://github.com/dhall-lang/dhall-lang/issues/1092)
So then in most cases you would hash a Dhall expression by doing something like:
Bytes/toBase16 (JSON/hash (toJSON expression))
… assuming that expression is convertable to JSON
In some situations, having the hash match the dhall cache might be desirable (as per #1280). Could a faster, and more useful, path to this include something like toCbor?
Also, purely of my own ignorance, would someone be able to explain how \(x : T) -> hash x (T : Kind -> Bytes) breaks parametricity, where \(x : Json) -> JSON/hash x (Json -> Bytes) does not?
@jcdickinson: Yeah, toCBOR would also work for the purpose of ensuring exact compatibility with the semantic hash
Any function of type ∀(a : Type) → a → Text or ∀(a : Type) → a → Bytes would break parametricity, especially if we permitted Text equality.
For example, normally there are only two functions that inhabit the type ∀(a : Type) → a → a → a, which are:
λ(a : Type) → λ(x : a) → λ(y : a) → x
λ(a : Type) → λ(x : a) → λ(y : a) → y
However, if you were to add a hash : ∀(a : Type) → a → Text built-in and a Text/equals : Text → Text → Bool built-in, then you could implement that same type in different ways like:
λ(a : Type) → λ(x : a) → λ(y : a) → if Text/equals (hash a x) "… hash of the expression 1 …" then x else y
… and now you have a function that can test to see if a polymorphic value is 1, even though the polymorphism should forbid the function from knowing anything about the input expression's type or value. In general, the purpose behind polymorphism is to ensure that a polymorphic input is treated as a black box in order for the type to provide certain stronger guarantees.