dhall-lang
Support for Blob types?
Perhaps I'm misunderstanding or misusing this language, but I would like to be able to consume a value directly into a Haskell ByteString. My first idea was to build a decoder that consumed a Text value limited to an even number of hexadecimal characters, but since that isn't already in the library (and it seems like it would be a common need), I'm guessing there was a reason not to do this. What is the appropriate way to use non-UTF-8 values in a Dhall config?
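A minimal sketch of the workaround described above (hex text restricted to an even number of characters), written in Python for illustration rather than as an actual Dhall or Haskell decoder. The name decode_hex_text is hypothetical and not part of any Dhall library.

```python
import string

# Characters accepted as hexadecimal digits (0-9, a-f, A-F).
_HEX_DIGITS = set(string.hexdigits)


def decode_hex_text(text: str) -> bytes:
    """Decode hex text into raw bytes (hypothetical helper).

    Rejects input with an odd number of characters or any non-hex
    character, since each byte needs exactly two hex digits.
    """
    if len(text) % 2 != 0:
        raise ValueError("hex text must have an even number of characters")
    if not set(text) <= _HEX_DIGITS:
        raise ValueError("hex text may only contain 0-9, a-f, A-F")
    return bytes.fromhex(text)
```

The drawback of this approach is that the config only carries Text, so nothing in the type tells a reader (or the type checker) that the value is meant to be binary data.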
AFAIK there is none at the moment. You might be interested in the discussion in #1092. #1121 is somewhat related as well.
Let's call this type of value Bytes. I think the main thing that would help move this forward is deciding on the syntax for Bytes literals.
My first suggestion would be to use a hexadecimal encoding, since it is byte-aligned and can be written entirely in the UTF-8 text that Dhall configs are required to use.
\xDEADBEEF
or something like that. Is it doable in the syntax tree or are there going to be conflicts?
Here are two possible alternatives that I might suggest for how to encode Bytes literals:
- Use a syntax similar to the hexadecimal notation that we permit for Natural numbers, but with quotes, like this: 0x"FEED"
- Don't provide syntax for Bytes literals at all, and instead provide a bytes builtin of the following type: bytes : Natural → Bytes … which could be used like this:
bytes 0xFEED
cc: @blbarker, since #1215 reminded me of this
@Gabriel439 I think I prefer the first option for the following reason: With the 0x" prefix it is immediately clear that we deal with a blob whereas in the second alternative this is not known until we feed the natural to bytes. Consider the following code:
let blob : Natural = ./data.dhall
in
bytes blob
During import resolution the implementation will choose a numeric type to hold the content of ./data.dhall just to discover later that this was never meant to be used in this context and it could have chosen a byte array to store the data instead. I am not sure if that really makes a difference though.
On the other hand, it might be more comprehensible for the developer as well.
Another option that comes to my mind is base16:FEED (or b16:FEED), similar to the sha256:FEED hash sums we already use in our integrity checks. That way we might be able to extend the blob type with other encodings if that is requested at some point and maintain consistency within the language.
@mmhat: I think the main reason I'd prefer quotations instead of a base16: prefix is that if we add other types of numeric literal notations then we can also turn them into valid Bytes literals in the same way by adding quotes
For example, suppose that we standardize support for bitwise numeric literals as suggested in #1215 (e.g. 0b10111000), then we could also make that valid notation for Bytes literals by adding quotes (e.g. 0b"10111000")
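To make the "add quotes to a numeric prefix" idea concrete, here is a rough sketch in Python of how such literals could be decoded. It is an assumption about how the proposed notation might behave, not the Dhall grammar: parse_bytes_literal is a hypothetical name, and the rules that hex digits come in pairs and binary digits in groups of eight are illustrative choices.

```python
import re

# Matches the proposed notation: 0x"..." (hex) or 0b"..." (binary).
_LITERAL = re.compile(r'0([xb])"([0-9A-Fa-f]*)"')


def parse_bytes_literal(source: str) -> bytes:
    """Decode a quoted Bytes literal such as 0x"FEED" or 0b"10111000"."""
    match = _LITERAL.fullmatch(source)
    if match is None:
        raise ValueError(f"not a Bytes literal: {source!r}")
    kind, digits = match.groups()
    if kind == "x":
        # Two hex digits per byte.
        if len(digits) % 2 != 0:
            raise ValueError("hex literal needs an even number of digits")
        return bytes.fromhex(digits)
    # Binary: eight bits per byte, only the digits 0 and 1.
    if set(digits) - {"0", "1"} or len(digits) % 8 != 0:
        raise ValueError("binary literal needs groups of eight 0/1 digits")
    return bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))
```

Because the digits are read as a string rather than as a number, the same decoding step works for any prefix whose digits map onto a whole number of bytes.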
Also, I just realized a reason why a bytes : Natural → Bytes built-in wouldn't work: we'd have no way to represent Bytes with leading zeros, because 0x00FF and 0xFF represent the same Natural number.
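The leading-zero problem can be shown with a short Python sketch. The function natural_to_bytes is a hypothetical stand-in for one plausible definition of bytes : Natural → Bytes (shortest big-endian encoding); it is not something the thread specifies.

```python
def natural_to_bytes(n: int) -> bytes:
    """Encode a natural number as the shortest big-endian byte string."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")


# As numbers, the two literals are indistinguishable ...
same_number = 0x00FF == 0xFF

# ... so any function of the number yields the same single byte,
via_natural = natural_to_bytes(0x00FF)

# whereas decoding the digits as text keeps the leading zero byte.
via_text = bytes.fromhex("00FF")
```

Here via_natural is one byte long while via_text is two bytes long, which is the information a Natural-based builtin cannot recover.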
@Gabriel439 Interestingly when I wrote about extending the blob type I was thinking more about base32: or base64: encodings rather than binary or octal representations and I have no idea how we would add those using the prefix system for numeric literals.
Apart from that, I really like that the quotation marks reflect the close relationship to Text values.
I think the quoted version seems good. From my non-PL-designer POV, I don't have a strong preference for any particular implementation as long as it isn't overly burdensome. And as we all know, it can be quite easy to bikeshed over trivial stuff. If there's no obvious reason not to go with quotations I think we should just do that and call it a day.
I also have a few interesting use cases to throw out there for a Bytes type, where files can be imported as such.
One case is generating PowerPoint files (I'm using Dhall basically anywhere and everywhere I can these days!), which are, for the most part, just a zip of a bunch of XML. Using dhall to-directory-tree is awesome here, as it does all the things I would otherwise have to do manually. The only thing left at this point is to copy resources like images and fonts into the directory before zipping. It would be especially cool if I could just import the images in Dhall and plop them right into the directory tree output so they were scooped up automatically. That would also mean some references could be made explicit, rather than implicit.
Another case would be for something like CloudFormation for AWS Lambda. A zip file of the code is essentially the deployable unit. In some configurations, it's helpful to have a SHA256 of the zip in the CF itself. If the zip could be imported as bytes and then sha256 was a function from and to Bytes that could be rendered out to Text that would again eliminate otherwise custom build steps in some cases.
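As an illustration of the Lambda use case, this is roughly the computation a sha256 function over Bytes, rendered to Text, would perform. It is sketched in Python with the standard hashlib module; sha256_hex is a hypothetical name, and no such Dhall builtin is implied to exist.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex text."""
    return hashlib.sha256(data).hexdigest()


# In the scenario above, data would be the imported zip archive;
# a short literal stands in for it here.
digest = sha256_hex(b"example zip contents")
```

The result is a 64-character hex string that could be embedded in the generated template as ordinary text.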
Hope these use-cases are helpful!