Discussion: Support for importing from JSON
One pretty heavily requested feature is importing JSON values directly into Dhall. The most commonly requested reasons for doing so are:
- Interop with existing JSON infrastructure (e.g. reusing shared JSON configuration files or API endpoints between tools)
- Taking advantage of Dhall's declarative import system to orchestrate tying together multiple heterogeneous inputs
I'm open to the idea, although I probably won't implement it until the import semantics are standardized (any day now :) ). In the meantime, though, I can still gather feedback on whether Dhall should support this feature and flesh out what it might look like if it were proposed.
This would most likely be similar in spirit to Dhall's existing support for importing raw text using as Text. In other words, you would be able to write ./foo.json as JSON to import a JSON file into Dhall. However, there are still some open issues that need to be resolved.
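For reference, the existing as Text feature lets you import a file's raw contents as a Text value. Here is a small illustrative sketch (the file names are made up):

```dhall
-- `as Text` is existing Dhall; `./greeting.txt` is a hypothetical file.
let greeting = ./greeting.txt as Text

-- The proposal would extend the same syntax to JSON, e.g.:
--   let config = ./foo.json as JSON

in  greeting
```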
For those who are in favor of this feature, the easiest way to drive this discussion is to discuss how Dhall should behave when importing the following JSON expressions which cover most of the corner cases that I'm aware of:
[ 1, null ]
[ 1, true ]
[ { "x": 1 }, { "x":1, "y":true } ]
For each of these imports, should Dhall:
- reject the import?
- accept the import with a type annotation (i.e. a sum type or Optional type)? If so, what type annotation(s) would allow the import to succeed?
- both of the above (i.e. be strict without a type annotation and more lenient with a type annotation)?
My humble opinion is that Dhall should always require a type annotation, regardless of how 'guessable' the imported type is. The rationale is that even though your list stores only integers now, it may end up containing booleans later, and it feels inappropriate to guess implicitly when importing files. A separate tool (a dhall-guess-the-type-of-this-json-bit, so to speak) would be better suited for that purpose.
Also, I think that the only implicit conversion that is sensible is to equate null and missing values with Optional.
Thus [ 1, null ] could be imported using List (Optional Integer) and [ { "x": 1 }, { "x": 1, "y": true } ] as List { x : Integer, y : Optional Bool }, while [ 1, true ] would never be importable (at least until Dhall grows enough dependent typing for the type to contain an interpretation function).
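To make those cases concrete, here is a sketch of how the annotated imports might be written under the proposed (not yet standardized) as JSON syntax; the file names are invented for illustration:

```dhall
-- Hypothetical syntax; none of this is implemented yet.

-- [ 1, null ]: nulls become `None`, so the annotation would be
--   ./nulls.json as JSON : List (Optional Integer)

-- [ { "x": 1 }, { "x": 1, "y": true } ]: a missing field becomes `None`:
--   ./records.json as JSON : List { x : Integer, y : Optional Bool }

-- [ 1, true ]: rejected, since no single element type fits both values
-- (short of a sum type, which is discussed separately)
```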
I don’t think the idea is to guess types. From #326:
./foo.json as JSON : List { name : Text, age : Natural }
Well, since the options outlined above are "reject" and "accept with a type annotation", I thought there would be a form where a type annotation wasn't necessary, i.e. the type would arise from the imported data. Sorry about my confusion.
I think the last section was about how JSON should be typed in Dhall by default (yet still with deterministic rules for which value gets which type).
For example the aforementioned
./foo.json as JSON : List { name : Text, age : Natural }
would accept [ { "name": "me", "age": 23 } ] but not
[ { "name": "me", "age": 23, "occupation": "programmer" }, { "name": "Mel" } ], while just
./foo.json as JSON
could accept the latter and would type it as List { name : Text, age : Optional Natural, occupation : Optional Text }.
@Gabriel439 I personally think dynamically adding optionals would only make sense if Dhall can infer the needed fields from usage, which it can’t. Since everywhere else types are not optional (and can’t be inferred) I think this would break consistency (and maybe bring up the expectation that it infers from usage).
Yeah, my personal preference is for a mandatory type signature, too. I just didn't want to bias the discussion at the very beginning. My reasoning is that it would be very weird for this:
[ 1 ]
... to have an inferred type of List Integer, whereas this:
[ 1, null ]
... has an inferred type of List (Optional Integer). Adding an element to a list shouldn't change its type and (like other people mentioned) wouldn't be consistent with other design decisions in Dhall.
However, there is still the question of whether or not Dhall should allow importing this JSON:
[ 1, true ]
... using a type annotation with a sum type like this:
./foo.json as JSON : List < Left : Integer | Right : Bool >
The main downside of that proposal that I'm aware of is that you have to specify what happens if you start nesting sum types or if you have sum types with multiple constructors that wrap the same type. My inclination is to still reject that, but I just wanted to mention it because dhall-to-json does support this in the opposite direction:
$ dhall-to-json
let Either = constructors < Left : Integer | Right : Bool >
in [ Either.Left 1, Either.Right True ]
<Ctrl-D>
[ 1, true ]
Well, I was referring to "typed by default" as guessing. I think there are two use cases for dhall-from-json:
- To get some static piece of data easily into Dhall. This is, in my mind, best served by an external conversion program, which you run just once.
- You want to use different bits of JSON as input to a Dhall program, or the data you are importing changes sometimes. In this case you probably want direct support in Dhall. However, no single JSON snippet is going to be able to tell you the exact shape of the (future) data, so the defaulting mechanism is probably not such a hot idea.
As a final thought, how about adding import plugins (using a scheme similar to pandoc's) to dhall? You would supply dhall with a program/script that can output Dhall expressions and then import other bits of data through that script. For example, you could do something like:
echo './myCSV using CsvConverter { name : Text, age : Natural }' | dhall --plugin=CsvConverter
This would allow testing different JSON import schemes or interacting with other more task specific data sources. Successful data providing plugins could be merged to dhall after they've seen some real world use. (This could also handle things like https://github.com/dhall-lang/dhall-haskell/issues/292)
Yeah, I like the plugin idea, although I would prefer to do it through the Haskell API instead of the command line
Untagged unions should be different from sum types in my opinion.
I wouldn’t have expected dhall-to-json to throw the tags away to be honest.
Command line vs. Haskell API depends on whom you expect to write plugins. I would guess that today most Dhall is consumed by Haskell programs, and the plugin mechanism is easiest and safest to add there.
However, if you use dhall from command line a lot then you'll need to build your own binary. Not a problem for Haskell users but probably a bit of a hurdle for the rest.
Keep in mind that the long term goal of Dhall is language bindings other than Haskell. So ideally there would be a language binding in that user's preferred language that they could use to customize the import resolution behavior.
The main reason I want to avoid a plugin API is that then I have to standardize the semantics and interface for plugins and every Dhall implementation would need to support that standard plugin semantics.
Note that in the long run I don't want users to have to use any binaries at all. The integration with their preferred language should be through a library rather than a subprocess call to some dhall or dhall-to-json executable.
In other words: I agree with the goal that users shouldn't have to build their own binaries, but I believe that the correct solution to that goal is to finish standardizing import semantics in order to create more language bindings rather than make the binaries of one implementation hyper-customizable.
I ran into another case where some kind of extended importing would be useful.
I'm using Dhall to describe some course exercises. Some exercises need bibliography links, and all I have is a large BibTeX file. In this case I converted the bibliography, partly and by hand, into Dhall so that I could import the required entries.
It would've been nice if I could've imported the .bib file directly. Doing the bib-to-dhall conversion means that the .bib file is no longer the primary data source, and that I need to write a converter from Dhall back to BibTeX to make use of the entries that I converted into Dhall.
Perhaps extending the syntax so that import foo using <dhall-expression> is valid would be a start for doing something like this?
I've been pondering this for a little while, and I feel that as JSON should be typed, but that a tool should exist that takes a corpus of example/expected payloads and provides a type, to ease the use of as JSON.
The tool could also take two corpora, reflecting both valid and error JSON responses, allowing a union type to cover both circumstances...
I've spent the afternoon making a toy json-to-dhall tool (https://gist.github.com/madjar/252c517644c0e13ef28a2a7ca71f5fa4). It's very prototypey code, and supports just the most basic types, as well as optionals and dynamic maps (mapKey/mapValue).
The question is: if I want to transform this into something that's actually useful, where should it live:
- Some external project?
- As part of the dhall-json package?
- In some other form?
@madjar: We want to add this to the language standard and once it's there then it will live in all implementations of the standard using the as JSON keyword (i.e. in the dhall-haskell project, for example). The first step is to review your code and see if that matches how people expect the as JSON feature to behave. I will try to review your code more closely tonight.
The key thing to emphasize is that the standardization process and agreeing upon the desired behavior is the bottleneck here because once it is standardized then I expect it will be pretty straightforward to update the implementation to match.
@Gabriel439 If you review it closely, then I'll have to apologize for the quality. It was kind of rushed this afternoon. The approach I've taken is the one described in https://github.com/dhall-lang/dhall-json/issues/23#issuecomment-381382116, under "Convert and type check together", which is to recursively traverse both the JSON Value and the Dhall Expr, accepting only values that exactly match the given type.
Having this tool made the conversion of a JSON file and the definition of its Dhall type quite nice, allowing me to incrementally add the missing parts to the type definition while getting quick feedback.
But I understand that you see this not as tooling, but as part of the language, thus requiring more standardization than "whatever the tool does". I'll familiarize myself with the processes of the project, then.
Thanks!
My opinion on this is that this would be extremely cool.
Regarding type annotations, here is what I had in mind: importing without a type annotation should be OK, because when we write [ "a", "b" ] in Dhall we don't need a type annotation. So requiring one to import a similar bit of JSON seems unnatural. However, without a type annotation the type checker would be very strict and disallow any kind of mixing of types. Essentially, it would parse the JSON as it would a similar Dhall expression.
This has a nice side-effect that doing echo "./data.json as JSON" | dhall type would give a type for the json payload.
Now when type annotations are added, the data can be more flexible, for example transforming nulls into Nones etc as mentioned above. But I feel this can be left for a later stage, since it would be rather more complex.
I agree that an as JSON mechanism should take some kind of type definition as a parameter, instead of trying to magically generate a type from the parsed JSON. I think that such a notion is more "Dhall-ish", which is to say that input should be type-checked instead of blindly trusted.
With that said, I don't think that the type inputs for as JSON should be statically defined, e.g. let Strings = List Text in ./data.json as (JSON Strings). I think that this inherently limits the value of an as JSON language feature due to the dynamic nature of much JSON output.
Consider, for instance, the dhall-terraform-output script, which takes Terraform's JSON output and assembles both a type and a record from that output. Because the record keys are variable, it's not possible to define a Dhall type for arbitrary Terraform JSON output ahead of time (or rather, it is, but it would be fragile). However, this doesn't mean that Terraform's JSON output doesn't follow a predictable pattern, and ideally, upon parsing Terraform's JSON output, it would be best to verify that the output fits that pattern, and possibly even get a type that fits the predicted pattern.
What's the best way to do that? I don't know. Maybe, instead of as (JSON Type), we have some kind of as (JSON (? -> Type)), the idea being that it takes a function that produces a type instead of a static type? I have a feeling that a solution would veer on dependent typing, which the language standard doesn't support (yet?), and that's a Pandora's box of its own.
Probably it would be best to start with as (JSON Type), be strict about accepted input, and do so with the explicit caveat that it's still a partial solution and doesn't expect to be universally useful for any kind of JSON input.
I think requiring type annotations will make it mostly impossible to import large JSON structures like CloudFormation data.
@alexanderkjeldaas: Wouldn't they also fail to import without type annotations? Usually those kinds of JSON files mix records of various types
@alexanderkjeldaas: Please try the (new) json-to-dhall tool (in https://github.com/dhall-lang/dhall-haskell) and share your experience/issues. The tool requires a type annotation (schema), but it does support union types and should be able to handle situations where different types are "mixed".
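For anyone who wants to try it: as I understand it, json-to-dhall takes a Dhall type (the schema) as an argument and reads the JSON from standard input, along these lines (the exact invocation and flags may differ between versions):

```
$ echo '[ { "name": "me", "age": 23 } ]' \
    | json-to-dhall 'List { name : Text, age : Natural }'
```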
Json-to-dhall is great!
I'm not sure how 'done' it is considered to be. Does this unlock this issue, i.e. creating the syntax for the core language and the command-line dhall utility to import X as JSON and import X as YAML?
This would be great!
I think a good first step could be creating a proxy server or similar that converts json/yaml/toml/cbor/xml to Dhall and then you can import from URLs there.
@feliksik: The main thing I'm waiting for is to see how json-to-dhall evolves over the next few months based on feedback from users. For example, there was a recent change added to support unions for keys, which was a use case we didn't originally envision: https://github.com/dhall-lang/dhall-haskell/pull/1094
A separate idea that has been floated in some other issue (I don't remember which one), was to enable Text parsing utilities but only for the purpose of importing other file formats. In other words, you could write:
./example.json as ./jsonParser.dhall
... where ./jsonParser.dhall would have access to text parsing/manipulation primitives that are ordinarily not available so that you could write a JSON decoder for Dhall. This would allow us to add new supported formats without extending the language and it would push all the weakly-typed string logic to the edge of the system (i.e. only at the point of import).
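What ./jsonParser.dhall would contain is entirely speculative, but as a sketch, a decoder could be a function from the raw file contents to a Church-encoded JSON tree. Every name below is invented for illustration; the text-parsing primitives it presumes do not exist in the language today:

```dhall
-- Purely hypothetical decoder interface; nothing here is standardized.
-- `JSON` is a Boehm-Berarducci-style encoding of a JSON syntax tree:
let JSON
    : Type
    = ∀(JSON : Type) →
      ∀ ( json
        : { string : Text → JSON
          , number : Double → JSON
          , bool : Bool → JSON
          , null : JSON
          , array : List JSON → JSON
          , object : List { mapKey : Text, mapValue : JSON } → JSON
          }
        ) →
        JSON

-- A decoder would then map the imported file's raw text to a parsed
-- value (or `None` on a parse failure), using text-manipulation
-- primitives available only during import resolution:
let Decoder
    : Type
    = Text → Optional JSON

in  Decoder
```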
What would such a decoder look like? What I mean here is that we'd need to support some kind of text manipulation in the standard in order to allow this, right?
@f-f: I'm not sure. It's a very half-baked idea I had at one point, that's all 🙂
Thank you @Gabriel439 for your comment; I'm very eager for this feature, but it's indeed better to do this right than to do it fast.
@feliksik: I would actually be fine baking in language support for JSON specifically instead of waiting for a more general solution. Dhall's future as a language is already intimately tied to its ability to displace JSON/YAML, so I think it's fine to special-case support for JSON
Sounds good... But I'd suggest also supporting YAML. This could even be limited support (e.g. fail on the Date type, or other things for which the semantics are still unclear), as that will make it easier to extend later.
In the same vein as my proxy server idea, we could extend the language to allow
./path/to/json via json-to-dhall
Which would call a locally-installed conversion tool.
I would actually be fine baking in language support for JSON specifically instead of waiting for a more general solution
While this sounds sensible, it also sounds like a slippery slope - as @feliksik notes, we'd want to also support YAML, and why not also HCL, TOML, INI..
I'd suggest that if we want to add this to the standard, then we start by standardizing something like "implementations may support extensions in the form of the as Something import", by allowing some way of referencing this additional decoder, e.g. by passing a Dhall record in the form { Something = "./path/to/something/decoder" }.
It should be an executable, so that it's compatible with all implementations, and it would take either the resolved path or the file contents on stdin and return a Dhall expression on stdout.