dhall-lang icon indicating copy to clipboard operation
dhall-lang copied to clipboard

Permit text parsing at import resolution time?

Open Gabriella439 opened this issue 4 years ago • 28 comments

I'm creating an issue to consolidate an idea that has come up in three separate contexts. The basic idea is to add the ability to import from an arbitrary Text source by specifying a custom Dhall parser. Specifically, we would allow Text introspection built-ins only within these custom parsers, but not elsewhere within a Dhall program.

Here is a concrete example from a recent Discussion on Slack:

env:ENV as ./parseEnvironment.dhall

... where ./parseEnvironment.dhall contains:

let Environment = < Development | Staging | Production >

in  \(text : Text) ->
      if Text/equal "prod" text then Some Environment.Production
      else if Text/equal "staging" text then Some Environment.Staging
      else if Text/equal "development" text then Some Environment.Development
      else None Environment

... so that a program can accept an environment variable of the form ENV=prod and that gets parsed at import resolution time into a more structured Environment union type.

The reason I use the as keyword here, is because then the current as Text functionality just becomes a special case where the parser is the identity function:

import as Text = import as (\(text : Text) → Some text)

... although if we really want to avoid special-casing as Text we could use a separate keyword for this new functionality.

The Text introspection built-ins would only be in scope for the import-related parsers and would still not be available elsewhere within the program. This would ensure that any weakly-typed logic is pushed to the edge of the system.

Two other contexts where this has come up are:

  • https://github.com/dhall-lang/dhall-lang/issues/121 (Importing from JSON files)

    If we had customizable parsers we could implement support for parsing JSON expressions in Dhall instead of having to bake JSON support into the language.

  • https://discourse.dhall-lang.org/t/text-manipulation-functions/176/39

    A discourse thread about Text manipulation where this idea also came up

If people support this idea, the main design decisions that would need to be fleshed out are:

  • What are the minimal set of new built-ins necessary so that we can ergonomically author these types of parsers

    People interested in this should check out the Discourse thread which already touched on this subject a bit

  • What type should the parsing function be?

    In particular, how should we represent error handling? Return an Optional or < Failure : Text | Result : a > or something else entirely?

    The goal is that we want to fail fast and reject a malformed import at import resolution time rather than silently failing with an unexpected value at normalization time.

If we can decide on those two things I can do the standardization work

Gabriella439 avatar May 01 '20 21:05 Gabriella439

That... seems like a terrible can of worms, when baking JSON/YAML parsing in implementations would be rather easy and justified by our current goal of displacing yaml. I expect many new feature requests to arise from this, in particular some kind of do notation, that will still be a complete pain because we have neither typeclasses nor ML modules to enable that kind of abstraction. I would personally need more convincing examples than the two you mention before engaging on this path.

Technical question: can we even make e.g. a JSON parser in Dhall without support for recursion ? I was thinking maybe corecursive types could help, but actually no if we want to propagate errors.

Nadrieril avatar May 02 '20 12:05 Nadrieril

Also, playing devil's advocate against myself: there is one issue that this leads to, which is how people would develop and distribute code for parsing imports.

For example, would the dhall command-line tool and the language server need a special "include Text built-ins" mode? If they did, people might just enable the relevant flags all the time, and not just for working on import-related parsers.

Similarly, if the Prelude had a field for a JSON parser, you would no longer be able to import the Prelude under normal conditions. Would we need a parallel package ecosystem for parsing code?

@Nadrieril: I'm also not sure how to actually write the JSON parser in Dhall now that I think about it.

Gabriella439 avatar May 02 '20 14:05 Gabriella439

For example, would the dhall command-line tool and the language server need a special "include Text built-ins" mode? If they did, people might just enable the relevant flags all the time, and not just for working on import-related parsers.

An alternative that avoids that problem would be to have a special opaque type for every parsing built-in, plus a set of combinators. Most importantly, there would be no destructor: the only way to use a constructed parser would be through the built-in import ... as construct.

We already know this can be done, there's plenty of inspiration among Haskell libraries and elsewhere. The only downside compared to the above proposal is the impact on the language specification.

blamario avatar May 02 '20 17:05 blamario

The nice thing about the built-in type route is that the combinators can be designed to fit Dhall expressions. The example above, say, could be written as

let Environment = < Development | Staging | Production >

in  Grammar/optional
      (Grammar/merge Environment {
          Production= Grammar/literal "prod",
          Staging= Grammar/literal "staging",
          Development= Grammar/literal "development"})

which, if I may say so myself, is much nicer looking than what you could build with the usual Applicative/Alternative combinators. As a bonus, it's also reversible.

blamario avatar May 02 '20 17:05 blamario

@blamario: There are still issues with adding uninterpreted built-ins to the default scope. Currently, a language binding can rely on the property that if a configuration file has type Bool with an empty context then that value must normalize to either True or False and the value can be marshalled into the language. However, now that value could be an uninterpreted Text/equal "x" "y", which would fail to marshal into the host language. This means that users would no longer be able to validate ahead of time that their configuration file was valid.

Gabriella439 avatar May 02 '20 19:05 Gabriella439

This means that users would no longer be able to validate ahead of time that their configuration file was valid.

Do you mean the file with the parser definition or the file that uses said parser definition on the right-hand side of as? The latter, if valid, would normalize to a regular Dhall value. For the former, all the usual typing judgements would still be applicable, and (if valid) its type would be something like Grammar/Type Environment, in the above example. Its normalized value would indeed be opaque, and my default position is that it need not be representable in the host language.

I'm not sure what you mean by "validate". The type checker would validate if the parser is well typed. Beyond that, well, you can never be sure if it covers every possible valid input but you can certainly verify any particular input file.

blamario avatar May 02 '20 19:05 blamario

@blamario: Never mind. I misunderstood what you are proposing. Now I see that you are proposing that parsing-related built-ins produce results in some sort of Grammar Applicative/Monad. So the type of Text/equal would be Text → Text → Grammar Bool

Gabriella439 avatar May 02 '20 19:05 Gabriella439

Yes, except it would be more like Grammar/literal : Text → Grammar ().

blamario avatar May 02 '20 19:05 blamario

@blamario: There is still the issue of how one would specify mutually recursive grammar rules (such as the JSON grammar)

Gabriella439 avatar May 02 '20 20:05 Gabriella439

One solution would be a Grammar/fix combinator:

let JSONGrammar = {
  array : Grammar [Value],
  object : Grammar (Map Text Value),
  value : Grammar Value,
  string : Grammar Text,
  ...}
let jsonGrammar : JSONGrammar = Grammar/fix jsonProductions
let jsonProductions : JSONGrammar → JSONGrammar = λ(g : JSONGrammar)→ {
  array= Grammar/brackets (Grammar/sepBy (Grammar/literal ",") g.value),
  object= Grammar/braces (Grammar/sepBy (Grammar/literal ",") g.member),
  member= Grammar/record {mapKey= Grammar/followedBy g.string "=",
                                               mapValue= g.value},
  ...}

There is a Haskell precedent for this combinator as well, incidentally implemented by myself a few years ago.

My earlier comment very much applies: while this is no doubt doable, it would have a serious impact on the language specification and consequently on its every implementation. I'll understand if you don't want to go near it. If you do, I hereby offer to write up one possible design so everybody understands the trade-offs.

blamario avatar May 03 '20 02:05 blamario

@blamario: Yeah, I understand that this would probably increase the specification complexity. Right now I'm just trying to collect ideas; I'm not in a hurry to make a decision

Gabriella439 avatar May 03 '20 03:05 Gabriella439

On a different topic, I was playing with an XML representation in Dhall. The following is a valid expression, you can play with it:

let Text/concat : List Text → Text = https://prelude.dhall-lang.org/Text/concat
let Text/concatMap : ∀(a : Type) → (a → Text) → List a → Text = https://prelude.dhall-lang.org/Text/concatMap
let Map : Type -> Type -> Type = https://prelude.dhall-lang.org/Map/Type
let Content : Type -> Type =
    \ (Element : Type) ->
    ∀ (Content : Type) ->
    ∀ (Plain : Text -> Content) ->
    ∀ (Markup : Element -> Content -> Content) ->
    ∀ (Contain : List Content -> Content) -> Content
let Element = {name: Text, attributes: Map Text Text}
let toXML : (Content Text -> Text) =
          \ (content : Content Text) ->
          let id = \(t : Text)-> t
          let elem = \(name : Text)-> \(c : Text)-> "<" ++ name ++ ">" ++ c ++ "</" ++ name ++ ">"
          in content Text id elem Text/concat
let example : Content Text =
    let Element = Text in
    \ (Content : Type) ->
    \ (text : Text -> Content) ->
    \ (markup : Element -> Content -> Content) ->
    \ (contain : List Content -> Content)
    -> contain [ text "Hello ", markup "b" (contain [text "World"])]
in {raw= example, xml= toXML example}

Of course I couldn't begin to write an XML parser, but note that even the XML writer above has problems. In particular, it doesn't look for and escape the special < and & characters in the text argument. The same problem would surface in any other encoding of markup, including the ubiquitous Markdown. Does anybody else find this issue worth addressing?

blamario avatar May 03 '20 12:05 blamario

Actually, the issue of escaping strings when generating output in a particular format does come up regularly. For me that's an argument for allowing Text manipulation always.

Nadrieril avatar May 03 '20 16:05 Nadrieril

For me that's an argument for allowing Text manipulation always.

Not necessarily. We could preserve the spirit of this proposal (i.e., pushing the text manipulation to the edges of the system) by extending the use of as or similar syntax for output as well as input.

blamario avatar May 03 '20 16:05 blamario

Ah, new idea: if we want custom parsers, they will need to generate data of a type that depends on the input. They might also need to take as input a user-supplied type annotation to drive parsing. This makes me think that they need access to a way to manipulate Dhall types. The can of worms gets wormier x) In fact I see two different phases here: a parser that constructs an AST, for example of type Prelude.JSON.Type, would be the first phase. Then we'd want something that converts that to an actual dhall type, possibly needing a type annotation. This would be the mirror version of toJSON https://github.com/dhall-lang/dhall-lang/issues/336. And this second phase can't even be done by hand currently, since we can't index into a Map. So either this second phase is builtin via a fromJSON keyword, or we'd need new magic like type reification and manipulation primitives.

Nadrieril avatar May 03 '20 18:05 Nadrieril

Not necessarily. We could preserve the spirit of this proposal (i.e., pushing the text manipulation to the edges of the system) by extending the use of as or similar syntax for output as well as input.

That's kinda cool. Dhall would have three phases: parsing, data manipulation, and printing. We could replace dhall-to-json with something in pure Dhall. That would look kind of like a plugin architecture in fact. But frankly that seems like such a hassle just because we don't trust our users to use Text primitives parsimoniously.

Nadrieril avatar May 03 '20 18:05 Nadrieril

@Nadrieril: The motivation behind restricting the language is not to protect people from errors in their own code, but rather to protect people from errors in their dependencies. Asking people to audit all of their transitive dependencies for stringly typed code won't scale.

Gabriella439 avatar May 03 '20 18:05 Gabriella439

But frankly that seems like such a hassle just because we don't trust our users to use Text primitives parsimoniously.

I personally agree, but that's because I generally trust responsible adults to do what's best for them. (Just don't ask me to explain the history of software engineering in that light.) Regardless, it's still worth exploring the design space.

blamario avatar May 03 '20 18:05 blamario

@Gabriel439 I mean that's like any other software engineering problem. In Haskell or Rust, libraries could use strings instead of sum types too, but no serious code does. I don't see why Dhall would be so different in this regard. The only reason I see for people to use stringly typed things instead of alternatives is in places where Dhall lacks some ergonomic features, like equality of unions. But in that case the solution is improving this features, not forbidding Text equality. Also whatever happens Dhall will force a user to handle all possible cases. Threading Optionals everywhere is enough of a pain that users will think twice before doing Text parsing that could have been avoided. Generally I feel like I have quite a different prior than you regarding how much people would use strings and how much of an impact on correctness that would have. I'd be interested to know what made you so wary of string manipulation.

Nadrieril avatar May 03 '20 22:05 Nadrieril

I had a look at issue #336, thanks @Nadrieril. I'm not sure these plugins would necessarily need to understand Dhall types. Here's my generalization of the requirements I gather for Grammar, JSON, XML, and other foreign objects:

  • A plugin of the kind we're discussing always supplies an opaque type, be it JSON, Grammar, or whatever. I'll concentrate on one of these opaque types called Bob.
  • There are values of type Bob, and they behave like all other values in Dhall except they're opaque. They can be stored in lists, records and unions, named in let, expected by lambdas. Same goes for Bob itself.
  • There are functions to construct Bob values, either out of ordinary Dhall values or out of smaller Bob instances. Some of these functions may not have a proper Dhall type, much like toMap or merge has no type even though map values to values. These would have to somehow provide both their result value and its type, if they succeed, or the error message when incorrectly applied.
  • There are no functions to take a Bob apart or convert it to an ordinary Dhall value within a Dhall expression.
  • The plugin however supplies two functions, call them import : Text → Optional Bob and export : Bob → Text. The former can be invoked only by the as syntactic form, the latter is only used by the Dhall interpreter to present a value of type Bob on the command line.
  • A Dhall implementation may or may not provide a way to load a Bob value into the host language.

I think this covers the use cases I've seen so far. The hard part is in defining a plugin interface.

blamario avatar May 04 '20 01:05 blamario

I opened up a related issue for another idea for how to solve the original problem: https://github.com/dhall-lang/dhall-lang/issues/994

Gabriella439 avatar May 04 '20 16:05 Gabriella439

Some of this is a bit unrelated, but I very much like the idea of being able to do text parsing or manipulation at the edge of the system. Text equality gets me nothing while actually working in Dhall, but there are a number of things I've done that would benefit from Text not being opaque. A simple example would be a system designed to monitor a number of other running system via log files or some other text. If these subsystems didn't have a common output format, the monitor would have to hardcode an interpreter for each different type. However, it would be very nice if Dhall could configure this monitor by providing a list of systems to watch, each with a Text -> Status function to be used as an adapter of sorts. If this idea goes further, I hope it isn't restricted to imports.

More on topic: maybe an entirely new type could be used to represent text coming from outside Dhall and have the builtins only work for that type. It could be converted to Text but not back. Then all files could still parse without Dhall needing to be in a specific "mode". It wouldn't give much incentive to convert to Text though and still has potential for abuse

NMerkley avatar May 04 '20 20:05 NMerkley

@NMerkley: I like the idea of having a separate Text type (maybe String or Characters, to suggest that the value can be decomposed into characters) that permits introspection which can only be created by an import. That cleanly resolves the issue of extra built-ins because the built-ins could always be safely in scope.

The only issue I can think of is that it would be nice if there were a way to forbid invalid values at import resolution time. For example, if the imported Text could only be one of "prod", "staging", or "development", it would be nice to be able to fail fast instead of threading around an Optional Environment throughout the program.

Gabriella439 avatar May 05 '20 15:05 Gabriella439

Super not on board with the idea of adding text parsing abilities.

https://github.com/mozilla/nixpkgs-mozilla/blob/master/lib/parseTOML.nix

^ this is a nix toml parser. As you would expect, it is super slow, runs at evaluation time, and is very hard to verify correct.

The final workaround was to add a fromTOML builtin to nix.

Profpatsch avatar May 21 '20 23:05 Profpatsch

If this wasn't constrained to Text sources but could also apply to Text values it could also allow syntactic sugar for custom data types without having to implement every single one of them in every single language implementation. (In general I think something like this is highly desirable to avoid a combinatorial explosion of required implementation effort to make Dhall comfortable to use but still catching errors as early as possible - namely during import and typechecking instead of when the program using the Dhall value is run.) For example, IPs (from #217) could be written as let IPv4 = [someparser] in "127.0.0.1" as IPv4.

hyperfekt avatar May 26 '20 18:05 hyperfekt

@hyperfekt: Yeah, I'm warming up to the idea of generalizing the "import resolution" phase to become an "unsafe stuff" phase, which would include both resolving imports and Text manipulation. Then users concerned about unsafety could disable the phase entirely or "pre-resolve" things so that they are left with only type-checking and normalization at runtime.

Gabriella439 avatar May 26 '20 18:05 Gabriella439

Obviously there are safety implications here, but what would be ideal is if the parsers didn't have to be written in Dhall at all, something like:

./some.json as sh:json-to-dhall

singpolyma avatar Jun 20 '20 00:06 singpolyma

@singpolyma I suggested a similar idea here. I'm still half convinced by @Gabriel439 and @ari-becker's answers, and I like your idea (I guess I'm not a heavy dhall user, which is true, no sarcasm)

lisael avatar Jun 20 '20 21:06 lisael