
Serialize hcl.Body to send over wire protocol for processing elsewhere

Open mitchellh opened this issue 6 years ago • 11 comments

Just recording these somewhere, not urgent, things I'm noticing.

If we partially decode using gohcl into a remain attribute, we're left with an hcl.Body. Today, you can decode that with gohcl (nice, easy) or with hcldec (hard, requires spec creation and knowledge of cty). hclwrite is a package for constructing HCL and doing surgical changes, leaning on the "advanced" side.

What I'd like to propose is a middleground where a given hcl.Body can be rendered to at least raw HCL text if not JSON as well. This would allow partial HCL configurations to be serialized/persisted in a non-binary form.

mitchellh avatar Jan 23 '19 16:01 mitchellh

Since an hcl.Body is a logical thing rather than a physical thing -- this is the key for the syntax-agnostic decoding and how features like the dynamic block extension work -- it's not generally possible to serialize a hcl.Body back to source without a schema. The gohcl.EncodeAsBlock and gohcl.EncodeIntoBody functions use the gohcl struct tags to produce such a schema, but hcl.Body fields are intentionally schemaless (for cases where the schema won't be known until a later step) and so they can't be supported there.

I think without some significant reorganization (which I expect would prevent some of the more interesting things we currently do with the hcl.Body abstraction) the best we could do here in the immediate term is to expose a higher-level helper around gohcl.EncodeIntoBody that has an interface like json.Marshal and passes on the limitation that partial decoding (that is, fields of type hcl.Body, hcl.Attribute, etc) cannot be used, but should work correctly with structs containing fields of "normal" Go types. This is similar to what was discussed in hashicorp/hcl2#42, though I closed that for now since the functionality is available even if the API around it is quite low-level at the moment. We can use this issue to track the idea of introducing higher-level helpers that expose an encoding/json-like Marshal/Unmarshal interface for simple cases.

apparentlymart avatar Jan 23 '19 18:01 apparentlymart

Yeah, I was afraid that was the case.

I guess my outside-in POV is: the hcl.Body was parsed from some structure, I want to reencode it back to that exact same structure. Whether or not it is valid for what the program expects is later (and should require a schema or struct of course).

The "why?" is for partial decoding cases where I don't want to encode a schema across the wire. What I want to do is just send the partial schemaless config across the wire and have the decoding happen there (probably using gohcl for simple cases).

mitchellh avatar Jan 23 '19 19:01 mitchellh

I think what you are looking at there is the same problem I was exploring with the separate hclpack experimental package. I don't think that design is quite "right" yet, and indeed I was expecting to probably delete that package as part of the merge into the main HCL repo just to reflect that it's not baked at all, but I do agree definitely it's an important problem to solve, and the question for me at this point is more about how best to solve it while retaining the benefits that our abstractly-defined hcl.Body has.

The root "problem" here is that hcl.Body is not necessarily directly parsed from some structure, and may instead be some later transform of what was extracted. The dynamic blocks extension is a particular extreme example of that, but even just using PartialContent on a body gives you something that doesn't necessarily translate back to the original source, and the JSON form of hcl.Body is lacking information immediately after parsing because the JSON syntax is ambiguous. Indeed, that's why my hclpack experiment only supports native syntax (because there's enough information there to unambiguously reconstruct a physical syntax tree) and accepts source code as input rather than hcl.Body (to guarantee that it's holding a "physical" body).

Keeping in mind that it's definitely not baked or robust, if you want to experiment with the hclpack functionality for your use-case and see where it falls down I would love a real-world "experience report" to help inform the next iteration of it, and how it might integrate better with the rest of the HCL stack here. (An API like this could be offered by the hclsyntax package itself, for example; I only split it out here because I was still experimenting and didn't want to pollute the main API.)

One difference I can see immediately from what you asked here is that I only implemented a bespoke JSON serialization of a body so far. I didn't implement serializing directly back to HCL because that would lose all of the original source location information and thus cause any decoding errors to point to garbage locations in a generated source file that the end-user can't see. It sounds like this JSON approach could potentially serve the needs of your use-case if the recipient of the data on the wire just needs the information from the original config, and not the physical syntax it was expressed in.

The JSON encoding I defined initially is pretty large though, and in particular will always be much larger than the source code it was constructed from due to all of the JSON punctuation and position information overhead. Reducing the JSON packaging overhead seems doable by either packing it a different way or using some other serialization format, but I think it will always be cursed by overhead if retaining the source locations is a requirement (which I think it is in most cases, for user experience.)

apparentlymart avatar Jan 23 '19 19:01 apparentlymart

I don't want to take over this thread but I'd like to give a brief "experience report" on hclpack, as I've recently been exploring it.

My use case is roughly as follows:

  1. Load config files from disk (native hcl syntax). Client knows nothing/very little about schema.
  2. Configs are marshalled to json with hclpack and sent to server.
  3. Server unmarshals the json and decodes the body using the schema it knows about.
  4. Any diagnostics are sent back to the client.

This works really well, although I haven't had that much time with it yet. The one thing I've run into so far is sending diagnostics back to the client, which can then display them with the source files. Marshalling hcl.Diagnostics to JSON seems to work reasonably well, but it would be nice to have some "native" way of doing it. Size is less of an issue, as the diagnostics going back are likely much smaller than the configs coming in.
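To make step 4 concrete, here is a rough sketch of an explicit wire format for diagnostics, using only the standard library. The `wirePos`, `wireRange` and `wireDiagnostic` types are hypothetical: they mirror the severity, summary, detail and subject-range fields of hcl.Diagnostic as I understand them, and are not part of HCL or hclpack.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wirePos is a hypothetical mirror of a source position.
type wirePos struct {
	Line   int `json:"line"`
	Column int `json:"column"`
	Byte   int `json:"byte"`
}

// wireRange is a hypothetical mirror of a source range within one file.
type wireRange struct {
	Filename string  `json:"filename"`
	Start    wirePos `json:"start"`
	End      wirePos `json:"end"`
}

// wireDiagnostic carries just enough for a client to render the message
// next to the source file it already has on disk.
type wireDiagnostic struct {
	Severity string     `json:"severity"` // "error" or "warning"
	Summary  string     `json:"summary"`
	Detail   string     `json:"detail,omitempty"`
	Subject  *wireRange `json:"subject,omitempty"`
}

// encodeDiagnostics is what the server would call before replying.
func encodeDiagnostics(diags []wireDiagnostic) ([]byte, error) {
	return json.Marshal(diags)
}

// decodeDiagnostics is what the client would call on the reply.
func decodeDiagnostics(data []byte) ([]wireDiagnostic, error) {
	var diags []wireDiagnostic
	if err := json.Unmarshal(data, &diags); err != nil {
		return nil, err
	}
	return diags, nil
}

func main() {
	sent := []wireDiagnostic{{
		Severity: "error",
		Summary:  "Unsupported argument",
		Detail:   "An argument named \"prot\" is not expected here.",
		Subject: &wireRange{
			Filename: "service.hcl",
			Start:    wirePos{Line: 3, Column: 3, Byte: 27},
			End:      wirePos{Line: 3, Column: 7, Byte: 31},
		},
	}}
	data, err := encodeDiagnostics(sent)
	if err != nil {
		panic(err)
	}
	got, err := decodeDiagnostics(data)
	if err != nil {
		panic(err)
	}
	d := got[0]
	fmt.Printf("%s: %s (%s:%d,%d)\n", d.Severity, d.Summary,
		d.Subject.Filename, d.Subject.Start.Line, d.Subject.Start.Column)
}
```

Using a dedicated wire type keeps the payload stable even if the in-memory diagnostic type grows fields that are not meant to be serialized.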

akupila avatar Jan 23 '19 20:01 akupila

@apparentlymart That makes total sense. I looked into hclpack and I think my issue with it is there doesn't seem to be a way to turn an hcl.Body into an hclpack.Body. If that was there I think that'd work 100%. I need that because I'm only trying to pack the partial structure.

I'd be supportive of keeping experimental packages under the "x/" prefix similar to Go. That way folks like myself and @akupila above can keep using them with the hope/intent that they'll be stabilized and merged into the main tree at some point.

mitchellh avatar Jan 23 '19 20:01 mitchellh

Indeed @mitchellh the current workflow for hclpack is kinda inverted from what you tried: instead of converting an existing hcl.Body to hclpack.Body, you would do the initial parse with hclpack to hclpack.Body and then use that result as an hcl.Body, since it implements the interface. (If this were in the main hclsyntax package, of course this parsing function would just be the main hclsyntax.ParseConfig, returning an hclsyntax.Body.)

hclpack.Body guarantees that any call to Content or PartialContent will return another hclpack.Body, so you can type-assert it back to hclpack.Body to get the extra functionality it offers. Since gohcl and hcldec both use Content and PartialContent internally, that applies when working through those abstractions too.

The obvious initial challenges here are:

  • hclpack only supports native syntax, since the JSON syntax is too ambiguous for it to deal with. This means any application relying on it can't support JSON input files, which loses an important HCL feature.
  • Since hclpack is currently split out into a separate package rather than being in the main hclsyntax package, you suddenly need an entirely new entry-point for it. (This one at least is just a temporary concern as a result of it being experimental.)

I was expecting we'd do a reorganization of the package layout in this repository as part of moving it into the main HCL repository as the 2.0.0 major release, and I totally agree that keeping this less-well-baked stuff in a special experimental area is a good compromise for making these things available for use in tagged releases while we're still figuring out the best design for them, as opposed to deleting them as I suggested above. Since the move into the main HCL repo will make all the package names change anyway, that's a good opportunity to amortize the inconvenience of updating callers to the new package paths. (I planned to keep this separate hcl2 repo around for a while afterwards in an archived state, though, to avoid breaking existing callers.)

apparentlymart avatar Jan 23 '19 22:01 apparentlymart

Hi, any plan on this feature?

tcz001 avatar Nov 20 '19 01:11 tcz001

The reason I ask is that we are trying to do some refactoring work on our current tf files and build a code generator for common snippets. Since there is no way to marshal a body, we have to create the HCL in a command-driven way, instead of having everything managed in a struct.

tcz001 avatar Nov 20 '19 23:11 tcz001

There are some existing ways to generate HCL native syntax programmatically today:

  • The lowest-level interface is the hclwrite sub-package, which provides an API for constructing a new HCL AST from scratch or for making certain surgical changes to an existing file.
  • The gohcl sub-package has a higher-level API which can go from a subset of what gohcl can unmarshal into an hclwrite body or block, using either EncodeIntoBody or EncodeAsBlock. This is a good choice if you are already decoding with gohcl anyway and you are decoding into concrete "normal" Go types. This may be less appropriate for Terraform-related configuration generation because it doesn't use gohcl and instead relies on dynamic decoding based on provider schemas.

The specific use-case this issue is focused on is something a little more subtle: the ability to partially decode a HCL configuration, deal with it a little, and then send the remainder on to some other system over a wire protocol for further processing. That use-case is difficult to meet with the current HCL API, and challenging to do in a way that retains the good error messages (with accurate source location information) that HCL aims to produce.

apparentlymart avatar Nov 20 '19 23:11 apparentlymart

Hello, any chance that there have been any more ideas or progress on how to achieve this using hcl v2 (it seems hclpack has not been migrated)?

We have this exact use case; we decode the "known" hcl.Body into a struct using gohcl, and that struct has some ",remain" hcl tags to delay the decoding of an inner hcl.Body. We want to be able to serialize that inner hcl.Body (e.g. to JSON), send to a server for storage and once we receive some additional context (in the form of input values) we can then decode the inner hcl.Body.

I don't mind sacrificing accurate diagnostics, and the serialized form of the hcl.Body does not need to be human readable/friendly (of course both of those would be nice to haves!). So even just a possibility to return an hcl.Body back to tokens would be great, or the possibility to get the range of text that defines that body, I could write a small utility to extract the raw text from the file where it was declared...
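The "small utility" idea could look roughly like the following, using only the standard library. The `byteRange` type and `sliceSource` function are hypothetical: the sketch assumes the caller can obtain start and end byte offsets for the body (hcl.Range carries byte offsets in its Start and End positions), with the end offset treated as exclusive. Whether the range reported for a remain body covers only the leftover items, rather than the whole enclosing body, is something to verify before relying on this.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// byteRange is a hypothetical half-open range [Start, End) of byte offsets
// into the original source file.
type byteRange struct {
	Start int
	End   int
}

// sliceSource returns a copy of the bytes covered by r, or an error if the
// range does not fit inside src.
func sliceSource(src []byte, r byteRange) ([]byte, error) {
	if r.Start < 0 || r.End < r.Start || r.End > len(src) {
		return nil, errors.New("byte range is outside the source buffer")
	}
	out := make([]byte, r.End-r.Start)
	copy(out, src[r.Start:r.End])
	return out, nil
}

func main() {
	src := []byte("name = \"web\"\n\nsettings {\n  port = 8080\n}\n")

	// In a real program the offsets would come from the parser; here the
	// block is located by searching, purely for demonstration.
	block := []byte("settings {\n  port = 8080\n}")
	start := bytes.Index(src, block)

	raw, err := sliceSource(src, byteRange{Start: start, End: start + len(block)})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", raw)
}
```

The extracted text can be stored or sent as-is and parsed again on the receiving side, at the cost of diagnostics that point into the extracted fragment rather than the original file unless the original filename and start position are sent along with it.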

BTW I love HCL, so amazing work, and it'd be great to overcome some of these challenges :)

jlarfors avatar Oct 17 '20 20:10 jlarfors

Hi, is it possible, at a minimum, to add support so that EncodeIntoBody can encode hcl.Expression fields? It's quite a common use case for a server to want to store a "validated" config file (usually merged from multiple .hcl files managed by different teams), where the merged config struct contains expressions that cannot be evaluated at storage time (e.g. runtime parameters provided by the consumers of the service).

The server could definitely store the original copies of those .hcl files after the validation, but it'd be much more convenient if the root Go struct can be encoded back to a single hcl file well formatted and compressed.

I'm not sure if some info would be lost during the decode-then-encode process (e.g. operator precedences, indentations or newlines, etc), but a simple working solution would be of great help.

WhoCalledInTheFleet avatar Nov 20 '23 20:11 WhoCalledInTheFleet