cue cmd/cue: add diff support

Originally opened by @mpvl in https://github.com/cuelang/cue/issues/8

Allow diffing between (snapshots of) previous version and current version.

Jul 03 '21 10:07 cueckoo

Original reply by @jugaadi in https://github.com/cuelang/cue/issues/8#issuecomment-528032804

Any updates on this feature?

Jul 03 '21 10:07 cueckoo

Original reply by @mpvl in https://github.com/cuelang/cue/issues/8#issuecomment-780522675

It would be good for people to give examples of how they would like to use this functionality in a CLI. There is an internal package for diffing CUE, which could potentially also be exposed as a Go API.

Jul 03 '21 10:07 cueckoo

Original reply by @myitcv in https://github.com/cuelang/cue/issues/8#issuecomment-799134595

Commenting simply to add the word semantic into the mix 😄

I've taken (perhaps incorrectly) to referring to cue diff as a semantic diff, distinct from cmp and friends ("plain diff") used in unity.

Jul 03 '21 10:07 cueckoo

Original reply by @eonpatapon in https://github.com/cuelang/cue/issues/8#issuecomment-803282650

Would be nice to have a diff API in cue lib. Currently I'm using Value.Decode and https://github.com/r3labs/diff to diff two cue instances in order to detect which part of the configuration has changed (based on this CI jobs are run).

However, this only works if the instances contain only concrete values. So a similar API that can diff cue values directly would be nice (and is required in my case to be able to use the flow API).

Jul 03 '21 10:07 cueckoo

Original reply by @myitcv in https://github.com/cuelang/cue/issues/8#issuecomment-803295207

@eonpatapon take a look at https://pkg.go.dev/cuelang.org/[email protected]/internal/diff. That is internal for now, but as we shape up the API it should be made non-internal.

Jul 03 '21 10:07 cueckoo

Original reply by @vikstrous2 in https://github.com/cuelang/cue/issues/8#issuecomment-816729814

A semantic diff would be really interesting. I think with diffing kubernetes files, the most awkward part is the way that the kubernetes API wants to be given a list of yaml objects but the identity of those objects is usually best defined by their name rather than their position in a list. My current idea for diffing them is to write them out into a directory tree where the name of the file is based on the name and type of the object and then using git diff. Then as long as the yaml and all of its fields and lists of named objects are sorted in some stable way, this is good enough for most kubernetes things.

Jul 03 '21 11:07 cueckoo

Original reply by @myitcv in https://github.com/cuelang/cue/issues/8#issuecomment-831097159

Adding something of an experience report here from the world of unity.

unity tests (of cmd/cue) generally follow this rough pattern:

1. Ensure that evaluation of a given configuration semantically matches expectations. This is, in effect, a CUE semantic diff.
2. Verify that output in a specific format (JSON, Yaml, etc) matches expectations. This is, in effect, a semantic diff in the output format.

In the case of point one this will look like:

# CUE semantic diff
cue eval -o out.cue X
cue diff out.cue ref.cue

# JSON diff
cue eval -o out.json X
cue diff out.json ref.json

Some questions:

Is cue eval the right command here? cue eval will become cue - but does that give the intended result, as far as concreteness etc are concerned?
file.ref will need to be a complete, self-contained configuration. This might well require it to be a txtar archive?

Stepping back a bit further, we should also be able to write a semantic diff for point 2, on the basis that CUE knows about the semantics of these different formats (even to some extent the different versions of, say, Yaml, JSON).

Jul 03 '21 11:07 cueckoo

In the case of generating kubernetes configs, we use a _tool.cue file to write out the config to many different yaml files. Those files can be individually diff'd by git diff. A fancier diffing algorithm would probably just do some sorting and normalization before diffing, which would bring it pretty close to a semantic diff. There isn't anything really cue specific needed for that.

If cue eval is used for diffing, it might produce unusual looking output if most users of the cue config usually interact with it through a _tool.cue command.

A diff of cue configs is interesting, but it seems like a very open ended topic. I think my understanding of CUE is not sufficient to have an opinion on how that should work.

Aug 03 '21 12:08 vikstrous2

FWIW what we are currently doing is outputting the result of cue eval, which we then sort with jq.

That's less than optimal (!) and a proper semantic diff would be much appreciated.
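A sketch of that kind of normalisation, assuming the sorting in question is key sorting of JSON output (the exact jq invocation is not given here). It relies on Go's encoding/json writing map keys in sorted order, so two documents that differ only in field order produce identical text:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// canonical re-encodes a JSON document with object keys sorted and fixed
// indentation. Numbers pass through float64, so very large integers may
// lose precision.
func canonical(doc []byte) (string, error) {
	var v any
	if err := json.Unmarshal(doc, &v); err != nil {
		return "", err
	}
	out, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	a, err := canonical([]byte(`{"b": 1, "a": {"y": 2, "x": 1}}`))
	if err != nil {
		log.Fatal(err)
	}
	b, err := canonical([]byte(`{"a": {"x": 1, "y": 2}, "b": 1}`))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(a == b) // prints true
}
```

List order is preserved by this approach, so reordered lists still show up as differences in a plain text diff.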

Aug 03 '21 18:08 PierreR

@vikstrous2 @PierreR I've used https://github.com/homeport/dyff for creating diffs of outputs from CUE for CloudFormation. Quite nice. Works really well for semantic diffs. Worth a look! And, hopefully we can build something similar for CUE at some point. :)

Aug 09 '21 17:08 jlongtine

> And, hopefully we can build something similar for CUE at some point. :)

The query extension offers some nice potential for good semantic diff output.

Aug 09 '21 19:08 myitcv

A sub-feature-request: allow the user to specify if/that certain lists' contents can be treated as being identical based on member contents, not ordering. In other words (I /think/): to mark specific lists as sets, not lists, for diff purposes.

Rationale: field ordering in a struct is always(?) semantically unimportant. Ordering in a list is /sometimes/ unimportant, but it's case-specific. The cue diff feature would be valuable to a wider set of users if there were the ability to mark/configure (at diff time) which lists' contents were, essentially, order-agnostic.

Example: I'm currently using CUE to emit GitHub Actions (GHA) workflow .yml files, and I needed to assert that a specific commit had only reorganised the input CUE, and hadn't changed the output .yml. One list in my workflow file is "the jobs to run". Obviously, order is important there. But there's a job at the /end/ of the workflow, which waits for all other jobs, and reports if all jobs succeeded. This final job's input is a list of job names, but it doesn't care about the order, merely that they're present. Being able to cue diff the input CUE, and not rely on git diff of the output yaml, would have been useful; even more so if I could have taught the diff invocation to ignore ordering changes to the final job's input list.
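To make the order-agnostic comparison concrete, here is a small sketch in Go. The sameMembers helper and the job names are hypothetical; it only illustrates treating a list as a multiset, not how cue diff would implement it:

```go
package main

import "fmt"

// sameMembers reports whether a and b contain the same strings with the
// same multiplicities, ignoring order.
func sameMembers(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int, len(a))
	for _, s := range a {
		counts[s]++
	}
	for _, s := range b {
		counts[s]--
		if counts[s] < 0 {
			return false // b has more copies of s than a
		}
	}
	return true
}

func main() {
	before := []string{"build", "test", "lint"}
	reordered := []string{"lint", "build", "test"}
	edited := []string{"build", "test", "deploy"}
	fmt.Println(sameMembers(before, reordered), sameMembers(before, edited)) // prints true false
}
```

A diff that used a check like this for the final job's input list would report the reordering as no change, while still flagging a renamed or missing job.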

May 08 '23 11:05 jpluscplusm

> In other words (I /think/): to mark specific lists as sets, not lists, for diff purposes.

Having been kindly pointed towards https://github.com/cue-lang/cue/issues/14 and onwards to https://github.com/cue-lang/cue/issues/165#associative-lists, I can amend my sub-request to be: please could diff be capable of being associative-list aware, with the ability to ignore (if not the default behaviour of ignoring) purely order-based changes in associative lists. TVM!

May 08 '23 12:05 jpluscplusm

As part of the recent issue garden, we are focussing on non-feature requests. As such, I'm removing the milestone on this feature request. We will revisit feature requests in a later pass, at which point we will start to milestone and prioritise new features (in addition to those that we are already working on).

Jun 20 '23 09:06 myitcv
