go-cmp icon indicating copy to clipboard operation
go-cmp copied to clipboard

Any way to do value-predicated filtering (rather than type)

Open thockin opened this issue 3 years ago • 8 comments

I have a function like this:

type T {
    // ... 
    IPs []string
    Port int
}

func Create(input T) T {
    output := T{}
    // ...

    // If user provided IPs, use that.  Else find free ones.
    if len(input.IPs) > 0 {
        output.IPs = input.IPs
    } else {
        output.IPs = allocateIPs()
    }

    // If user provided Port, use that.  Else find a free one.
    if input.Port != 0 {
        output.Port = input.Port
    } else {
        output.Port = allocatePort()
    }

    return output
}

I'm trying to use cmp.Equal and cmp.Diff to test. I specifically want to test permutations where the input specifies and doesn't specify input IPs and ports. The real logic is significantly more complicated than this reductive example, obviously.

What I am looking for is a cmp.Option similar to IgnoreFields (since the type is []string) crossed with Comparer(), but without the symmetric requirement. For example:

cmpIPs := FieldComparer(T{}, "IPs", func(x, y []string) bool {
    if len(x.IPs) == 0 {
        return true
    }
    return reflect.DeepEqual(x.IPs, y.IPs)
}
cmpPort := FieldComparer(T{}, "Port", func(x, y int) bool {
    if x.Port == 0 {
        return true
    }
    return x.Port == y.Port
}
cmp.Equal(input, output, cmpIPs, cmpPort)

Is this possible to express?

thockin avatar Aug 01 '21 17:08 thockin

Is this possible to express?

Technically, yes.

The entire cmpopts package is implemented in terms of the public API of cmp. The feature that seems most useful to you is the currently unexported cmpopts.filterField function.

Assuming that function were exported, you could do something like:

cmpIPs := cmpopts.FilterField(T{}, "IPs", cmp.Comparer(func(x, y []string) bool {
    return reflect.DeepEqual(x, y) || len(x) == 0 || len(y) == 0
}))
cmpPort := cmpopts.FilterField(T{}, "Port", cmp.Comparer(func(x, y int) bool {
    return x == y || x == 0 || y == 0
 }))
cmp.Equal(input, output, cmpIPs, cmpPort)

I modified the checks against 0 to be symmetric. The requirement that cmp.Equal be symmteric was partly to support #67, where we would like to keep the property that:

  • !cmp.Less(x, y) and !cmp.Less(y, x) implies that cmp.Equal(x, y) or cmp.Equal(y, x)

Regarding why cmpopts.filterField is currently unexported, it's because there are some issues with composability that need to be figured out. For now, perhaps its best to just fork the implementation of cmpopts.filterFields.

dsnet avatar Aug 02 '21 19:08 dsnet

For the composibility problems with cmpopts.FilterField, this is a discussion I had with Roger on Slack some time ago:

  • @rogpeppe: I was just trying to do something with cmp and remembered that it's perhaps harder than it should be. I want to apply a comparison option (in this case, cmpopts.EquateApproxTime) to one particular field in a given struct type. What's a nice way of doing that?

  • @dsnet: Ideally, you would use the unexported cmpopts.filterFields helper

  • @dsnet: I never exposed it since composability of filters is a tricky subject

  • @rogpeppe: ah, i'd just found that in general, i think the Path support is a weak spot of cmp currently

  • @dsnet: I don't know if it's fair to say that Path itself is a weakness, since it's just a data type that expresses how we got to the current value we're comparing relative to the root. At present, that type expresses everything that you might want to know.

    I think the weakness is stemming from what operations can easily be done with a Path, whether they be expressed as methods on the Path or as functions that operate on a Path.

    Taking the earlier question of how to filter cmpopts.EquateApproxTime to a specific struct field, the Path has all the information available to accomplish that. The question then becomes, how can we express that in the most natural and intuitive way.

    Regarding the unexported filterField function, suppose you do:

    cmpopts.FilterField(MyStruct{}, "MyField", someOpt)
    

    an ambiguity arises whether your intention is for the FilterField to:

    • only apply exactly to the value at MyStruct.MyField, OR
    • apply to the value at MyStruct.MyField and all sub-values of it.

    I can think of situations where you might want one semantic or the other. I feel like part of the challenge is that describing filters on a tree-like structure is a hard problem in computer science regardless of whether it's in cmp or elsewhere.

  • @rogpeppe: How about FilterFieldOnly as well as FilterField? the latter form would match anything under

  • @dsnet: Do you think most users would understand the difference? I feel like most push-back from the proposal to add cmp to the standard library is that the library is too complicated

  • @rogpeppe: good question. I'll think about that

  • @dsnet: In the past, I thought there was a good reason why "only apply exactly to the value at MyStruct.MyField" might be desired, but now I'm struggling to remember what it was. Maybe we should just expose cmpopts.FilterField with the "apply to the value at MyStruct.MyField and all sub-values of it" semantics.

  • @rogpeppe: yes, on thinking about it that sounds good after all, the standard options apply recursively to the root value

  • @dsnet: Quiz: would you expect there to be an semantic difference between:

    • cmpopts.FilterField(StructAlpha{}, "FieldAlpha", cmpopts.FilterField(StructBravo{}, "FieldBravo", opt))
    • cmpopts.FilterField(StructBravo{}, "FieldBravo", cmpopts.FilterField(StructAlpha{}, "FieldAlpha", opt))

    ?

  • @rogpeppe: i think they should both be identical after all, in both places, opt applies only when we are both inside StructAlpha.FieldAlpha and StructBravo.FieldBravo

  • @dsnet: I believe you are correct since FilterField would only check whether that StructX.FieldX is in the path. Also, chaining filters together is equivalent to a logical AND. Thus the above is equivalent to saying: (is StructAlpha.FieldAlpha in the path) AND (is StructBravo.FieldBravo in the path) (is StructBravo.FieldBravo in the path) AND (is StructAlpha.FieldAlpha in the path)

  • @rogpeppe: exactly

  • @dsnet: The downside of the filter chaining is that it visually looks like it would be a "comes before" operator where someone might expect it to be: (StructAlpha.FieldAlpha is in the path) AND it occurs before (StructBravo.FieldBravo in the path) I don't know if that's gonna be a significant gotcha or not

  • @rogpeppe: would FilterField take multiple option arguments? actually I've just realised they should be considered different. there is and should be a contains relationship there given that struct contains relationships aren't usually mutually recursive, that should be relatively intuitive

  • @dsnet: If the relationship should be a "contains", then it seems the API for options would also need to also be expanded such that you can unwrap an option. This way the implementation for cmpopts.FilterField can check whether the provided opt has previously been wrapped with a cmpopts.FilterField and enforce the "contains"-like semantic. (aside: isn't it interesting how many type-theory-ish problems exist for cmp?)

  • @rogpeppe: hmm, is that so? i need to think about this in the morning. why would FilterField act any differently from any other option in that respect?

  • @dsnet: You could argue that cmp.FilterPath (which is the underlying implementation of cmpopts.FilterField) should have gone with "contains" semantics instead of a logical AND semantic when wrapped, but that ship has unfortunately already sailed.

dsnet avatar Aug 02 '21 19:08 dsnet

Interesting. The symmetric bit is a trick for this case though - the comparison I need is not symmetric. (nil, "value") is allowed while ("value", nil) is an error.

For the moment I just pre-process the object and do if left.IPs == nil { left.IPs = right.IPs }. It's hacky but it got the job done. So consider this request low-prio, I guess. :)

thockin avatar Aug 03 '21 05:08 thockin

I recently had the same question and ended up essentially did the same as @thockin with github.com/imdario/mergo:

// Make a copy of want so we can merge ignored fields into it for diff without
// modifying the original object.
want = want.DeepCopyObject()
wantVal := reflect.ValueOf(want)
gotVal := reflect.New(reflect.Indirect(wantVal).Type())
got := gotVal.Interface()

... // populate got

// Merge all non-empty ignored fields from got to want.
if err := mergo.Merge(want, reflect.Indirect(gotVal).Interface()); err != nil {
    return fmt.Errorf("unable to merge ignored fields: %w", err)
}
if diff := cmp.Diff(want, got); diff != "" {
    return fmt.Errorf("unexpected diff (-want +got):\n%s", diff)
}

Specifically, want and got are some k8s objects that has fields of type time.Time, which cannot be handled by mergo, so the mergo transformer listed here is needed to make the above trick work (with light modifications to make it work on metav1.Time): https://pkg.go.dev/github.com/imdario/[email protected]#hdr-Transformers

chhsia0 avatar Oct 31 '22 18:10 chhsia0

I have the exact same use case: diffing K8s API objects and not wanting symmetry. Moreover, I'd like to specify this rule in a generic way, because the objects can have many nested subfield. It'd be tedious to specify an Option or Comparer for every nested struct/subfields.

i.e. I'd like a rule like: "when diffing O1 and O2, ignore all fields of O1 that are nil, no matter their type or path"

matteoolivi avatar Nov 11 '22 00:11 matteoolivi

The symmetric nature of cmp.Equal is a property I don't think we should give up, but I could imagine something like:

cmp.Diff(want, got, cmpopts.IgnoreUnpopulated(want))

where cmptsopts.IgnoreUnpopulated uses go reflection to dynamically build a cmp.FilterPath from want. It functionally provides a way to have a asymmetric comparison from a symmetric comparison.

The exact semantics of cmpopts.IgnoreUnpopulated would have to be thought-out, though.

dsnet avatar Nov 11 '22 00:11 dsnet

Another snippet to use in combination with the time transformer to correctly compare KRM object statuses (before https://github.com/imdario/mergo/pull/222 is merged):

// The following function allows deep-merging slices (e.g., []metav1.Condition).
return func(dst, src reflect.Value) error {
    if dst.CanSet() {
        for i := 0; i < src.Len() && i < dst.Len(); i++ {
            srcElem := src.Index(i)
            dstPtr := dst.Index(i).Addr() // an element of a slice is always addressable
            if !srcElem.CanInterface() || !dstPtr.CanInterface() {
                continue
            }
            if err := mergo.Merge(dstPtr.Interface(), srcElem.Interface(), mergo.WithTransformers(self)); err != nil {
                return err
            }
        }
        if dst.Len() < src.Len() {
            dst.Set(reflect.AppendSlice(dst, src.Slice(dst.Len(), src.Len())))
        }
    }
    return nil
}

chhsia0 avatar Nov 11 '22 14:11 chhsia0

The symmetric nature of cmp.Equal is a property I don't think we should give up

@dsnet I see that a non-symmetric equal would be confusing. OTOH, it might make sense because having Diff and Equal behave inconsistently might be equally surprising to users.

matteoolivi avatar Nov 13 '22 18:11 matteoolivi