zora icon indicating copy to clipboard operation
zora copied to clipboard

Structured comparison

Open mindplay-dk opened this issue 2 years ago • 6 comments

I've come up with something incredibly useful that I'd like to share. 🙂

I have to write tons of integration/functional tests for JSON APIs, and I found this to be extremely cumbersome - you get these big, structured JSON responses, and for the most part, you just want to check them for equality with your expected JSON data. But then there are all these little annoying exceptions - for example, some of the APIs I'm testing return date or time values, and in those cases, all I care is that the API returned a date or a number. There is no right or wrong value, just a type-check, sometimes a range-check, etc.

Simply erasing the differences before comparison is one way to go - but it's cumbersome (for one, because I'm using TypeScript, and so it's not easy to get the types to match) and misses the actual type/range-check, unless I hand-code those before erasing them.

It didn't spark joy, and so I came up with this, which works really well:

image

Now I can have a code layout for expected that matches the actual data structure I'm testing - just sprinkle with check, anywhere in the expected data structure, and those checks will override the default comparison for values in the actual data structure.

It's an extension to Assert.equal, along with a helper function check, which basically just creates a "marker object", so that the comparison function internally can distinguish checks on the expected side from other objects/values.

In the example here, I'm using the is NPM package - most of the functions in this library (and similar libraries) work out of the box, although those that require more than one argument need a wrapper closure, e.g. check(v => is.within(v, 0, 100)). (This might could look nicer with some sort of wrapper library, e.g. check.within(0, 100) - but it works just fine as it is.)

I'm posting the code, in case you find this interesting:

const Compare = Symbol("Compare");

type Comparison = (value: any) => boolean;

interface Check {
  [Compare]: Comparison;
}

function isCheck(value: any): value is Check {
  return (value !== null) && (typeof value === "object") && (Compare in value);
}

/**
 * Creates a {@see Check} for use with {@see equal} - works well with packages such as `is`.
 * 
 * @link https://www.npmjs.com/package/is
 */
export function check(comparison: Comparison): Check {
  return {
    [Compare]: comparison
  };
}

/**
 * Compares `actual` against `expected`.
 * 
 * You can put comparators (created by {@see check}) anywhere in your `expected` object graph
 * to override the way actual values are compared against expected values - for example:
 * 
 *   const same = equal(
 *     { foo: 123, bar: new Date() },
 *     { foo: 123, bar: check(v => v instanceof Date) }
 *   );
 */
export function compare(actual: any, expected: any): boolean {
  if (isCheck(expected)) {
    return expected[Compare](actual);
  }

  // Everything below here is just `fast-deep-equal` (unminified and reformatted)

  if (actual === expected) {
    return true;
  }

  if (actual && expected && typeof actual == "object" && typeof expected == "object") {
    if (actual.constructor !== expected.constructor) {
      return false;
    }

    if (Array.isArray(actual)) {
      let length = actual.length;
      if (length != expected.length) {
        return false;
      }
      for (let i = length; i-- !== 0; ) {
        if (!compare(actual[i], expected[i])) {
          return false;
        }
      }
      return true;
    }

    if (actual instanceof Map && expected instanceof Map) {
      if (actual.size !== expected.size) {
        return false;
      }
      for (let i of actual.entries()) {
        if (!expected.has(i[0])) {
          return false;
        }
      }
      for (let i of actual.entries()) {
        if (!compare(i[1], expected.get(i[0]))) {
          return false;
        }
      }
      return true;
    }

    if (actual instanceof Set && expected instanceof Set) {
      if (actual.size !== expected.size) {
        return false;
      }
      for (let i of actual.entries()) {
        if (!expected.has(i[0])) {
          return false;
        }
      }
      return true;
    }

    if (ArrayBuffer.isView(actual) && ArrayBuffer.isView(expected)) {
      let length = actual.byteLength;
      if (length != expected.byteLength) {
        return false;
      }
      for (let i = length; i-- !== 0; ) {
        if ((actual as any)[i] !== (expected as any)[i]) {
          return false;
        }
      }
      return true;
    }

    if (actual.constructor === RegExp) {
      return actual.source === expected.source && actual.flags === expected.flags;
    }
    if (actual.valueOf !== Object.prototype.valueOf) {
      return actual.valueOf() === expected.valueOf();
    }
    if (actual.toString !== Object.prototype.toString) {
      return actual.toString() === expected.toString();
    }

    let keys = Object.keys(actual);
    let length = keys.length;
    if (length !== Object.keys(expected).length) {
      return false;
    }

    for (let i = length; i-- !== 0; ) {
      if (!Object.prototype.hasOwnProperty.call(expected, keys[i])) {
        return false;
      }
    }

    for (let i = length; i-- !== 0; ) {
      const key = keys[i];
      if (!compare(actual[key], expected[key])) {
        return false;
      }
    }

    return true;
  }

  // true if both NaN, false otherwise
  return actual !== actual && expected !== expected;
}

Unfortunately, as you can see, compare is a verbatim copy/paste of fast-deep-equal, which is a bit unfortunate - the function is recursive, so there is no way to override it's own internal references to itself, hence no way to just expand this function with the 3 lines I added at the top. The feature itself is just a few lines of code and some type-declarations.

Note that, in my example above, I've replaced Assert.equal with my own assertions - not that interesting, but just to clarify for any readers:

const equal = (
  actual: any,
  expected: any,
  description = 'should be equivalent'
) => ({
  pass: compare(actual, expected),
  actual,
  expected,
  description,
  operator: "equal",
});

const notEqual = (
  actual: any,
  expected: any,
  description = 'should not be equivalent'
) => ({
  pass: !compare(actual, expected),
  actual,
  expected,
  description,
  operator: "notEqual",
});

Assert.equal =
  Assert.equals =
  Assert.eq =
  Assert.deepEqual =
  Assert.same = equal;

Assert.notEqual =
  Assert.notEquals =
  Assert.notEq =
  Assert.notDeepEqual =
  Assert.notSame = notEqual;

I would not proposition this as a replacement, but probably as a new addition, e.g. Assert.compare or something.

The major caveat/roadblock for adoption of this feature as something "official" is the reporter - if comparison fails for some other property, the diff unfortunately will highlight any differing actual/expected values as errors.

That's kind of where this idea is stuck at the moment.

I'm definitely going to use it myself, and for now just live with this issue, since this improves the speed of writing and the readability of my tests by leaps and bounds.

But the idea/implementation would need some refinement before this feature could be made generally available.

The diff output right now is based on JSON, which is problematic in itself - the reporter currently compares JSON serialized object graphs, not the native JS object graphs that fast-deep-equal actually uses to decide if they're equal. It's essentially a naive line diff with JSON.stringify made canonical.

Something like node-inspect-extracted would work better here - this formatter can be configured to produce canonical output as well, and since all the JSON diff function in jsdiff appears to do is a line-diff on (canonical) JSON output, I expect this would work just as well?

The point here is, the assertion API actually lets you return a different actual and expected from the ones that were actually compared by the assertion - and so, this would enable me to produce modified object graphs, in which values that were compared by checks could be replaced with e.g. identical Symbols, causing them to show up "anonymized" as something like [numeric value] or [Date instance] in the object graph.

(I suppose I could achieve something similar now by replacing checked values with just strings - but then we're back to the problem of strings being compared as equal to things that serialize to the same string value... e.g. wouldn't work for actually writing a unit-test for the comparison function itself. Might be an opportunity to kill two birds with one stone?)

mindplay-dk avatar Dec 15 '21 10:12 mindplay-dk

I've come up with something incredibly useful that I'd like to share. 🙂

I have to write tons of integration/functional tests for JSON APIs, and I found this to be extremely cumbersome - you get these big, structured JSON responses, and for the most part, you just want to check them for equality with your expected JSON data. But then there are all these little annoying exceptions - for example, some of the APIs I'm testing return date or time values, and in those cases, all I care is that the API returned a date or a number. There is no right or wrong value, just a type-check, sometimes a range-check, etc.

Simply erasing the differences before comparison is one way to go - but it's cumbersome (for one, because I'm using TypeScript, and so it's not easy to get the types to match) and misses the actual type/range-check, unless I hand-code those before erasing them.

It didn't spark joy, and so I came up with this, which works really well:

This is very interesting! It is a neat way to provide an extension point to the deep equal algorithm. That would require us to maintain the deep equal algo on our side rather than relying on a battle tested third party library, but why not after all. It can start with a simple fork/copy&paste at first.

The major caveat/roadblock for adoption of this feature as something "official" is the reporter - if comparison fails for some other property, the diff unfortunately will highlight any differing actual/expected values as errors.

That's kind of where this idea is stuck at the moment.

I'm definitely going to use it myself, and for now just live with this issue, since this improves the speed of writing and the readability of my tests by leaps and bounds.

But the idea/implementation would need some refinement before this feature could be made generally available.

The diff output right now is based on JSON, which is problematic in itself - the reporter currently compares JSON serialized object graphs, not the native JS object graphs that fast-deep-equal actually uses to decide if they're equal. It's essentially a naive line diff with JSON.stringify made canonical.

I assume you are talking about the diff reporter. I don't see it as a problem that is on the contrary an easy way to handle 95% of the cases.

Something like node-inspect-extracted would work better here - this formatter can be configured to produce canonical output as well, and since all the JSON diff function in jsdiff appears to do is a line-diff on (canonical) JSON output, I expect this would work just as well?

The point here is, the assertion API actually lets you return a different actual and expected from the ones that were actually compared by the assertion - and so, this would enable me to produce modified object graphs, in which values that were compared by checks could be replaced with e.g. identical Symbols, causing them to show up "anonymized" as something like [numeric value] or [Date instance] in the object graph.

I am not sure using another library would solve the problem, you'll get more options to control the serialization but either way it comes down to associate a token list to a property (which you can also do with JSON.stringify):

const symbol = Symbol('meta');

const check = {
    [symbol]: function () {
    
    },
    toJSON() {
        return '[META]'
    }
};

const obj = {
    key: 'value',
    keyWithCustomCheck: check
}

console.log(JSON.stringify(obj))
// > {"key":"value","keyWithCustomCheck":"[META]"}

We could replace JSON.stringify by node.inspect and then run the string diff algo on the resulting serialization, that probably make things nicer in some way. But I don't think that would solve the problem.

The difficulty is that the actual object would not get all the Symbols (or meta data) so you would need to use the expected value as a kind of a blueprint to visit the actual value nodes which makes things way more complicated, going from a simple well known string diff algorithm to a totally different tree diff algorithm...

I think using something like deep-object-diff in the first place an work on the output would be required.

That's always the trade-off: how much effort you want to make compared to the added value. As I said, I believe the simple string diff has a nice output in 95% of the cases. It remains a very interesting area of investigation though.

Going back to your example, I would simply do something like

integration test example
import {test} from 'zora';

const isDate = (value) => value?.constructor === Date;

const createReplacer = () => {
    let ctx;
    return (key, value) => {
        
        if (typeof value === 'object') {
            ctx = value;
        }
        
        // you need to go with ctx[key] rather than "value" because value is already serialized for things like Date
        if (isDate(ctx?.[key])) {
            return '[Date]';
        }
        
        if (key === '_id') {
            return '#id';
        }
        
        return value;
    };
};

const createCompareFn = (t) => (a, b) => {
    // comparing string
    t.eq(JSON.stringify(a, createReplacer(), 4), JSON.stringify(b, createReplacer(), 4));
    
    // or comparing objects but removing the moving parts if you prefer
    t.eq(JSON.parse(JSON.stringify(a, createReplacer(), 4)), JSON.parse(JSON.stringify(b, createReplacer(), 4)));
};

test(`some json`, (t) => {
    
    const compare = createCompareFn(t);
    
    const obj1 = {
        _id: 'asdfjalkdfj',
        someDate: new Date(Math.random() * 10e6),
        foo: {
            bar: 'blah',
            otherDate: new Date(Math.random()  * 10e6)
        }
    };
    
    const obj2 = {
        _id: 'adfsdfsdf',
        someDate: new Date(Math.random() * 10e6),
        foo: {
            bar: 'blas',
            otherDate: new Date(Math.random() * 10e6)
        }
    };
    
    compare(obj1, obj2)
});

which gives you in string mode

Screenshot 2021-12-16 at 13 49 11

or in object mode

Screenshot 2021-12-16 at 13 49 23

It does not have the assertive part ("it is a date and so"), but that is not something important here, is it ?

lorenzofox3 avatar Dec 16 '21 13:12 lorenzofox3

Yeah, the idea is definitely half-baked.

I ended up abandoning that idea, and what I do now instead, is I have a helper-function that takes my input data-structure, type-checks it, and discards anything that isn't a simple, comparable value.

Something like:

function checkData(data, assert) {
  return data.map(({ created, updated,, ...rest }) => {
    assert.ok(created instanceof Date);
    assert.ok(updated instanceof Date);
    return rest;
  });
}

test(`whatever`, assert => {
  // ...
  assert.equal(
    checkData(data),
    [
      // ... everything else ...
    ]
  );
});

I resisted this at first, but it actually grew on me.

I have a lot of different tests for different data structures of the same type - and this works nicely for that case: I can reuse the "check and filter" functions many times over, whereas, with the approach I was proposing here, I would have had to repeat the date-checks. Even in the proposed short, declarative form, this would still be a lot of tests I'd have to update after, say, changing a Date to timestamp number - now those checks are in one function.

I could still see a use-case for structural assertions for someone who has lots of different data structures they need to check.

Fortunately, on this project, it's a whole bunch of tests for different input/output values all with the same data-structures, so it's not really a pressing matter for me right now.


I will be thinking more about the reporting issue though. Currently, something like comparing a simple Map or Set will fail by default. I don't want toJSON in my domain model unless it belongs in the domain - having to write custom replacers doesn't spark joy for me either. Far from everything in JS is JSON. I mean, can you imagine if console.log couldn't even print a Map? 🙈

We could replace JSON.stringify by node.inspect and then run the string diff algo on the resulting serialization, that probably make things nicer in some way. But I don't think that would solve the problem.

Right, I'm kind of mixing up two different issues again, sorry.

At the moment, being able to see the difference between a Date and a string that looks like the same date.toString() is probably a bigger priority for me - I ran into that case several weeks back, and console.log ultimately was the only saving grace.

node.inspect uses an indented output format - shouldn't that work quite well with a line-diff? At least as well as JSON, I think? Which is all the JSON differ does anyway, as far as I can tell?

mindplay-dk avatar Dec 20 '21 10:12 mindplay-dk

Hi, I skimmed this discussion, but it seems like you're talking about asymmetric matchers. Jasmine and Jest uses them. I like the idea as well for integration tests. But I also use a variation for unit tests.

My hack is similar, but even more crude. I modified fast-deep-equals with

for (i = length; i-- !== 0;) {
    var key = keys[i];

    if (typeof b[key] === 'function') {
        const matcher = b[key](a[key]);
        b[key] = matcher.expected;
        a[key] = matcher.actual;
    }
}

Then I use something like

const length = expected => actual => ({ expected: `length(${expected})`, actual: `length(${actual?.length})` });

test(`deal`, ({ eq }) => eq(deal(createDeck()), { deck: length(43), hand: length(11) }));

And when I get an error

Expected 53 but got 54

I have done the same in mocha, where I use like this:

{
        ...
        _id: expectObjectId(),
        dti: expectNow(),
        duration: expectInteger(),
}

icetbr avatar Jan 24 '22 18:01 icetbr

Nice. The limitation here is that the expected value (b) must be a pojo with no method at all, not to confuse the deep-equal algo, is not it ? Does not seem a problem to me though

lorenzofox3 avatar Jan 25 '22 08:01 lorenzofox3

Honestly, I've completely abandoned this idea.

I tried to implement something, and it quickly grew in complexity - it felt like I was taking away responsibilities from the language, something that always tends to spiral out of control.

Just adding a helper function to validate some fields and return the rest (above) is super simple and explicit, and has worked out extremely well with 100+ integration tests using these patterns already.

I'm happy and productive without this feature. 😄

I will let you decide, but I'd be happy to just close the issue.

(it might be interesting to put something like a manual or developer guide on the docket instead? that's a separate issue, but patterns like these are empowering - developers retain freedom from any constraints or limitations that more framework would impose, and I think that's in tune with zora's testing philosophy? I would love to see this take off at scale.)

mindplay-dk avatar Jan 26 '22 08:01 mindplay-dk

Thank you for that feedback. I use your approach in my production projects, so my expectDb handles updatedAt, createdAt, _id fields for instance.

I started playing around with async matchers when testing Jest. It takes a different route. Instead of this generic expectDb for all savable structures, each structure will have its own custom set of matchers. Which can be part of a base object, of course.

I think if you can use Jest's assert lib it would be enough. I haven't tried yet with Zora. I use Jest's expect over Chai in some pet Mocha projects. It has a bunch of nice to haves but not essential goodies, like a better diff for large objects for instance.

All these fringe beneficial features makes for a better DX, at the cost of bloating the lib. Jest has ~1500 open issues.

icetbr avatar Jan 26 '22 12:01 icetbr