deep-eql icon indicating copy to clipboard operation
deep-eql copied to clipboard

Set and Map performance and correctness

Open BridgeAR opened this issue 7 years ago • 14 comments

I was curious how well chai would handle big sets and maps and checked a couple of things and also stumbled across this PR and I am sorry to say that these benchmarks are somewhat flawed.

The Node.js assert module was never used for comparison because the benchmark required a in-existing function that would be replaced with deep-eql in that case. The result is that the node and deep-eql results are actually both from the same library so they should be pretty much identical. However, that is not always the case and they partially differ by a big margin e.g.

buffer                               x 1,166,637 ops/sec ±6.09% (73 runs sampled)
buffer                        (node) x 611,182 ops/sec ±2.47% (65 runs sampled)
object literal                       x 160,121 ops/sec ±2.89% (81 runs sampled)
object literal                (node) x 392,734 ops/sec ±2.38% (85 runs sampled)

My main concern with these benchmarks is though, that only very simply objects are used. Having nice numbers for those is awesome but they are normally not an issue. If a complex object is used though, this can be really bad.

// Set with 10000 numbers
const actual = new Set(Array(1e4).fill(1).map((_, i) => len - i - 1))
const expected = new Set(Array(1e4).fill(1).map((_, i) => i))

This time compared to Node.js 8.4 (this time for real)

set                                  x 3.89 ops/sec ±1.26% (58 runs sampled)
set                           (node) x 1,387 ops/sec ±0.33% (138 runs sampled)

So Node.js is more then 350 times as fast in this case.

While checking Sets and Maps further I also noticed the following: When using Sets twice the work is done that is needed in general because the key and the value are both the value when returned in a forEach.

Worse though: comparing Sets or Maps with Objects does not even work properly!

const deepEql = require('./')
const a = new Set([{}, {a:5}])
const b = new Set([{a:5}, {}])
deepEql(a,b) // false

Sorting an array with objects is not possible and that is why this bug exists.

I suggest to have a look at a PR from me that would improve the situation and fix the bug as well https://github.com/nodejs/node/pull/14258.

As chai currently does not have a loose equal comparison the algorithm should be pretty straight forward (otherwise there is quite some extra handling for that).

BridgeAR avatar Aug 18 '17 22:08 BridgeAR

Thanks for the issue @BridgeAR. Definitely looks like you've uncovered some problems here. Chai is a 100% community-driven project; we'd love a PR if you can carve out the time!

meeber avatar Aug 21 '17 01:08 meeber

I can not promise that I have time for that anytime soon. I have a couple of things that I am working on at the moment.

BridgeAR avatar Aug 21 '17 02:08 BridgeAR

Writing tests for the project I'm working on, I found:

deepEql( new Set([new Set([1,2]), new Set([3,4])]) ,
         new Set([new Set([4,3]), new Set([2,1])]) ); // returns false

So I had to switch to Node's assert.deepEqual I also learned that Node was not able to compare sets properly until v8.x: https://github.com/nodejs/node/issues/13347

escKeyStroke avatar Oct 18 '17 03:10 escKeyStroke

@escKeyStroke you want that to return true? Sets are ordered, why would it return true?

keithamus avatar Oct 18 '17 09:10 keithamus

@keithamus

deepEql(new Set([1, 2]), new Set([2, 1])) // passes

This is highly inconsistent to the object version and about Sets being ordered - it is true that it is possible to iterate over the elements in insertion order but due to the nature of Map and Set I would argue it is way more logical to see those as being unordered (as Node.js currently does).

BridgeAR avatar Oct 18 '17 09:10 BridgeAR

This should be fixed then. I feel like we have had this discussion before but I'd challenge that sets should only equal with the same order. Can you give a use-case for why you'd want to compare two unordered sets?

keithamus avatar Oct 18 '17 10:10 keithamus

I personally do not have a use case at hand as I personally never needed to compare sets or maps. But in general I can imagine that people just want to collect data in a set / map and compare this with predefined values in a test and I would argue they are equal, no matter the order of the objects.

BridgeAR avatar Oct 18 '17 15:10 BridgeAR

Take my bicycle, disassemble it and put the pieces in a box. When I (or the app I'm writing, or whatever) see inside the box, I will recognize my bike no matter the order in which you put the pieces inside.

I am using Set in the sense of https://en.wikipedia.org/wiki/Set_(mathematics) "the order in which the elements of a set are listed is irrelevant", which I guess is the sense in which ECMAScript specified.

A discussion you might want to participate in: "the ECMAScript spec does not specify an OrderedSet but a similar data structure exists in multiple other languages ... the popular ImmutableJS provides an OrderedSet".
https://stackoverflow.com/questions/33089695/how-can-i-sort-an-es6-set

escKeyStroke avatar Oct 19 '17 01:10 escKeyStroke

Okay, I am enough convinced. However this may be derailing the original issue. Let's focus on fixing the original issue and hopefully fix the other ones alongside.

keithamus avatar Oct 19 '17 09:10 keithamus

One of the issues we may have here to do with performance - is that IE11 has no way to iterate on sets other than forEach. We could use feature detection here, but keep in mind we must maintain compatibility with IE11 for this.

keithamus avatar Oct 19 '17 10:10 keithamus

It's a judgement call but I agree that this module shouldn't consider insertion order when comparing deep equality of Sets and Maps.

meeber avatar Oct 20 '17 00:10 meeber

Just wanted to add that javascript objects never had iteration order defined by the spec which proved to cause a lot of bugs. This is why they specified iteration order for set and map - to prevent this conversation.

Although maps and sets have an iteration order, they are not ordered themselves and cannot be re-ordered without removing and re-inserting values. I feel it would be very unexpected to compare values by insertion order for these constructs.

olsonpm avatar Mar 01 '18 00:03 olsonpm

@keithamus about performance concerns: this is indeed bad. I would probably do a feature detection in this case. That should be easy. Another way to improve the performance for forEach for the case that they are unequal is to use try catch and throw undefined or null (note: it is important to throw that instead of an error because the stack trace is unnecessary overhead).

For the fast path you can pretty much use my implementation as it is in Node.js. Only the loose equal comparison should be removed in that case (and that makes the implementation much easier). With that implementation the complexity for object cases is lower. Primitive cases are actually O(n) in there.

BridgeAR avatar Mar 01 '18 00:03 BridgeAR

Am I missing something, or are Sets always unequal, regardless of order?

assert.equal(new Set(), new Set()) returns false for me.

nicbou avatar Jan 21 '22 14:01 nicbou