frontend-interview-questions icon indicating copy to clipboard operation
frontend-interview-questions copied to clipboard

Improved intersection

Open psy-man opened this issue 7 years ago • 5 comments

Hi @bcherny.

There is a little issue in intersection solution: if the right array has any duplicates, they will appear in the result. intersection([1, 5, 4, 2], [8, 91, 4, 1, 3, 8, 91]); // [ 4, 1, 8, 91 ]

I rewrote this function to es6 using Set.

psy-man avatar Jan 16 '18 18:01 psy-man

@psy-man thanks for the contribution! Please add a test that would not have passed before, but does pass now.

bcherny avatar Jan 18 '18 19:01 bcherny

Thanks for updating the test. The bug you found is a real bug, but the solution could be improved.

If we're taking the intersection of 2 arrays of lengths M and N, the current solution runs in O(M+N) time. Your solution is substantially slower, and runs in O(M+M*N²) time. The reason is that Set#has and Set#set run in linear time (see https://www.ecma-international.org/ecma-262/6.0/#sec-set.prototype.has).

A more efficient solution might avoid Sets for their poor performance. If you want to update, I'll be happy to merge.

bcherny avatar Jan 21 '18 00:01 bcherny

Hi @bcherny

I've created a few possible solutions and tested with 2 arrays: 4646 and 1103 lengths. Could you check please Benchmarks

intersectionNew - fix for current solution intersectionNewWithSet - solution with Set intersectionNewSimple - solution without reduce and concat

For me, the solution with Sets was the most faster. We also can improve performance via comparing the lengths of arrays and start with the smaller one.

I'll add a commit after your decision.

psy-man avatar Jan 21 '18 11:01 psy-man

Is this a case of the average case actually nullifying the worse case big O?

I did read the link you provided but if it truly was the case that Set was outperformed on its most standard ops by {} it's hard to understand why ECMA would have introduced Map and Set at all.

From ECMA also:

Set objects must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection. The data structures used in this Set objects specification is only intended to describe the required observable semantics of Set objects. It is not intended to be a viable implementation model.

Here's a benchmark from three years ago: https://github.com/anvaka/set-vs-object

It seems that most browsers have optimised them beyond the performance of old school key in Object checks.

I ran across across a few different browsers, mostly seems like with the length optimisation (ie choose the smaller array to iterate through), Set performs best, with the IntersectionNewSimple second best (but not by much).

DarkPurple141 avatar Jun 05 '18 05:06 DarkPurple141

You don't need second table/set

function intersection(left, right) {
	let seen = left.reduce((seen, item) => {
		seen[item] = true
		return seen
	}, {})

	return right.filter(item => {
		const include = item in seen
		delete seen[item]
		return include
	})
}

givehug avatar Jul 26 '19 10:07 givehug