fsharp Slow performance of Set.intersects when comparing two sets of different sizes

I have found that Set.intersect have greatly different performance depending on the order of the arguments passed.

Here is a reproduction sample

type SetIntersectBenchmark() =
    let longSet = [ for _ in 1 .. 1_000_000 -> Random.Shared.Next() ] |> Set.ofList
    let shortSet = Set.ofList [1;2;3;4;5;6;7;8;9;10]

    [<Benchmark>]
    member _.intersect_long_set_with_short_set() = Set.intersect longSet shortSet
    
    [<Benchmark(Baseline=true)>]
    member _.intersect_short_set_with_long_set() = Set.intersect shortSet longSet

Expected behavior

Set.intersect a b should produce the same result as Set.intersect b a. I would expect the performance to be equal.

Actual behavior

Performance differ by several magnitudes

Method	Job	Arguments	IterationCount	LaunchCount	WarmupCount	Mean	Error	StdDev	Median	Ratio	RatioSD	Allocated	Alloc Ratio
intersect_long_set_with_short_set	Job-BBQTFN	/p:BUILDING_USING_DOTNET=true	Default	Default	Default	113,804,493.5 ns	3,993,161.34 ns	11,584,885.09 ns	113,180,500.0 ns	210,173.97	29,869.00	-	0.00
intersect_short_set_with_long_set	Job-BBQTFN	/p:BUILDING_USING_DOTNET=true	Default	Default	Default	547.0 ns	18.29 ns	52.78 ns	525.3 ns	1.00	0.00	40 B	1.00
intersect_long_set_with_short_set	Job-XIHKCK	/p:BUILDING_USING_DOTNET=true	Default	Default	Default	107,596,475.3 ns	6,650,312.63 ns	19,504,215.74 ns	100,039,800.0 ns	199,488.09	41,392.25	-	0.00
intersect_short_set_with_long_set	Job-XIHKCK	/p:BUILDING_USING_DOTNET=true	Default	Default	Default	602.0 ns	21.11 ns	61.25 ns	577.7 ns	1.11	0.16	40 B	1.00
intersect_long_set_with_short_set	Job-MMDXPR	Default	2	2	1	104,123,106.2 ns	132,585,779.86 ns	20,517,796.20 ns	104,504,000.0 ns	199,902.65	30,687.64	-	0.00
intersect_short_set_with_long_set	Job-MMDXPR	Default	2	2	1	566.6 ns	268.16 ns	41.50 ns	569.6 ns	1.10	0.15	40 B	1.00

Proposed change

Due to how comparing two sets are implemented; loop through all elements in the first set, check if it is present in the second set, then add to intersection set, if the first set is huge, then we will have to loop through every item in the set, performing potentially hundred of thousands of comparisons. On the other hand, looping 10 times, checking for membership in a list of 1000000 elements is trivial.

I believe Big O of intersect to be O(n_a log n_b), which indicates that the order of operations will be for small a and large b around 135 operations, while large a and small b around 2.25 million operations.

To improve the performance, simply compare the sizes of the sets passed in and flip the comparison if it is more performant.

Dec 05 '25 11:12 akselkvitberg

Imagine a comparison implemented in a way where it checks only some fields, but not all. Should Set.intersect a b return objects from a,b or should this be by intention an undefined behavior?

When it comes to docs, this particular scenario is not mentioned. So in this case, it would be a breaking change that has to be documented and announced, but one that could be taken if the benefits outweigh the risks. To see what I mean, these are to docs of LINQ Intersect, it already gives a promise about the behavior (more of an illustration. We are not bound to LINQ in F# and the idiomatic coding styles have different preferences for structural vs custom comparison/equality in C# vs F#)

The intersection of two sets A and B is defined as the set that contains all the elements of A that also appear in B, but no other elements.
When the object returned by this method is enumerated, Intersect yields distinct elements occurring in both sequences in the order in which they appear in first.

This proposal should then also cascade to intersectManywith similar heuristic (sort by size, start by smallest ?)

Dec 05 '25 13:12 T-Gro

By current implementation, the item from a is returned and that should probably be kept to avoid breaking changes. But there shouldn't be any reason why we cannot return the item from set a, even though we do the loop through b, right? And order of returned items shouldn't matter?

Dec 09 '25 12:12 akselkvitberg

@akselkvitberg :

Yes, that makes sense. If we can maintain that property (and add a test which checks which exact object/reference is in the result), this can for sure be accepted 👍

Dec 11 '25 08:12 T-Gro