MoreLINQ
MoreLINQ copied to clipboard
IntersectBy Operator
Similar to DistinctBy
but with the functionality of Intersect
so Something like this:
var emails = GetAllEmails();
var uniqueEmails = emails.DistinctBy(x => x.Email).ToList();
var duplicates = emails.IntersectBy(uniqueEmails, x => x.EmailAddress);
Thanks
Originally reported on Google Code with ID 63
Reported by jalchr
on 2011-02-14 09:48:06
Or ExceptBy
operator
Reported by jalchr
on 2011-02-14 10:45:32
I'm not sure there is much advantage over the alternative:
var emails = GetAllEmails();
var uniqueEmails = emails.DistinctBy(x => x.Email).ToList();
var duplicates = emails.Where(e => uniqueEmails.Contains(e.EmailAddress));
Is the increase in readability significant?
If you put Contains() inside the predicate of Where(), it will get called once per iteration, which should put execution time around O(n^2). An IntersectBy operator can be optimized to run a lot faster against large collections.
Also, the example would need to be modified to behave as "IntersectBy" rather than just "Intersect".
var items= GetAllItems();
var uniqueValues= items.Select(x => x.Property).Distinct().ToList();
var duplicateItems = items.Where(e => !uniqueValues.Contains(e.Property));
Actually, that would be m×n, but that can be solved by using a hashset instead of a list.
You are correct on both counts.
But more to the point, we're now looking at a combination of about 5 function calls that require the client to worry about implementation details for optimizing performance, which could all be replaced by one sleek extension method. This is what LINQ is for.
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(
this IEnumerable<TSource> first,
IEnumerable<TSource> second,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> keyComparer = null) {
if (first == null) throw new ArgumentNullException("first");
if (second== null) throw new ArgumentNullException("second");
if (keySelector== null) throw new ArgumentNullException("keySelector");
keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
var keys = new HashSet<TKey>(second.Select(keySelector), keyComparer);
foreach (var item in first) {
var k = keySelector(item);
if (keys.Contains(k)) {
yield return item;
keys.Remove(k);
}
}
}
@JamesFaix that has the problem of iterating the source list twice
public static IEnumerable<TSource> IntersectByImpl<TSource, TKey>(
this IEnumerable<TSource> first,
IEnumerable<TSource> second,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> keyComparer = null) {
keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
var keys = new HashSet<TKey>(keyComparer);
foreach (var item in first) {
var k = keySelector(item);
if (keys.Add(k)) {
yield return item;
keys.Remove(k);
}
}
}
A separate overload where the sequences are not of the same type (eg, the second can be IEnumerable <TKey>
) could be useful as well.
It doesn't exactly iterate the source twice, it iterates each source list once, and then does some partial iterations on the HashSet when calling Contains. I believe internally there is some Contains-type overhead when calling HashSet.Add, since it must check for duplicate values some how.
I agree, the overload is very useful. But your implementation looks like the Distinct operator. You never use second
.
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(
this IEnumerable<TSource> first,
IEnumerable<TKey> second,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> keyComparer = null) {
keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
var keys = new HashSet<TKey>(second, keyComparer);
foreach (var item in first) {
var k = keySelector(item);
if (keys.Contains(k)) {
yield return item;
keys.Remove(k);
}
}
}
The PR seems to be dead since 2016. Is it blocked for any reason? The IntersectBy
overload is very useful.
Seconded - came here after wondering why we get ExceptBy but not IntersectBy.
I think IntersectBy
would be very useful as well. Especially considering that ExceptBy
is now missing its friend :'(