HashSet contains method return false after deep clone
var clonedHashSet = new HashSet<object>{new object()}.DeepClone();
var clonedObject = clonedHashSet.First();
clonedHashSet.Contains(clonedObject); // false
clonedHashSet.Clear();
clonedHashSet.Add(clonedObject);
clonedHashSet.Contains(clonedObject); // true
This is complex issue, related to missing GetHashCode implementation for object. Without implementation — object reference is used. So, after cloning clonedObject has another reference and GetHashCode for it returns another value.
In real situation correct implementation of GetHashCode is very recommended. I'll think about possible variants of mitigation of this problem.
Ran into this issue after switching to DeepCloner from BinaryFormatter. BinaryFormatter had the same fundamental issue, but I was able to work around it by using the [OnDeserialized] hook like this:
public class MyObject<T> : IList<T>
{
private HashSet<T> _uniqueItemTracker = new HashSet<T>();
//Omitting IList<T> implementation
[OnDeserialized]
private void OnDeserialized(StreamingContext context)
{
//After cloning, OnDeserialized invoked and the _uniqueItemTracker is re-populated by creating a new HashSet off the current contents
_uniqueItemTracker = new HashSet<T>(this);
}
}
This doesn't solve the fundamental problem, but a post-DeepClone hook would at least allow us to write code to fix it ourselves.
Currently, I'm trying to find best solution to mitigate this issue. E.g. this code will not work by design (we clone set entirely but check with non-cloned object).
var key = new object();
var set = new HashSet<object> {key};
var cloned = set.DeepClone();
Assert.That(cloned.Contains(key)); // failed
This case can occur in any complex cloning and cannot be solved without custom handlers.
For you case, you can wrap set with new set, but EqualityComparer can be passed to HashSet (so, we need to pass it to new Set). Also, there are a lot of Dictionaries and Sets implementations with custom behavior, so, it hard to make correct general cloning.
Is there any update on this? There's some weird things going on with strings as well.
Note that this only fails on .net 6.0 and passes on netcoreapp3.1. It could probably have something to do with an optimization in HashSet<T> for when T == typeof(string).
var set = new HashSet<string> { "value" };
set.Contains("value").Should().BeTrue(); // pass
var cloned = set.DeepClone();
cloned.Contains("value").Should().BeTrue(); // pass
var copyOfSet = new HashSet<string>(set, set.Comparer);
copyOfSet.Contains("value").Should().BeTrue(); // pass
var copyOfCloned = new HashSet<string>(cloned, cloned.Comparer);
(copyOfCloned.ToArray()[0] == "value").Should().BeTrue(); // pass
copyOfCloned.Contains("value").Should().BeTrue(); // fail
If I hack together a separate processing method for HashSetnew HashSet<string>(objFrom, objFrom.Comparer), this test works. so it's probably only related to strings...
It's really strange, but interesting issue. When HashSet created it checks:
- is comparers are same (for strings it should be true, but there are lot of internal magic to create Comparer.Default) - in this case it just copies internal HashSet objects
- if it differs code like to
new HashSet(comparer).UnionWith(source.AsEnumerable())
In both cases result should be identical. So, I'll try to reproduce problem and found a source of this issue.
I found an issue and fix it (will do some more tests and will publish new version in some days).
Problem only with strings (but another code can also cause similar problems).
HashSetcomparer == EqualityComparer<string>.Default - and it fails for cloned comparer. So, HashSet began to use GenericEqualityComparer instead of default and it produces different hash code
So, it real .NET magic and it needs to be analyzed.
Yeah I saw that too, not sure what's exactly going on there. Looking forward to a new release. I've tested a workaround where in the deep cloner code I detect instances of HashSetnew HashSet<string>(source, source.Comparer) to create the clone, this seems to work pretty good.