DeepCloner icon indicating copy to clipboard operation
DeepCloner copied to clipboard

Cloning Dictionaries and HashSets may lead to unpredictable behavior

Open pferencgit opened this issue 1 year ago • 7 comments

I saw previous discussions about weird, unpredictable behaviors regarding Dictionaries and HashSets cloned using DeepCloner. I looked into how DeepCloner processed the cloning of such objects and I saw that the inner structure of Dictionaries and HashSets are getting cloned the same way as all other objects.

The cause of the problem is that the inner structure of these objects implement an index that rely on buckets / slots created based on hash codes of the objects stored as entries. In case the stored objects are replaced with their clones, the index may become invalid.

For example, in case of dictionaries the cloned dictionary's internal field "entries" contains hashcodes of the original objects. Therefore the function "ContainsKey" will not work correctly in the cloned dictionary.

pferencgit avatar Oct 19 '24 07:10 pferencgit

I know this issue. When object in Dictionary/Has does not override GetHashCode or external comparer is not used in dictionary, default hash code of object is used. It just it memory reference. So, after cloning, there are another hashcode and dictionary became broken.

It can be solved by hardcoding cloning of Dictionary and HashSet due filling new objects, but there are some disadvantages of this:

  1. It is slower
  2. In most cases it useless, direct cloning are produce same results
  3. There are can be another implementations (like ConcurrentDictionary or some experimental OrderedDictionary which should be handled also).
  4. In some cases it will break some logic for testing. E.g. when you remove objects from Dictionary, it does not reduce his size, so, when you add lot of objects and remove lot of objects - this dictionary can be worser than clean dictionary. In some cases it can change performance metrics and cloning into new, clean, Dictionary can change this behavior.

But I understood, that better to make stable behavior, so, when I find good solution and time to implement this, I'll do that

force-net avatar Nov 22 '24 09:11 force-net

Thanks for the feedback. I can confirm, the problem affects those Dictionaries / HashSets that identify objects based on reference equality.

Since your Manual (50 ns) vs DeepCloner (570 ns) performance test showed a great potential for improvement, I did some tests with manually cloning Dictionaries and HashSets. As it turned out, the current DeepCloner method is faster than creating and populating new Dictionaries / HashSets, but it can only be considered reliable if the stored hashcodes are not based on the default Object.GetHashCode method used for reference equality.

The only optimization I can currently think of is to differentiate the cloning of Dictionaries / HashSets that identify objects based on reference equality and those that use some custom generated hash code.

pferencgit avatar Nov 25 '24 06:11 pferencgit

Is upstream open to accepting a PR for this? I have a solution + tests for both various types of dictionaries and sets here: https://github.com/lofcz/DeepCloner/commit/55b52599a00fde38a41e2ab667760e5ff6b52773

A minimal repro for this is:

public class KeyClass
{
    public string Value { get; set; }
}

[Test]
public void DictionaryBrokenAfterCloningTest()
{
    // Arrange
    var originalDict = new Dictionary<KeyClass, string>();
    var key = new KeyClass { Value = "TestKey" };
    originalDict[key] = "TestValue";

    // Act
    var clonedDict = originalDict.DeepClone();
    var clonedKey = clonedDict.Keys.First();

    // Assert
    Assert.Multiple(() =>
    {
        Assert.That(ReferenceEquals(key, clonedKey), Is.False);
        Assert.That(key.Value, Is.EqualTo(clonedKey.Value));

        Assert.That(originalDict.ContainsKey(key), Is.True);
        Assert.That(clonedDict.ContainsKey(clonedKey), Is.True); // important

        Assert.That(clonedKey.Value is "TestKey", Is.True);
    });
}

lofcz avatar Jan 07 '25 16:01 lofcz

I've added more tests (and implementation) for concurrent dictionaries, ordered dictionaries, read-only dictionaries, various sets, and specializations into FastCloner. If upstream is interested I can backport it! ⭐

lofcz avatar Jan 08 '25 05:01 lofcz

I've added more tests (and implementation) for concurrent dictionaries, ordered dictionaries, read-only dictionaries, various sets, and specializations into FastCloner. If upstream is interested I can backport it! ⭐

A backport would be very welcome if .NET FW compatibility is feasible. Alternatively, how far FastCloner is from being .NET FW compatible?

pferencgit avatar Jan 08 '25 06:01 pferencgit

~~I'm afraid it's pretty far. As I said, I'm open to backporting however I'd like my PR not to rot, so a word from the maintainer would be needed if they want this solved and are open to accepting a patch.~~

lofcz avatar Jan 08 '25 06:01 lofcz

FastCloner now supports anything never and including net46, with full netstandard2.0 support.

lofcz avatar Jun 07 '25 07:06 lofcz