eclipse-collections icon indicating copy to clipboard operation
eclipse-collections copied to clipboard

'rehash()' should use '==' not 'equals()'

Open morris821028 opened this issue 1 year ago • 4 comments

Performance

UnifiedSet rehash operation will reuse UnifiedSet.add. Then, the T.equals will be triggered during hash collision.

https://github.com/eclipse-collections/eclipse-collections/blob/6256e7c298a2bb7a839c2b5941416a3f8795cf5a/eclipse-collections/src/main/java/org/eclipse/collections/impl/set/mutable/UnifiedSet.java#L337

I'm working with 100 million of objects. In my real experience, the equals() call causes the cache miss and cause 10%~30% performance degeneration.

Reason

During rehash, we already guarantee the object is distinct. The equals might trigger memory access on member of objects.

Change

My suggestion is we need to have new method nonNullTableObjectSame during rehash to replace with

https://github.com/eclipse-collections/eclipse-collections/blob/6256e7c298a2bb7a839c2b5941416a3f8795cf5a/eclipse-collections/src/main/java/org/eclipse/collections/impl/set/mutable/UnifiedSet.java#L2342

private boolean nonNullTableObjectSame(Object cur, T key)
    {
        return cur == key || (cur == NULL_KEY ? key == null : (cur == key);
    }

And, change add usage in rehash.

    @Override
    protected void rehash(int newCapacity)
    {
        int oldLength = this.table.length;
        Object[] old = this.table;
        this.allocate(newCapacity);
        this.occupied = 0;

        for (int i = 0; i < oldLength; i++)
        {
            Object oldKey = old[i];
            if (oldKey instanceof ChainedBucket)
            {
                ChainedBucket bucket = (ChainedBucket) oldKey;
                do
                {
                    if (bucket.zero != null)
                    {
                        this.rehashAdd(this.nonSentinel(bucket.zero));
                    }
                    if (bucket.one == null)
                    {
                        break;
                    }
                    this.rehashAdd(this.nonSentinel(bucket.one));
                    if (bucket.two == null)
                    {
                        break;
                    }
                    this.rehashAdd(this.nonSentinel(bucket.two));
                    if (bucket.three != null)
                    {
                        if (bucket.three instanceof ChainedBucket)
                        {
                            bucket = (ChainedBucket) bucket.three;
                            continue;
                        }
                        this.rehashAdd(this.nonSentinel(bucket.three));
                    }
                    break;
                }
                while (true);
            }
            else if (oldKey != null)
            {
                this.rehashAdd(this.nonSentinel(oldKey));
            }
        }
    }

morris821028 avatar Dec 08 '24 03:12 morris821028

I guess the interesting observation here is that during rehash, we're already guaranteed that no two objects can be equal, so there should be no equal checking at all. rehashAdd should simply chain whatever comes in to that bucket.

mohrezaei avatar Dec 08 '24 05:12 mohrezaei

are we agreeing on the proposed solution, so that I can take it

gargsuraj12 avatar Feb 28 '25 09:02 gargsuraj12

@gargsuraj12 feel free to submit a PR. #1734 was opened against this issue but doesn't actually include any changes.

motlin avatar Mar 02 '25 14:03 motlin

Hi, is this issue still open? If unresolved, I'd like to try fixing it.

lzcyx avatar Jun 18 '25 06:06 lzcyx

@lzcyx got for it, it's still unresolved!

motlin avatar Jul 01 '25 23:07 motlin