'rehash()' should use '==' not 'equals()'
Performance
UnifiedSet rehash operation will reuse UnifiedSet.add. Then, the T.equals will be triggered during hash collision.
https://github.com/eclipse-collections/eclipse-collections/blob/6256e7c298a2bb7a839c2b5941416a3f8795cf5a/eclipse-collections/src/main/java/org/eclipse/collections/impl/set/mutable/UnifiedSet.java#L337
I'm working with 100 million of objects. In my real experience, the equals() call causes the cache miss and cause 10%~30% performance degeneration.
Reason
During rehash, we already guarantee the object is distinct. The equals might trigger memory access on member of objects.
Change
My suggestion is we need to have new method nonNullTableObjectSame during rehash to replace with
https://github.com/eclipse-collections/eclipse-collections/blob/6256e7c298a2bb7a839c2b5941416a3f8795cf5a/eclipse-collections/src/main/java/org/eclipse/collections/impl/set/mutable/UnifiedSet.java#L2342
private boolean nonNullTableObjectSame(Object cur, T key)
{
return cur == key || (cur == NULL_KEY ? key == null : (cur == key);
}
And, change add usage in rehash.
@Override
protected void rehash(int newCapacity)
{
int oldLength = this.table.length;
Object[] old = this.table;
this.allocate(newCapacity);
this.occupied = 0;
for (int i = 0; i < oldLength; i++)
{
Object oldKey = old[i];
if (oldKey instanceof ChainedBucket)
{
ChainedBucket bucket = (ChainedBucket) oldKey;
do
{
if (bucket.zero != null)
{
this.rehashAdd(this.nonSentinel(bucket.zero));
}
if (bucket.one == null)
{
break;
}
this.rehashAdd(this.nonSentinel(bucket.one));
if (bucket.two == null)
{
break;
}
this.rehashAdd(this.nonSentinel(bucket.two));
if (bucket.three != null)
{
if (bucket.three instanceof ChainedBucket)
{
bucket = (ChainedBucket) bucket.three;
continue;
}
this.rehashAdd(this.nonSentinel(bucket.three));
}
break;
}
while (true);
}
else if (oldKey != null)
{
this.rehashAdd(this.nonSentinel(oldKey));
}
}
}
I guess the interesting observation here is that during rehash, we're already guaranteed that no two objects can be equal, so there should be no equal checking at all. rehashAdd should simply chain whatever comes in to that bucket.
are we agreeing on the proposed solution, so that I can take it
@gargsuraj12 feel free to submit a PR. #1734 was opened against this issue but doesn't actually include any changes.
Hi, is this issue still open? If unresolved, I'd like to try fixing it.
@lzcyx got for it, it's still unresolved!