eclipse-collections
eclipse-collections copied to clipboard
Infinite loop in probeThree
We saw a case where org.eclipse.collections.impl.map.mutable.primitive.LongObjectHashMap#probeThree caused an infinite loop.
I was unable to reproduce it directly but managed a reflective case how the situation would look.
Looking at the code it seems an infinite loop can be caused if;
nextIndex = this.mask(nextIndex + spreadTwo);
Results into nextIndex == nextIndex
Simple reflective test case would be;
@Test
public void testCaseUsingRefleciton()
throws NoSuchFieldException, IllegalAccessException, NoSuchMethodException, InvocationTargetException {
final LongObjectHashMap<Object> objects = new LongObjectHashMap<>();
final Field keys = LongObjectHashMap.class.getDeclaredField("keys");
keys.setAccessible(true);
keys.set(objects, new long[]{0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L,
1337L});
final Method probeThree = LongObjectHashMap.class.getDeclaredMethod("probeThree", long.class, int.class);
probeThree.setAccessible(true);
probeThree.invoke(objects, 1, 0);
}
This test case is not valid because the keys.length
is 17, which is not a power of 2.
nextIndex + spreadTwo
should be able to probe all of the array if spreadTwo
is odd, which is guaranteed with the | 1
in the construction of spreadTwo
.
Are you sure your (original) map was not in an illegal state, because of something else, e.g. incorrect multi-threading?
The upperlying library in this case is mapdb;
Stacktrace of the case was;
at org.eclipse.collections.impl.map.mutable.primitive.LongObjectHashMap.probeThree(LongObjectHashMap.java:3104)
at org.eclipse.collections.impl.map.mutable.primitive.LongObjectHashMap.probeTwo(LongObjectHashMap.java:3080)
at org.eclipse.collections.impl.map.mutable.primitive.LongObjectHashMap.probe(LongObjectHashMap.java:3057)
at org.eclipse.collections.impl.map.mutable.primitive.LongObjectHashMap.getIfAbsent(LongObjectHashMap.java:2362)
at org.eclipse.collections.impl.map.mutable.primitive.LongObjectHashMap.get(LongObjectHashMap.java:2340)
at org.mapdb.StoreWAL.longStackLoadChunk(StoreWAL.kt:741)
at org.mapdb.StoreWAL.longStackPut(StoreWAL.kt:723)
at org.mapdb.StoreDirectAbstract.releaseData(StoreDirectAbstract.kt:367)
at org.mapdb.StoreWAL.linkedRecordDelete(StoreWAL.kt:330)
at org.mapdb.StoreWAL.updateProtected(StoreWAL.kt:449)
at org.mapdb.StoreWAL.update(StoreWAL.kt:426)
at org.mapdb.HTreeMap.putprotected(HTreeMap.kt:415)
at org.mapdb.HTreeMap.put(HTreeMap.kt:324)
The library uses locking to access the map.
Wouldn't it be an option to replace the while true with a finite iteration?
probeThree
has to return a valid index. just stopping the iteration doesn't help with that. If we stop the iteration, the best we could do is to throw some sort of diagnostic exception, because right now, we don't understand what can cause this. It would be best to fix the underlying issue...
@koenr-bc may be if possible have a failing test case of the actual issue. If need be you can reverse engineer the reflective test. But at a surface level this smells like a multiple threads corrupting the map. If we get a failing test case then we can fix and release it as a part of our next release happening in a month or two. Thanks!
@koenr-bc what version of EC was in that stack trace? FYI, there is a potential way probeThree
can go into an infinite loop that was fixed/released over 2 years ago in version 10.0.0.
The version used is 10.4.0.
I think in this case even a diagnostic exception would be better than in infinite loop in my opinion.
Edit: added an issue in mapdb: https://github.com/jankotek/mapdb/issues/997#issue-989681614
I had a look at mapdb code. Two things to note:
- That code (StoreWAL.kt) no longer exists, but there is no new release after it was removed.
- If I'm looking at the correct source code (https://github.com/jankotek/mapdb/blob/release-3.0/src/main/java/org/mapdb/StoreWAL.kt), I have a hard time reasoning about the lock correctness, because the code uses many locks (in the superclass StoreDirectAbstract, Line 33 and 34) and the EC map seems to be accessed in various states of lock.
Given the above, I think it's fair to require a reproducible test case before proceeding.