eclipse-collections icon indicating copy to clipboard operation
eclipse-collections copied to clipboard

NegativeArraySizeException encountered

Open shollander opened this issue 3 years ago • 9 comments

When loading a very large number of values into a MutableMap, a java.lang.NegativeArraySizeException is encountered. A cursory examination of the code appears to point to org.eclipse.collections.impl.map.mutable.UnifiedMap:300 where there is no guard in place to ensure that the capacity variable doesn't overflow.

https://github.com/eclipse/eclipse-collections/blob/43b9c67c2ac347e12fcfe3078507370f32080d93/eclipse-collections/src/main/java/org/eclipse/collections/impl/map/mutable/UnifiedMap.java#L298-L304

A similar exception is encountered when trying to initialize a Map with a large initialCapacity (e.g. Maps.mutable.ofInitialCapacity(500_000_000)).

shollander avatar Apr 28 '21 16:04 shollander

That's more or less by design. Neither case can complete successfully, which leaves only one option: throw an exception. So we could wax poetic about what the exception should be and end up with a nicely multi-colored bike shed, but it won't help you with your real world problem that requires an ultra-large map.

mohrezaei avatar Apr 28 '21 17:04 mohrezaei

Is the maximum size of MutableMap documented anywhere? JDK Maps have a maximum capacity of 1 << 30 (1,073,741,824).

Also, I think it would be helpful to have a useful error message so that users don't have to dig through code to figure out if there is a bug somewhere or if it's behaving as expected. Generally, a NegativeArraySizeException would indicate a bug.

shollander avatar Apr 28 '21 17:04 shollander

You sure about the JDK size? The following just worked for me in 1.8 and 11:

        HashMap<Object, Object> objectObjectHashMap = new HashMap<>((1 << 30) + 10000);
        objectObjectHashMap.put(1, 0);

        System.out.println(objectObjectHashMap.size());

In the old days, the largest array that could be constructed was somewhat JDK dependent. With the current crop of 64bit JDKs, the int-referenced arrays seem to be constructable up to some unspecified (?) limit between 1 << 30 and 1 << 31. So that's where the problem is: specifying an actual max size is hard.

mohrezaei avatar Apr 28 '21 17:04 mohrezaei

Oh, and in case it wasn't clear: 32 bit JDKs will give up long before (1 << 30.5).

mohrezaei avatar Apr 28 '21 17:04 mohrezaei

The JDK HashMap is nice, it will ignore the capacity you specify if it's too large. See HashMap.java:452:

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

The limit for arrays is pretty close to Integer.MAX_VALUE (2^31 -1). I can allocate new int[Integer.MAX_VALUE -2]. The JDK classes generally use Integer.MAX_VALUE -8 to be on the safe side. See ArrayList.java for example.

shollander avatar Apr 28 '21 17:04 shollander

I disagree on "nice" part. A map that is forced beyond a reasonable load factor will behave very badly (although with the tree based stuff that was introduced into HashMap, it'll go from an O(1) structure to O(log(n))). UnifiedMap would devolve into O(n) and that's simply not acceptable. And it's not clear to me if the JDK's implementation above 1 << 30 is even correct. A quick look at the code leaves me to believe it may have some issues: resize() never throws, threshold is set to Integer.MAX_VALUE if capacity is reached in resize() and then we have this bit of code:

        if (++size > threshold)
            resize();

What happens if size is Integer.MAX_VALUE?

If your real world problem is that you need a map with billions of entries, neither UnifiedMap, nor HashMap is a place to look for a solution.

mohrezaei avatar Apr 28 '21 18:04 mohrezaei

True, perhaps the JDK map should fail if the requested maxCapacity is too large.

You are correct that HashMap is broken beyond 1 << 30 entries. I tested this out to confirm. If more than 1 << 30 entries are put in the map, it will not throw any error and act as if it has been inserted successfully. I am not sure exactly what it does with the value, but it will insert it, and will return a value greater than 1 << 30 for map.size().

shollander avatar Apr 28 '21 21:04 shollander

Just to be clear, I don't believe the JDK HashMap is buggy until the size hits Integer.MAX_VALUE + 1. It can store the entries beyond 1 << 30, even beyond Integer.MAX_VALUE with the linked or tree structure it has. It will slow down, potentially dramatically if the key doesn't implement Comparable, beyond 1 << 30 entries.

I was thinking about your actual issue (ultra large map), and it needs a bit of out of the box thinking. First thing to deal with would be a new hash method, e.g. long longHash() for the key, or maybe a function to generate that given a key. At these scales, garbage starts to become a real problem, so most people start looking at off-heap or at least structures that are more compact, e.g. a searchable set. With so much data, access patterns also become an issue, namely a singular lookup by key may be too inefficient depending on usage patterns. At any rate, the point I'm trying to make is that if you have this problem, you'll need something more sophisticated, potentially custom built.

mohrezaei avatar Apr 28 '21 21:04 mohrezaei

Thank you for your suggestions. I think that you are correct. I think HashMap will work for my use case. I don't really need that much more than 1 << 30 entries at this point. I am doing some benchmarking, but so far looks good. I had been hoping to use MutableMap in order to reduce the memory footprint, but apparently that is not going to work.

shollander avatar Apr 29 '21 14:04 shollander