eclipse-collections
refactor bad smell ToArrayCallWithZeroLengthArrayArgument
Repairing Code Style Issues
ToArrayCallWithZeroLengthArrayArgument
The performance of the empty array version is the same, and sometimes even better, compared to the pre-sized version. Also, passing a pre-sized array is dangerous for a concurrent or synchronized collection, as a data race is possible between the size and toArray calls. This may result in extra nulls at the end of the array if the collection was concurrently shrunk during the operation.
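To make that race concrete, here is a minimal single-threaded simulation of the interleaving the inspection describes (the synchronized list and all names are illustrative, not from the inspection itself):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TrailingNullDemo
{
    public static void main(String[] args)
    {
        List<String> list = Collections.synchronizedList(
                new ArrayList<>(Arrays.asList("a", "b", "c")));
        // The race: the size is read first...
        String[] preSized = new String[list.size()];
        // ...then another thread shrinks the collection before toArray runs.
        list.remove("c");
        // Prints [a, b, null]: the slot for the removed element is left null.
        System.out.println(Arrays.toString(list.toArray(preSized)));
    }
}

With toArray(new String[0]) the result would instead be a trimmed [a, b].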
The linked performance data has some serious issues. The largest collection size tested is 1000. The data shows a clear trend where the sized variant gets faster and faster relative to the zero-length variant, but the tests stop short of where it gets interesting. At 1M, I would bet it's entirely reversed.
Also, the tests were run with a five-year-old JDK, with a GC that is no longer the default. The benefit of the zero-length method (eliding zeroing) is something that may be optimized in newer JDKs.
My recommendation: re-run the benchmarks with much larger sizes and newer JDKs before using them to change production code.
Sorry, I didn't realize this was limited to sizes 4 or less. That's likely much harder to optimize in a meaningful way. The most relevant lines from the benchmark are these:
ToArrayBench.sized 10 hashset avgt 30 43.808 ± 0.629 ns/op
ToArrayBench.zero 10 hashset avgt 30 44.192 ± 0.823 ns/op
So there is no measurable difference between the two (the averages overlap significantly within the error bounds); all this PR is doing is causing more garbage creation.
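For reference, a minimal JMH harness along these lines, assuming a reconstruction of the quoted ToArrayBench rather than its actual source, would look like:

import java.util.HashSet;
import java.util.Set;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ToArrayBench
{
    @Param({"10"})
    int size;

    Set<Integer> hashset;

    @Setup
    public void setup()
    {
        this.hashset = new HashSet<>();
        for (int i = 0; i < this.size; i++)
        {
            this.hashset.add(i);
        }
    }

    @Benchmark
    public Integer[] sized()
    {
        return this.hashset.toArray(new Integer[this.hashset.size()]);
    }

    @Benchmark
    public Integer[] zero()
    {
        return this.hashset.toArray(new Integer[0]);
    }
}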
While the performance argument may not be convincing, the point about toArray(emptyArray) being an atomic way of dealing with concurrent collections is compelling to me.
The statement "passing a pre-sized array is dangerous for a concurrent or synchronized collection" seems to contain a misunderstanding of the contract of toArray. Here is the Javadoc:
Returns an array containing all of the elements in this collection; the runtime type of the returned array is that of the specified array. If the collection fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this collection. If this collection fits in the specified array with room to spare (i.e., the array has more elements than this collection), the element in the array immediately following the end of the collection is set to null. (This is useful in determining the length of this collection only if the caller knows that this collection does not contain any null elements.)
Most of this is handled in java.util.AbstractCollection.toArray (which is the ultimate parent of, say, org.eclipse...ConcurrentHashMap.KeySet):
public <T> T[] toArray(T[] a) {
    // Estimate size of array; be prepared to see more or fewer elements
    int size = size();
    T[] r = a.length >= size ? a :
            (T[]) java.lang.reflect.Array
                    .newInstance(a.getClass().getComponentType(), size);
    ... // more resizing checks
Please also note the specifics of this code: the very first thing it does is call size() and allocate an array if you pass in a zero-length array. No synchronization.
I have this impression that on concurrent collections, the only method that can conveniently and atomically give you O(n) elements that were at least in the collection at some point is toArray(emptyArray). I'm looking around and not finding this documented anywhere. I don't see it on a Javadoc contract or in Effective Java or JCIP. But it's clear that java.util.concurrent.ConcurrentHashMap.CollectionView#toArray() was written this way. It returns a trimmed array with no nulls at the end.
On java.util.concurrent.ConcurrentHashMap, calling toArray(new X[0]) and toArray(new X[c.size()]) will return identical results, as the function, once again, allocates new X[c.size()] in the first case. So it's trimmed regardless (which seems unnecessary, given the contract of the function).
I am seeing different behaviors for different sizes.
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

public static void main(String[] args)
{
    testWithPreSize(0);
    testWithPreSize(2);
}

private static void testWithPreSize(int preSize)
{
    ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
    map.put("One", 1);
    Object[] preSizedArray = new Object[preSize];
    Object[] result = map.entrySet().toArray(preSizedArray);
    System.out.println("Result is " + Arrays.toString(result) + " for pre-sized to " + preSize);
}
Prints:
Result is [One=1] for pre-sized to 0
Result is [One=1, null] for pre-sized to 2
I'm sure you know this though, so I'm trying to guess where our disconnect is. I think part of it might be: why would the size ever be greater than the number of entries (2 here)?
In org.eclipse.collections.impl.map.immutable.ImmutableMapFactoryImpl#withAll, in the expression map.entrySet().toArray(new Map.Entry[map.entrySet().size()]), I'm supposing that map.entrySet().size() returns N and, because of a concurrent removal, toArray() returns an array with N-1 entries plus a null at the end.
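A single-threaded simulation of that hypothesis (the HashMap setup and names here are mine, not the Eclipse Collections source):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class WithAllRaceSketch
{
    public static void main(String[] args)
    {
        Map<String, Integer> map = new HashMap<>();
        map.put("One", 1);
        map.put("Two", 2);
        int n = map.entrySet().size(); // N is read first...
        map.remove("Two");             // ...then a "concurrent" removal happens
        Object[] entries = map.entrySet().toArray(new Map.Entry[n]);
        // Prints [One=1, null]: N-1 entries plus a trailing null;
        // calling getKey() on the null slot would throw NullPointerException.
        System.out.println(Arrays.toString(entries));
    }
}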
Sorry, I misread the code: it only trims if the array passed in was too small. So it honors the contract of the function as specified in Collection.
So the caller has two choices:
- toArray(new X[c.size()]) (or even size() * 1.1): put stuff into my array. Don't reallocate unless you need space. I'll handle nulls, if any, at the end, because I'm trying to get this done with as little GC as possible.
- toArray(new X[0]): allocate a new array based on the initial size of the collection. Trim it if entries were removed and couldn't be found to fill it.
The guarantees around the entries in the array are so weak (it may have values that are no longer in the map, and values that are missing) that I'm not sure "it can have nulls, but it'll allocate less" is a real issue.
I agree that this isn't a real issue in practice. ConcurrentHashMap allows neither null keys nor null values, so if keySet().toArray(preSizedArray) or values().toArray(preSizedArray) results contain nulls, it unambiguously means that the map shrunk. And map.entrySet().toArray(preSizedArray) is unambiguous since we can distinguish between a null slot and an Entry containing null. This even works for maps like HashMap, which allow nulls.
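For example, the earlier testWithPreSize adapted to a HashMap holding a single null-to-null entry (this setup is my illustration) prints the output below:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class NullEntryDemo
{
    public static void main(String[] args)
    {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, null); // HashMap permits a null key and null values
        for (int preSize : new int[] {0, 2})
        {
            Object[] result = map.entrySet().toArray(new Object[preSize]);
            System.out.println("Result is " + Arrays.toString(result) + " for pre-sized to " + preSize);
        }
    }
}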
Result is [null=null] for pre-sized to 0
Result is [null=null, null] for pre-sized to 2
If anyone ever ran into this issue in practice, it would manifest as a NullPointerException inside ImmutableMapFactory.withAll(). If we want to fix that, I think passing a zero-length array would work. If that's a performance problem, another option might be:
Map.Entry<K, V>[] entries = map.entrySet().toArray(new Map.Entry[map.entrySet().size()]);
if (entries.length == 0 || entries[0] == null)
{
return this.empty();
}
if (entries.length == 1 || entries[1] == null)
{
return this.of(entries[0].getKey(), entries[0].getValue());
}
if (entries.length == 2 || entries[2] == null)
{
return this.of(
entries[0].getKey(), entries[0].getValue(),
entries[1].getKey(), entries[1].getValue());
}
if (entries.length == 3 || entries[3] == null)
{
return this.of(
entries[0].getKey(), entries[0].getValue(),
entries[1].getKey(), entries[1].getValue(),
entries[2].getKey(), entries[2].getValue());
}
if (entries.length == 4 || entries[4] == null)
{
return this.of(
entries[0].getKey(), entries[0].getValue(),
entries[1].getKey(), entries[1].getValue(),
entries[2].getKey(), entries[2].getValue(),
entries[3].getKey(), entries[3].getValue());
}
return new ImmutableUnifiedMap<>(map);
That works. A bit less "or-y":
int realLength = entries.length;
while (realLength > 0 && entries[realLength - 1] == null)
{
    realLength--;
}
Then switch on realLength.
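A sketch of what that switch could look like (a hypothetical continuation of the snippet above, reusing the same of(...) overloads):

switch (realLength)
{
    case 0:
        return this.empty();
    case 1:
        return this.of(entries[0].getKey(), entries[0].getValue());
    case 2:
        return this.of(
                entries[0].getKey(), entries[0].getValue(),
                entries[1].getKey(), entries[1].getValue());
    case 3:
        return this.of(
                entries[0].getKey(), entries[0].getValue(),
                entries[1].getKey(), entries[1].getValue(),
                entries[2].getKey(), entries[2].getValue());
    case 4:
        return this.of(
                entries[0].getKey(), entries[0].getValue(),
                entries[1].getKey(), entries[1].getValue(),
                entries[2].getKey(), entries[2].getValue(),
                entries[3].getKey(), entries[3].getValue());
    default:
        return new ImmutableUnifiedMap<>(map);
}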