Ensure immutable collections internal data structure are compacted/trimmed to size
Something that came up in https://github.com/eclipse/eclipse-collections/issues/1393 is that when we create new immutable collections, some of them rely internally on the mutable implementation and thus can contain datastructure that are are actually bigger than the actual size of the data being stored.
Since they are immutable, we know we could trim them and it will be the most efficient in terms of memory usage.
Hash tables can be compacted if the backing array is bigger than necessary. I can think of two cases where this happens.
- We remove elements from a mutable hash table
- We allocate a large initial size and then only fill the hash table a little
I don't think either case happens when constructing an immutable hash table. I suppose the second case is possible if constructing a hash table from a collection containing duplicates, but initializing the table to the collection's full size.
@motlin indeed, for UnifiedMap and UnifiedSet, when they are created, the constructor is taking into account the size of the original collection, so the size should be ok.
I can see methods where it's not the case, e.g., ImmutableUnifiedSet.newSetWith relies on UnifiedSet.newSetWith which adds the elements without ensuring only the minimum extra allocated array are available.
There is also the case you mentioned above.
This seems not very important but in our tests, when using UnifiedMap and UnifiedSet, calling trimToSize on them could get us a decrease of 5% memory usage (I'm talking about processes that were using 30-50Go of memory).
Why not just do this in the immutable collections constructors? or is there good reasons not to? I guess the real issue is that we allocate lots of intermediary things (e.g., the factory for immutable sets will call toArray!) when creating an immutable collection and calling trimToSize will add one more reallocation.
Anyway, I don't have a very strong need for this feature as I don't use those immutable collections, I just transform my map to unmodifiable after populating and trimming them manually. I created this issue following the discussion in the other ticket :)
Maybe it's not worth it to do this change, feel free to close this :)
I see what you mean. ImmutableUnifiedSet.newSetWith(T... elements) delegates to UnifiedSet.newSetWith(T... elements) which pre-sizes the table to elements.length without knowing whether there are duplicates.
The implementation of ImmutableUnifiedSet.newSetWith(T...) or the ImmutableUnifiedSet constructor could call UnifiedSet.trimToSize() on the delegate. This call will usually do nothing, unless there were lots of duplicates in the input. I think it's harmless to add, and could save memory, so sure why not?
In cases like this, I wonder whether it would be better not to pre-size the hash table. And it makes me want to check that we never do anything crazy like call Bag.toArray().