eclipse-collections icon indicating copy to clipboard operation
eclipse-collections copied to clipboard

Proposal to add HashedArrayTrees as an alternative mutable list

Open slaymaker1907 opened this issue 6 years ago • 2 comments

The data structure I am proposing to add is described here: https://en.wikipedia.org/wiki/Hashed_array_tree.

The reason why I think this is probably worth for inclusion is for a couple of reasons:

  • It would allow for very large dynamic arrays since it is a two dimensional structure. This allows for up to 2^62 size arrays on 64 bit systems (for Intel, this is capped out at 2^48 so it would allow for using the entire address space).
  • The memory overhead is optimal and is only O(sqrt(n)) versus O(n) for either a linked list or array list.
  • With this structure, you can deallocate memory as you iterate through it similar to a linked list which can greatly reduce memory usage for code which uses iterators heavily.
  • In the common case where you build a list through add(), the hashed array tree does far fewer deallocations and copies than the array list.
  • The hashed array tree has much better memory locality than a linked list.

I'm willing to implement this and do a pull request for it, but since this would be a big addition, I wanted to get feedback from the Eclipse Collections team and the community first.

slaymaker1907 avatar Jan 19 '19 22:01 slaymaker1907

Do you want this for object references or primitives? Some initial reactions:

  • I think by "memory overhead", you meant "average wasted space".
  • The name is confusing. There is no hashing here. It's just an array of arrays and from the user's perspective, it's nothing more than a LargeList.
  • The random access performance is worse than ArrayList.
  • Java doesn't have a "consuming" or "discarding" iterator, so getting any perceived gains from freeing-while-iterating, especially with the GC only doing frees occasionally, is dubious.
  • It's very rare to need such a huge list. What's the usecase?
    • If you want this for small lists, is it because of (a) issues with wasted space (have you tried trim()?), and/or (b) because you think the add performance will be that much better?

mohrezaei avatar Jan 19 '19 22:01 mohrezaei

Regarding deallocating memory while iterating, I assume that refers to calling remove() once for each call to next().

I spent a good deal of time on a similar idea, implementing a Scapegoat Tree as a memory-efficient alternative to TreeSet and TreeSortedSet.

I think these alternatives are certainly valid, and potentially worthwhile. It relies on having a use-case for repeated mutation and also memory pressure. That's because for mutable lists (ArrayList, FastList) and mutable sorted sets (TreeSet, TreeSortedSet) you'll get far better memory savings by converting it to an immutable implementation than by using an alternative mutable implementation.

motlin avatar Jan 20 '19 04:01 motlin