vavr icon indicating copy to clipboard operation
vavr copied to clipboard

Explore reimplementing HashMap as CHAMP (Compressed Hash-Array Mapped Prefix-tree)

Open pivovarit opened this issue 6 years ago • 20 comments

Related Epic: https://github.com/vavr-io/vavr/issues/2308

CompressedHAMP is designed to provide a smaller footprint and better cache locality than standard HAMT which features one array encoding both sub-node references and data values.

To improve things we can split the array into two dedicated ones: one for values, and one for sub-node references and use two 32-bit bitmaps.

Another idea to explore is if it would make sense to give up physical tree structure in favor of logical with one array backing the whole map. Would allow decreasing pointer chasing but would penalize modifications.

We should follow up and investigate if it makes sense to do the same move as Scala did in 2.13.

https://github.com/scala/collection-strawman/issues/192 https://github.com/scala/scala/commit/d5ae93e1b35fac4002cfbffff9ec741356549e08

References:

  • https://blog.acolyer.org/2015/11/27/hamt/
  • https://michael.steindorfer.name/publications/oopsla15.pdf

pivovarit avatar Jun 13 '19 12:06 pivovarit

We should consider to port it from usethesource/capsule. Here the author gave an overview over data structures.

danieldietrich avatar Jul 23 '19 21:07 danieldietrich

Great, I was aware of the library - wasn't aware of that thread. Definitely helpful!

btw, do you think it would make sense to do a push before 1.0.0? 🤔

pivovarit avatar Jul 24 '19 05:07 pivovarit

@pivovarit I plan to include it in 2.0.0, which is already in the make but in a very early alpha state. Collections aren't part of it, currently.

I also plan to get my hands on the essential data structures BMT, Red/Black Tree and CHAMP. I'm not sure if it makes sense that you invest time.

danieldietrich avatar Aug 02 '19 12:08 danieldietrich

I made an experimental (and very dirty) port of CHAMP collections in a fork of vavr: https://github.com/wrandelshofer/vavr#readme

I decided to implement linked collections with sequence numbers. This is much simpler to do, than keeping a linked list or a vector in sync with the CHAMP trie. However performance is a mixed bag. Most operations are quite fast, except accesses to the first/last entry of the collection - which are linear in time. Also, occasionally, we need to renumber the sequence numbers, which may cause unexpected slow-downs. Memory efficiency should be quite good though.

So, maybe my fork is just a demonstration of how not to do it. 😂

wrandelshofer avatar Jun 19 '22 17:06 wrandelshofer

I made some progress now.

I have now implemented four different CHAMP-based collections:

  • vavr.ChampMap
  • vavr.SequencedChampMap
  • vavr.ChampSet
  • vavr.SequencedChampSet

Here is the current README file. It shows a performance comparison with vavr.HashMap, vavr.LinkedHashMap and scala.HashMap, scala.VectorMap, scala.TreeSeqMap.

https://github.com/wrandelshofer/vavr/blob/6569c67e3d5c3b4dae681fa762c714c8d074d7ab/README.md

The performance of the sequenced collections (the ones that preserve the insertion order) is now O(1) for all operations. There are no performance cliffs anymore.

The code is in a very rough draft state. Let me know, what you think about it.

wrandelshofer avatar Apr 02 '23 19:04 wrandelshofer

Hi @wrandelshofer, the benchmarks look really impressive! From my point of view, you are welcome to create a PR that replaces the according Vavr collections (keeping their names).

danieldietrich avatar Apr 20 '23 21:04 danieldietrich

Thank you. I am going to create a PR. 😀

wrandelshofer avatar Apr 22 '23 14:04 wrandelshofer

Awesome, I am really looking forward to this great improvement!

danieldietrich avatar Apr 22 '23 16:04 danieldietrich

Please, do not worry. I am working hard on the PR. So far I have converted 2 collections. I have 2 more to go. 😅

The code I am working on is in this branch: https://github.com/wrandelshofer/vavr/tree/champ-dev

(No need to look at it. It is in disarray.)

What are your plans with respect to Java 21? It would be a shame, if I downgraded all my code to Java 8, and would then have to upgrade to 21 again. 🚧 🏗️ 👷‍♂️

wrandelshofer avatar Apr 30 '23 15:04 wrandelshofer

I have now made a pull request. See https://github.com/vavr-io/vavr/pull/2745

wrandelshofer avatar May 11 '23 15:05 wrandelshofer