FSharpx.Collections
FSharpx.Collections copied to clipboard
CHAMP - Compressed Hash-Array Mapped Prefix Tree
Possible addition to FSharpx.Collections?
Abstract The data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs).HAMTs already feature efficient lookup, insert, and delete operations, however due to their tree-based nature their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts. This particularly prohibits their application in programs which process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general purpose design increases cache locality and features a canonical representation. It outperforms Scala’s and Clojure’s data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3–6.7 x) and equality checking (3–25.4 x).
http://michael.steindorfer.name/publications/oopsla15.pdf
marking as up-for-grabs
A ClojureScript implementation is in the work. Maybe it's helpful... or not :)
Lean Hash Array Mapped Trie implementation in ClojureScript
It will be presented at Clojure/west 2016 by Peter Schuck.
Peter Schuck's Clojure/west 2016 presentation is now on Youtube:
https://www.youtube.com/watch?v=GibNOQVelFY
The Java implementation of the CHAMP data structure, by Steindorfer and Vinju (the authors of the http://michael.steindorfer.name/publications/oopsla15.pdf paper):
https://github.com/usethesource/capsule
Another implementation, by someone else, in Kotlin: https://github.com/norswap/triemap
Out of interest - do you really expect a CHAMP to perform particularly well in .NET?
My initial thought would be that due to the necessity of holding keys, values, and nodes in the same obj array
the boxing / unboxing cost would be very high for non ref-type keys/values.
Particularly well? Not unless the boxing/unboxing problem you mention can be solved.
Better than the current PersistentHashMap? Absolutely; it suffers from the exact same boxing/unboxing issue for value types, so converting it to a CHAMP implementation should be a significant performance gain over the existing implementation.
CHAMP conceptually splits each object[]
into two sections, so it could actually use two arrays, one Node<T>[]
and one T[]
. This would prevent boxing/unboxing in exchange for the memory cost of having an extra array at each Node<T>
(although it could be null if empty).