FSharpx.Collections CHAMP - Compressed Hash-Array Mapped Prefix Tree

Possible addition to FSharpx.Collections?

Abstract The data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs).HAMTs already feature efficient lookup, insert, and delete operations, however due to their tree-based nature their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts. This particularly prohibits their application in programs which process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general purpose design increases cache locality and features a canonical representation. It outperforms Scala’s and Clojure’s data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3–6.7 x) and equality checking (3–25.4 x).

http://michael.steindorfer.name/publications/oopsla15.pdf

Nov 29 '15 18:11 cloudRoutine

marking as up-for-grabs

Nov 30 '15 17:11 forki

A ClojureScript implementation is in the work. Maybe it's helpful... or not :)

Lean Hash Array Mapped Trie implementation in ClojureScript

It will be presented at Clojure/west 2016 by Peter Schuck.

Feb 17 '16 03:02 JeffHeon

Peter Schuck's Clojure/west 2016 presentation is now on Youtube:

https://www.youtube.com/watch?v=GibNOQVelFY

May 18 '16 02:05 rmunn

The Java implementation of the CHAMP data structure, by Steindorfer and Vinju (the authors of the http://michael.steindorfer.name/publications/oopsla15.pdf paper):

https://github.com/usethesource/capsule

Another implementation, by someone else, in Kotlin: https://github.com/norswap/triemap

Jun 01 '17 05:06 rmunn

Out of interest - do you really expect a CHAMP to perform particularly well in .NET? My initial thought would be that due to the necessity of holding keys, values, and nodes in the same obj array the boxing / unboxing cost would be very high for non ref-type keys/values.

Aug 18 '17 08:08 simoncowen88

Particularly well? Not unless the boxing/unboxing problem you mention can be solved.

Better than the current PersistentHashMap? Absolutely; it suffers from the exact same boxing/unboxing issue for value types, so converting it to a CHAMP implementation should be a significant performance gain over the existing implementation.

Aug 18 '17 09:08 rmunn

CHAMP conceptually splits each object[] into two sections, so it could actually use two arrays, one Node<T>[] and one T[]. This would prevent boxing/unboxing in exchange for the memory cost of having an extra array at each Node<T> (although it could be null if empty).

Jun 04 '18 04:06 in-op