swift-collections
swift-collections copied to clipboard
Incubate persistent data structures
This work-in-progress draft PR kicks the tires for incubating persistent data structures into the swift-collections project.
A long-term goal of this effort is to mature persistent replacements for the general-purpose data structures already present in the Swift Standard Library (i.e., Array, Set and Dictionary).
Currently, the PR covers the HashMap<Key: Hashable, Value> data type, which is supposed to become a persistent drop-in replacement for Swift's Dictionary. The draft implementation is neither feature complete nor performance optimized yet, but aims to create visibility for this effort and to elicit early feedback. Working title for the module is Capsule, based on prior work.
HashMap<Key: Hashable, Value> implements the Compressed Hash-Array Mapped Prefix-tree (CHAMP) data structure. The implementation is inspired by previous JVM data structures that I've authored in the past, amongst them:
- Scala's
HashMapandHashSet(Scala; default in standard library since Scala 2.13) - The Capsule Hash Trie Collections Library (Java; standalone library)
Feel free to provide feedback on idiomatic Swift usage and Swift-specific performance optimization tips and tricks, if and where appropriate.
References and Further Readings
Publications
- PhD Thesis: Efficient Immutable Collections (2017)
- Paper: Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections (OOPSLA 2015)
- Paper: To-Many or To-One? All-in-One! Efficient Purely Functional Multi-maps with Type-Heterogeneous Hash-Tries (PLDI 2018)
Talks
- JVM Language Summit 2016 - Efficient and Expressive Immutable Collections (Speaker: Michael Steindorfer)
- JVM Language Summit 2017 - Lightweight Relations (Speaker: Michael Steindorfer)
Checklist
- [x] I've read the Contribution Guidelines
- [x] My contributions are licensed under the Swift license.
- [ ] I've followed the coding style of the rest of the project.
- [ ] I've added tests covering all new code paths my change adds to the project (if appropriate).
- [ ] I've added benchmarks covering new functionality (if appropriate).
- [x] I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
- [ ] I've updated the documentation if necessary.
This is exciting work. I think these would be a great fit for this package, and I'm personally very much interested in getting these landed (& eventually added to the stdlib). We'll need to figure out how to test & benchmark these, and carefully evaluate API decisions. I think there is a bunch of interesting low-level optimization work we can do on the internal representations (things like replacing Array<Any> storage with tail-allocated typed buffers).
(I think it may make sense to open an official feature branch for this project. I'll ponder how best to coordinate work, and I'll see about scheduling resources for getting this done.)
Hi @lorentey, great that we share the same interest and vision regarding this effort.
(I think it may make sense to open an official feature branch for this project. I'll ponder how best to coordinate work, and I'll see about scheduling resources for getting this done.)
An official feature branch sounds good to me. Keep me up to date on how we should progress incubating the persistent collections.
(The draft PR was a suggestion by @kylemacomber and was actually a good thing for getting the draft implementation out and creating momentum. Though it may not be ideal acting as a long running feature branch.)
We'll need to figure out how to test & benchmark these, and carefully evaluate API decisions.
Sure. I plan to add more test coverage (and eventually benchmarks) for the persistent data structures along the way.
I didn't investigate the state of the conformance checkers in swift-collections yet, but having good coverage for Dictionary, Set, Array and entailed protocols would help a lot.
I'll also reach out to discuss API decisions and idiomatic Swift usage when time is due, but first I'll work on dialing in the data structure design (e.g., what concepts translate best to Swift) and feature completeness.
I think there is a bunch of interesting low-level optimization work we can do on the internal representations
The same goes for low-level optimizations. I'm looking forward to tackle those as well, but first the data structure design needs to settle. Feel free to make suggestions anytime if you happen to notice something.
things like replacing Array<Any> storage with tail-allocated typed buffers
The Array<Any> store disguises a lot of complexity, since its content is type-heterogeneous, akin to several (differently) typed buffers concatenated (therefore elements are currently typed as Any).
Can you provide me some references on what you had in mind? I will then have a look at it.
Oh man, this is great~! Fantastic to have your expertise here @msteindorfer! 🥳
I regularly miss persistent data structures so this'll be a lovely and very welcome addition!
@lorentey @msteindorfer is there anything we in The Community can do to help get this along?
@kielgillard, this contribution will be actively worked on to be readied for inclusion. Stay tuned.
I'm landing this on a feature branch to start integration & performance work. 💯