pyrsistent
pyrsistent copied to clipboard
Faster implementation of maps
I recently had to rip out some pyrsistent code because using it was 10× slower than regular Python data structures. It's probably possible to do better than current implementation.
Some possibilities:
- https://github.com/MagicStack/immutables (based on code in Python 3.7—can't figure out if the Python code is exposed publicly.)
- https://github.com/arximboldi/immer
Yeah, it's a well known fact that the pmap is a lot slower than the corresponding built in map. I would not use it for any type of performance critical part of a system. Even though it is based on a pvector (which is a relatively quick C-implementation) there is a lot of python code executing for each operation.
I'd love to see a C-implementation or an implementation with a very thin wrapper on top of an existing C/C++ implementation like one of the linked ones included in pyrsistent. Writing an implementation from scratch in C requires a fair effort though. Adopting one of the existing implementations may be an option, but making it compatible with the current implementation and the different inherited types would require some thinking.
Happy to take PRs on this! It's unlikely that I will engage in such an effort at this time since I don't have a personal need for it and have very limited time for hobby coding (which this would be) at the moment.
Definitely understand you don't have time for this, just thought it'd be good to have somewhere to track this.
FWIW, Immer seems to support the fancier stuff Pyrsistent does (temporarily-mutable objects), so I suspect it could be made to work with less trouble.
Erlang added some new implementations of some of their data structures a couple of years ago, the same is used for dictionary and sets.
http://www1.erlang.org/doc/man/gb_trees.html
Prof. Arne Andersson's General Balanced Trees
http://user.it.uu.se/~arnea/abs/gbimpl.html
You find more on http://user.it.uu.se/~arnea/publications.html
Thanks to both of you for the pointers above!
Hi! Immer author here :) Transients for maps are being implemented right now. I do believe that Immer is the fastest implementation of HAMTs and RRBTS out there (I tested the later claim at a ICFP paper, where I also benchmarked it against the C-backed Pyrsistent vectors). I would be very happy to have Immer become the backend for Pyrsistent, and would love to help with reviewing the code!
Also keen on this. Is https://github.com/MagicStack/immutables an option for this as a backend @tobgu? It's already a Python package so would be easier to manage. Otherwise would be happy to try and integrate immer but my C++ is very rusty so would need some pointers @arximboldi .
@samuelsinayoko Immer does include in it source tree some experiments binding immer::vector
to Python (and some benchmarks with Pyrsistent I think) you can check it out here: https://github.com/arximboldi/immer/tree/master/extra/python
For Maps there may be some other subtleties, but I think that could serve as good inspiration.
Immer maps now support transients (very efficiently!) and also support structural-sharing-aware diffing, which has lots of super cool applications. Also Immer vectors support logarithmic concatenatio and slicing, features that Pyrsistent currently does not support. So I think using Immer as a backend for Pyrsistent is actually a fantastic idea!