pyrsistent icon indicating copy to clipboard operation
pyrsistent copied to clipboard

Faster implementation of maps

Open itamarst opened this issue 5 years ago • 7 comments

I recently had to rip out some pyrsistent code because using it was 10× slower than regular Python data structures. It's probably possible to do better than current implementation.

Some possibilities:

  • https://github.com/MagicStack/immutables (based on code in Python 3.7—can't figure out if the Python code is exposed publicly.)
  • https://github.com/arximboldi/immer

itamarst avatar Apr 07 '19 12:04 itamarst

Yeah, it's a well known fact that the pmap is a lot slower than the corresponding built in map. I would not use it for any type of performance critical part of a system. Even though it is based on a pvector (which is a relatively quick C-implementation) there is a lot of python code executing for each operation.

I'd love to see a C-implementation or an implementation with a very thin wrapper on top of an existing C/C++ implementation like one of the linked ones included in pyrsistent. Writing an implementation from scratch in C requires a fair effort though. Adopting one of the existing implementations may be an option, but making it compatible with the current implementation and the different inherited types would require some thinking.

Happy to take PRs on this! It's unlikely that I will engage in such an effort at this time since I don't have a personal need for it and have very limited time for hobby coding (which this would be) at the moment.

tobgu avatar Apr 15 '19 19:04 tobgu

Definitely understand you don't have time for this, just thought it'd be good to have somewhere to track this.

FWIW, Immer seems to support the fancier stuff Pyrsistent does (temporarily-mutable objects), so I suspect it could be made to work with less trouble.

itamarst avatar Apr 24 '19 16:04 itamarst

Erlang added some new implementations of some of their data structures a couple of years ago, the same is used for dictionary and sets.

http://www1.erlang.org/doc/man/gb_trees.html

Prof. Arne Andersson's General Balanced Trees

http://user.it.uu.se/~arnea/abs/gbimpl.html

You find more on http://user.it.uu.se/~arnea/publications.html

mattiasw2 avatar Apr 24 '19 18:04 mattiasw2

Thanks to both of you for the pointers above!

tobgu avatar Apr 25 '19 20:04 tobgu

Hi! Immer author here :) Transients for maps are being implemented right now. I do believe that Immer is the fastest implementation of HAMTs and RRBTS out there (I tested the later claim at a ICFP paper, where I also benchmarked it against the C-backed Pyrsistent vectors). I would be very happy to have Immer become the backend for Pyrsistent, and would love to help with reviewing the code!

arximboldi avatar May 31 '22 10:05 arximboldi

Also keen on this. Is https://github.com/MagicStack/immutables an option for this as a backend @tobgu? It's already a Python package so would be easier to manage. Otherwise would be happy to try and integrate immer but my C++ is very rusty so would need some pointers @arximboldi .

samuelsinayoko avatar Jan 17 '24 09:01 samuelsinayoko

@samuelsinayoko Immer does include in it source tree some experiments binding immer::vector to Python (and some benchmarks with Pyrsistent I think) you can check it out here: https://github.com/arximboldi/immer/tree/master/extra/python

For Maps there may be some other subtleties, but I think that could serve as good inspiration.

Immer maps now support transients (very efficiently!) and also support structural-sharing-aware diffing, which has lots of super cool applications. Also Immer vectors support logarithmic concatenatio and slicing, features that Pyrsistent currently does not support. So I think using Immer as a backend for Pyrsistent is actually a fantastic idea!

arximboldi avatar Jan 17 '24 15:01 arximboldi