
performance issues compared to cPickle

Open jvsteiner opened this issue 11 years ago • 7 comments

I am finding that jsonpickle output is about double the size of cPickle's, and performance seems slower as well, although I haven't run any exact metrics yet - even when using ujson as a backend. Any tips on optimizing?

jvsteiner avatar Dec 10 '14 19:12 jvsteiner

The comparison is a bit unfair. The pickle protocol is a binary representation of the data, whereas jsonpickle uses a human-friendly, text-based representation -- the overhead of the quoting and braces itself will consume quite a few bytes. It is a trade-off: while there could be optimizations for size, they may make the output significantly more complicated. You may consider chaining the output of jsonpickle into the gzip module -- I suspect that will reduce it to near-pickle sizes.
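For illustration, here's a minimal sketch of that gzip chaining, assuming a made-up Point class standing in for whatever objects are actually being serialized:

import gzip
import jsonpickle

# Hypothetical stand-in for whatever classes are actually being pickled.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

data = [Point(i, i * 2) for i in range(1000)]

encoded = jsonpickle.encode(data)                    # human-readable JSON text
compressed = gzip.compress(encoded.encode("utf-8"))  # gzip works on bytes

print(len(encoded), len(compressed))                 # raw vs. compressed size

# Round trip: decompress, then decode back into Point instances.
restored = jsonpickle.decode(gzip.decompress(compressed).decode("utf-8"))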

I'm unaware of any attempts to optimize the codebase for speed, or of any prior benchmarking efforts. If you find some code hotspots, implementing those improvements would not be a bad idea. Also note that cPickle is a C-based optimization of the pure-Python pickle module.
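For anyone who wants to look for those hotspots, a rough profiling sketch (the data object graph here is just an arbitrary stand-in):

import cProfile
import pstats
import jsonpickle

# Arbitrary stand-in object graph; substitute your own data.
data = [{"id": i, "values": list(range(50))} for i in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
jsonpickle.encode(data)
profiler.disable()

# Show the ten functions with the highest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)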

johnpaulett avatar Dec 10 '14 20:12 johnpaulett

Hi John, thanks for the reply. I didn't mean to be unfair - I fully understand the benefits of a human-readable format vs. binary; it's why I am trying it out. I guess ujson is probably about the fastest module I can use to serialize things this way, so I will just have to run some hard metrics to see whether the tradeoff is acceptable. To be even more fair, I could probably also reduce the size by using shorter attribute names in my classes. Also, zipping the output actually makes the files much smaller than pickle files in my case. Thanks for the great module!

jvsteiner avatar Dec 10 '14 20:12 jvsteiner

@jvsteiner -- no worries, I understand. If you get some benchmarking numbers (size and speed compared to cPickle, pickle, & jsonpickle with various backends) that seems like it would be a beneficial thing to know and share!

The size issue may very well be solved via a quick zlib call. It reminds me of the MongoDB BSON issue of key sizes directly impacting database size (this was from years ago, not sure if still around) -- a tradeoff between readability and size.

All the thanks goes to the other contributors throughout the years -- they really took a small library and turned it into something cool!

johnpaulett avatar Dec 10 '14 20:12 johnpaulett

Did some quick benchmarks. For an ~767k JSON blob, I used timeit to measure execution times over 10 repetitions; here's what I got:

jsonpickle with various backends:
- ujson: 3.89 sec
- simplejson: 4.04 sec
- json: 3.96 sec

regular pickling:
- cPickle: 0.382 sec
- pickle: 3.27 sec
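For reference, a sketch of how numbers along these lines could be reproduced with timeit; the payload below is an arbitrary stand-in, not the original ~767k blob, and on Python 3 the pickle module already uses the C implementation that cPickle provided on Python 2:

import timeit

# Arbitrary stand-in payload; not the original ~767k blob.
setup = """
import json, pickle, jsonpickle
data = [{"id": i, "name": "item-%d" % i, "values": list(range(20))} for i in range(5000)]
"""

print("jsonpickle:", timeit.timeit("jsonpickle.encode(data)", setup=setup, number=10))
print("json:      ", timeit.timeit("json.dumps(data)", setup=setup, number=10))
print("pickle:    ", timeit.timeit("pickle.dumps(data)", setup=setup, number=10))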

jvsteiner avatar Dec 10 '14 21:12 jvsteiner

JSON pickling is going to be slower than the binary protocol; that's the tradeoff. What's the surprise here? If you're really after speed, you should use something like msgpack:

>>> %timeit jsonpickle.encode(d)
10 loops, best of 3: 21.3 ms per loop

>>> %timeit cPickle.dumps(d)
100 loops, best of 3: 3.23 ms per loop

>>> %timeit msgpack.dumps(d)
100 loops, best of 3: 2.14 ms per loop

(in this example d is this 148KB JSON file)
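For anyone weighing the same choice, a minimal size and round-trip comparison sketch (it assumes a recent msgpack-python, and note that msgpack only handles JSON-like containers and scalars, not the arbitrary Python objects jsonpickle can restore):

import pickle
import jsonpickle
import msgpack  # third-party: pip install msgpack

# msgpack only covers JSON-like data, so d is a plain dict here,
# unlike the arbitrary objects jsonpickle can handle.
d = {"users": [{"id": i, "name": "user%d" % i} for i in range(1000)]}

jp_bytes = jsonpickle.encode(d).encode("utf-8")
pk_bytes = pickle.dumps(d)
mp_bytes = msgpack.dumps(d)

print(len(jp_bytes), len(pk_bytes), len(mp_bytes))  # compare encoded sizes

# With msgpack-python >= 1.0 defaults, strings decode back as str,
# so the round trip compares equal.
assert msgpack.loads(mp_bytes) == d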

aldanor avatar Jan 09 '15 00:01 aldanor

no surprise - just exploring the details of the tradeoff...

jvsteiner avatar Jan 09 '15 06:01 jvsteiner

Not sure if anyone else here remembers this exists, but I've been trying to improve the speed of jsonpickle over the past few weeks. While I've made some big wins speeding up both encode and decode (around 2x faster in 1.5.1 than in 0.8.0), they're still pretty slow compared to cPickle. However, if someone here is good at C or C++ and willing to help write a C extension for jsonpickle's heavy-duty paths, I think it could be vastly sped up, probably to within the same order of magnitude as cPickle, though it'll never actually be as fast. Any takers/other ideas?

Theelx avatar Feb 03 '21 01:02 Theelx