
Optimise dumping to reduce unnecessary overhead

Open dsimidzija opened this issue 3 years ago • 5 comments

NB: Please take this MR more as a "starting a discussion" than "production-ready" code, because I hacked a lot of things.

When dumping many objects, marshmallow calls the same field methods over and over, and they return the same values every time. Parts of this process only need to run once per dump, which significantly reduces Python method-call overhead.

Field.get_serializer returns an optimized serializer for the current dump operation, avoiding expensive lookups of properties that do not change during a single dump (such as data_key, default, etc.).
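To illustrate the idea, here is a minimal, self-contained sketch of a per-dump serializer. It does not use marshmallow at all: `SketchField` and `dump_many` are hypothetical stand-ins, and the real patch touches many more code paths. The point is only that per-field constants are resolved once, outside the per-object loop.

```python
from types import SimpleNamespace


class SketchField:
    """Hypothetical stand-in for a field; NOT marshmallow's Field class."""

    def __init__(self, attribute, data_key=None, default=None):
        self.attribute = attribute
        self.data_key = data_key
        self.default = default

    def get_serializer(self):
        # Resolve everything that stays constant during one dump exactly once.
        key = self.data_key or self.attribute
        attribute = self.attribute
        default = self.default

        def serialize(obj):
            # The hot path only does a single getattr per object.
            return key, getattr(obj, attribute, default)

        return serialize


def dump_many(fields, objects):
    # Build the serializers once per dump, then reuse them for every object.
    serializers = [field.get_serializer() for field in fields]
    return [dict(s(obj) for s in serializers) for obj in objects]


fields = [SketchField("name"), SketchField("age", data_key="years", default=0)]
objects = [SimpleNamespace(name="a", age=1), SimpleNamespace(name="b")]
result = dump_many(fields, objects)
```

With this shape, the cost of looking up data_key and default is paid once per field rather than once per field per object.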

The default Schema.get_attribute is also bypassed, because all it does is call utils._get_value_for_key(s).

Benchmarks show around 30-35% improvement, which is quite significant even for this hacky patch:

T1: python benchmark.py
  Before: 395.60 usec/dump
  After:  261.04 usec/dump
T2: python benchmark.py --object-count 1000
  Before: 22508.80 usec/dump
  After:  14610.63 usec/dump
T3: python benchmark.py --iterations=5 --repeat=5 --object-count 20000
  Before: 442295.61 usec/dump
  After:  288202.98 usec/dump
T4: python benchmark.py --iterations=10 --repeat=10 --object-count 10000
  Before: 220163.94 usec/dump
  After:  142475.76 usec/dump
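As a quick check on the quoted range, the relative improvement for each run can be recomputed from the before/after timings listed above. The helper name `improvement_percent` is mine; the numbers are copied from the benchmark output.

```python
# (before, after) timings in usec/dump, in the order the runs are listed above.
timings = [
    (395.60, 261.04),
    (22508.80, 14610.63),
    (442295.61, 288202.98),
    (220163.94, 142475.76),
]


def improvement_percent(before, after):
    # Fraction of the original time that was saved, as a percentage.
    return (before - after) / before * 100


improvements = [improvement_percent(b, a) for b, a in timings]
for (before, after), pct in zip(timings, improvements):
    print(f"{before:>10.2f} -> {after:>10.2f} usec/dump: {pct:.1f}% faster")
```

All four runs come out at roughly 34 to 35 percent, at the upper end of the quoted range.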

My motivation here is that marshmallow is excellent when it comes to schema validation, but according to the benchmarks, there is a lot of overhead in there. The question is, is there a better way of improving serialization performance without sacrificing all the good things about marshmallow?

dsimidzija avatar Aug 16 '20 22:08 dsimidzija

This lines up pretty well with the discussion in #805. I think the main difference is that we also proposed caching logic that depends on the object type. I think the scope of this PR would be a good incremental improvement.

deckar01 avatar Aug 18 '20 15:08 deckar01

I didn't even know about #805, good to know! My ideas were similar: figure out a way to cache values for certain types at least. For example, if I know that a specific field is a string property and it is 100% there, there should be no reason for complicated lookups or type enforcement. I didn't have time to delve deeper into the various Field subclasses, but line_profiler suggests there is a lot of repetition there.

As I'm typing this, I wonder if it would be possible to do something like lima is doing, and compile a single function which dumps the entire object, where each Field subclass could in theory produce the "most optimized" serializer for itself.
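A rough sketch of what "compile a single function" could mean, assuming the simplest possible case of plain attribute access. This is not how lima or marshmallow is implemented; `compile_dumper` and its `field_map` argument are hypothetical names used only for illustration.

```python
import keyword
from types import SimpleNamespace


def compile_dumper(field_map):
    """Build one dump function from {output_key: attribute_name}.

    Hypothetical helper: it generates Python source for a single function
    that reads every attribute directly, then compiles it with exec().
    """
    for attr in field_map.values():
        # Only plain identifiers may be spliced into the generated source.
        if not attr.isidentifier() or keyword.iskeyword(attr):
            raise ValueError(f"not a valid attribute name: {attr!r}")

    items = ", ".join(f"{key!r}: obj.{attr}" for key, attr in field_map.items())
    source = f"def dump(obj):\n    return {{{items}}}\n"

    namespace = {}
    exec(source, namespace)  # the source is built only from validated names
    return namespace["dump"]


dump = compile_dumper({"name": "name", "years": "age"})
dumped = dump(SimpleNamespace(name="a", age=3))
```

In a fuller version, each Field subclass would contribute its own expression to the generated source instead of a bare attribute access, which is where the per-field specialization would come from.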

dsimidzija avatar Aug 19 '20 00:08 dsimidzija

I don't know if this is of any interest, but with some minimal changes, marshmallow can be cythonized with setuptools-cythonize, and together with this MR, the performance is around 45% better!

dsimidzija avatar Mar 16 '21 19:03 dsimidzija

I'm not familiar with cythonize. Does cythonization itself - without this PR - bring a significant improvement?

Is there anything we could do to make it available more easily? Like, could/should we distribute binary packages? Would it be useful to many users or is it a niche?

lafrech avatar Mar 16 '21 19:03 lafrech

I tested it without this MR a long time ago, so I don't have the numbers, but IIRC it gave about a 10% speedup on its own. I guess that should be examined in more detail, though.

I'm honestly not sure how niche it is. I started looking into it because I ran into these performance problems when dumping large(ish) datasets with marshmallow, but didn't want to give up the dynamic schemas & validation that it provides. I haven't had the time yet, but I was hoping to look into speeding up individual fields as well. I feel like there is a lot of overhead there that should be easy to eliminate without breaking anything.

dsimidzija avatar Mar 16 '21 20:03 dsimidzija