vektonn-client-python

Optimize DTOs json serialization

Open AndrewKostousov opened this issue 3 years ago • 2 comments

A test_perf_serialize run shows that VektonnBaseModel.json() is a huge bottleneck, taking 15.430 of the 25.275 seconds in internal time alone:

----------------------------- Captured stdout call -----------------------------
         28285893 function calls in 25.275 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   118351   15.430    0.000   21.607    0.000 dtos.py:17(json)
   118351    3.108    0.000    3.522    0.000 test_dtos_perf.py:55(to_idp_fast)
 12781908    2.509    0.000    2.509    0.000 {built-in method _abc._abc_instancecheck}
 12781908    2.122    0.000    4.630    0.000 abc.py:96(__instancecheck__)
   118351    0.691    0.000    0.691    0.000 {orjson.dumps}
   118351    0.414    0.000    0.414    0.000 {method 'tolist' of 'numpy.ndarray' objects}
   118351    0.189    0.000    0.552    0.000 typing.py:802(__getitem__)
   118351    0.130    0.000    0.317    0.000 typing.py:255(inner)
   236702    0.113    0.000    0.165    0.000 <frozen importlib._bootstrap>:389(parent)
   118351    0.076    0.000    0.815    0.000 utils.py:13(orjson_dumps)
   236702    0.075    0.000    0.104    0.000 typing.py:329(__hash__)
        1    0.071    0.071    3.605    3.605 test_dtos_perf.py:42(construct)
        1    0.063    0.063   21.671   21.671 test_dtos_perf.py:37(serialize)
   118351    0.059    0.000    0.083    0.000 typing.py:720(__hash__)
   355053    0.052    0.000    0.052    0.000 {built-in method builtins.hash}
   236702    0.051    0.000    0.051    0.000 {method 'rpartition' of 'str' objects}
   118351    0.047    0.000    0.047    0.000 {method 'decode' of 'bytes' objects}
   236702    0.027    0.000    0.027    0.000 {built-in method builtins.isinstance}
   118351    0.019    0.000    0.019    0.000 {built-in method builtins.len}
   118351    0.016    0.000    0.016    0.000 typing.py:1149(cast)
   118351    0.012    0.000    0.012    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:133(__exit__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

AndrewKostousov • Feb 06 '22 10:02

The main reason is that pydantic handles each value inside nested collections separately. Here, most of the overhead comes from the "coordinates" field of the "Vector" model: pydantic processes every element of that list individually.

To solve this problem, you can write your own dict converter. For example, this mixin adds a custom dict conversion to the "Vector" model:

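class ToDictMixin:
    """Shallow replacement for pydantic's dict(): field values are returned as-is."""

    def dict(
        self,
        *,
        by_alias: bool = False,
        exclude_none: bool = False,
        **kwargs,  # other dict() options (include, exclude, ...) are accepted but ignored
    ) -> dict:
        # Iterating over a model yields (field_name, value) pairs. The values are
        # not converted recursively, so list elements are never visited one by one.
        return {
            self.__fields__[field_name].alias if by_alias else field_name: value
            for field_name, value in self
            if value is not None or not exclude_none
        }


# The mixin is listed first so that its dict() takes precedence in the MRO.
class Vector(ToDictMixin, VektonnBaseModel):
    ...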
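
To illustrate why skipping the per-element pass helps, here is a minimal, pydantic-free sketch. `deep_convert` is a simplified stand-in for a generic recursive converter (it is not pydantic's actual implementation), and `shallow_convert` mimics what the mixin above does. Both function names and the 100-element test vector are assumptions made for this example.

```python
import timeit


def deep_convert(value, visits):
    """Recursively rebuild containers, visiting every element (hypothetical stand-in)."""
    visits[0] += 1  # count every value the converter has to look at
    if isinstance(value, dict):
        return {key: deep_convert(item, visits) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [deep_convert(item, visits) for item in value]
    return value


def shallow_convert(fields):
    """Copy only the top-level mapping and pass field values through untouched."""
    return dict(fields)


# A vector-like payload with a single list field, as in the "Vector" model.
vector = {"coordinates": [float(i) for i in range(100)]}

visits = [0]
deep_result = deep_convert(vector, visits)
shallow_result = shallow_convert(vector)

# Both approaches produce the same data for this payload...
assert deep_result == shallow_result
# ...but the recursive one touched the dict, the list and each of its 100 items.
print("values visited by deep_convert:", visits[0])

deep_time = timeit.timeit(lambda: deep_convert(vector, [0]), number=1000)
shallow_time = timeit.timeit(lambda: shallow_convert(vector), number=1000)
print(f"deep: {deep_time:.4f}s, shallow: {shallow_time:.4f}s")
```

The shallow version is only safe when the field values are already serializable by the JSON encoder in use, which is the case for plain lists of numbers.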

BrandesDenis • Feb 24 '22 15:02

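With this trick, total run time drops from 25.275 to 7.623 seconds, and the cumulative time spent in json() drops from 21.607 to 4.423 seconds: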

----------------------------- Captured stdout call -----------------------------
         4260640 function calls in 7.623 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   118351    2.693    0.000    3.057    0.000 test_dtos_perf.py:55(to_idp_fast)
   118351    2.360    0.000    4.423    0.000 dtos.py:16(json)
   118351    0.734    0.000    0.734    0.000 {orjson.dumps}
   118351    0.364    0.000    0.364    0.000 {method 'tolist' of 'numpy.ndarray' objects}
   710106    0.191    0.000    0.191    0.000 {built-in method _abc._abc_instancecheck}
   118351    0.183    0.000    0.542    0.000 typing.py:802(__getitem__)
   710106    0.164    0.000    0.355    0.000 abc.py:96(__instancecheck__)
   118351    0.134    0.000    0.312    0.000 typing.py:255(inner)
   118351    0.110    0.000    0.204    0.000 dtos.py:48(dict)
   118351    0.077    0.000    0.077    0.000 dtos.py:55(<dictcomp>)
        1    0.074    0.074    3.144    3.144 test_dtos_perf.py:42(construct)
   118351    0.074    0.000    0.859    0.000 utils.py:25(orjson_dumps)
   236702    0.069    0.000    0.098    0.000 typing.py:329(__hash__)
   118351    0.061    0.000    0.085    0.000 <frozen importlib._bootstrap>:389(parent)
   118351    0.057    0.000    0.079    0.000 typing.py:720(__hash__)
        1    0.057    0.057    4.479    4.479 test_dtos_perf.py:37(serialize)
   355053    0.051    0.000    0.051    0.000 {built-in method builtins.hash}
   118351    0.051    0.000    0.051    0.000 {method 'decode' of 'bytes' objects}
   236702    0.029    0.000    0.029    0.000 {built-in method builtins.isinstance}
   118351    0.024    0.000    0.024    0.000 {method 'rpartition' of 'str' objects}
   118351    0.019    0.000    0.019    0.000 {built-in method builtins.len}
   118351    0.017    0.000    0.017    0.000 typing.py:1149(cast)
   118351    0.016    0.000    0.016    0.000 {method 'items' of 'dict' objects}
   118351    0.012    0.000    0.012    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:133(__exit__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

AndrewKostousov • Feb 25 '22 08:02
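
The profiles in this thread look like cProfile reports ordered by internal time. For anyone who wants to produce a comparable report locally, here is a minimal sketch. It assumes Python 3.8+ (for the `cProfile.Profile` context manager) and uses the standard `json` module as a stand-in for the project's orjson-based serializer; `serialize` and `profile_report` are hypothetical helper names, not part of the test suite.

```python
import cProfile
import io
import json
import pstats


def serialize(payloads):
    """Stand-in workload: serialize each payload to a JSON string."""
    return [json.dumps(payload) for payload in payloads]


def profile_report(func, *args):
    """Run func under cProfile and return the stats text sorted by internal time."""
    with cProfile.Profile() as profiler:
        func(*args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # SortKey.TIME sorts by the "tottime" column (time spent inside each function).
    stats.sort_stats(pstats.SortKey.TIME).print_stats()
    return stream.getvalue()


payloads = [{"coordinates": [float(i) for i in range(100)]} for _ in range(1000)]
report = profile_report(serialize, payloads)
print(report)
```

Sorting by internal time puts the functions that burn the most time in their own body at the top, which is how json() stood out in the first profile.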