vektonn-client-python
Optimize DTOs json serialization
A test_perf_serialize run shows that VektonnBaseModel.json() is a major bottleneck:
----------------------------- Captured stdout call -----------------------------
28285893 function calls in 25.275 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
118351 15.430 0.000 21.607 0.000 dtos.py:17(json)
118351 3.108 0.000 3.522 0.000 test_dtos_perf.py:55(to_idp_fast)
12781908 2.509 0.000 2.509 0.000 {built-in method _abc._abc_instancecheck}
12781908 2.122 0.000 4.630 0.000 abc.py:96(__instancecheck__)
118351 0.691 0.000 0.691 0.000 {orjson.dumps}
118351 0.414 0.000 0.414 0.000 {method 'tolist' of 'numpy.ndarray' objects}
118351 0.189 0.000 0.552 0.000 typing.py:802(__getitem__)
118351 0.130 0.000 0.317 0.000 typing.py:255(inner)
236702 0.113 0.000 0.165 0.000 <frozen importlib._bootstrap>:389(parent)
118351 0.076 0.000 0.815 0.000 utils.py:13(orjson_dumps)
236702 0.075 0.000 0.104 0.000 typing.py:329(__hash__)
1 0.071 0.071 3.605 3.605 test_dtos_perf.py:42(construct)
1 0.063 0.063 21.671 21.671 test_dtos_perf.py:37(serialize)
118351 0.059 0.000 0.083 0.000 typing.py:720(__hash__)
355053 0.052 0.000 0.052 0.000 {built-in method builtins.hash}
236702 0.051 0.000 0.051 0.000 {method 'rpartition' of 'str' objects}
118351 0.047 0.000 0.047 0.000 {method 'decode' of 'bytes' objects}
236702 0.027 0.000 0.027 0.000 {built-in method builtins.isinstance}
118351 0.019 0.000 0.019 0.000 {built-in method builtins.len}
118351 0.016 0.000 0.016 0.000 typing.py:1149(cast)
118351 0.012 0.000 0.012 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 cProfile.py:133(__exit__)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The main reason is that pydantic handles each value in a nested collection separately. Here the significant overhead comes from the "coordinates" field of the "Vector" model: pydantic processes each int inside that list individually.
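To make the cost concrete, here is a minimal stdlib-only sketch. It is not pydantic's actual code: `deep_convert` and `shallow_convert` are hypothetical stand-ins that only imitate the difference between walking every element of a nested list and passing the list through untouched. The visit counter is a rough proxy for the per-element type checks that show up in the profile.

```python
from collections.abc import Mapping, Sequence


def deep_convert(value, counter):
    """Recursive walk that visits every nested value (simplified stand-in)."""
    counter[0] += 1  # one visit, with its type checks, per value
    if isinstance(value, Mapping):
        return {k: deep_convert(v, counter) for k, v in value.items()}
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return [deep_convert(v, counter) for v in value]
    return value


def shallow_convert(fields, counter):
    """Copy only the top-level fields; nested values pass through as-is."""
    counter[0] += len(fields)
    return dict(fields)


# A Vector-like payload with a 100-element "coordinates" list (size is arbitrary).
vector = {"coordinates": list(range(100))}

deep_visits = [0]
shallow_visits = [0]
deep_result = deep_convert(vector, deep_visits)
shallow_result = shallow_convert(vector, shallow_visits)

print("deep visits:", deep_visits[0])
print("shallow visits:", shallow_visits[0])
print("same result:", deep_result == shallow_result)
```

Both functions produce an equal dict, but the recursive walk touches every list element while the shallow copy touches only the top-level field.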
To solve this problem, you can write your own dict converter. For example, this mixin adds a custom dict conversion to the "Vector" model:
class ToDictMixin:
    def dict(
        self,
        *,
        by_alias: bool = False,
        exclude_none: bool = False,
        **kwargs,  # other dict() options are accepted but ignored
    ) -> dict:
        # Shallow conversion: field values are returned as-is,
        # without recursing into nested collections.
        return {
            self.__fields__[field_name].alias if by_alias else field_name: value
            for field_name, value in self
            if value is not None or not exclude_none
        }


class Vector(ToDictMixin, VektonnBaseModel):
    ...
With this trick we get:
----------------------------- Captured stdout call -----------------------------
4260640 function calls in 7.623 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
118351 2.693 0.000 3.057 0.000 test_dtos_perf.py:55(to_idp_fast)
118351 2.360 0.000 4.423 0.000 dtos.py:16(json)
118351 0.734 0.000 0.734 0.000 {orjson.dumps}
118351 0.364 0.000 0.364 0.000 {method 'tolist' of 'numpy.ndarray' objects}
710106 0.191 0.000 0.191 0.000 {built-in method _abc._abc_instancecheck}
118351 0.183 0.000 0.542 0.000 typing.py:802(__getitem__)
710106 0.164 0.000 0.355 0.000 abc.py:96(__instancecheck__)
118351 0.134 0.000 0.312 0.000 typing.py:255(inner)
118351 0.110 0.000 0.204 0.000 dtos.py:48(dict)
118351 0.077 0.000 0.077 0.000 dtos.py:55(<dictcomp>)
1 0.074 0.074 3.144 3.144 test_dtos_perf.py:42(construct)
118351 0.074 0.000 0.859 0.000 utils.py:25(orjson_dumps)
236702 0.069 0.000 0.098 0.000 typing.py:329(__hash__)
118351 0.061 0.000 0.085 0.000 <frozen importlib._bootstrap>:389(parent)
118351 0.057 0.000 0.079 0.000 typing.py:720(__hash__)
1 0.057 0.057 4.479 4.479 test_dtos_perf.py:37(serialize)
355053 0.051 0.000 0.051 0.000 {built-in method builtins.hash}
118351 0.051 0.000 0.051 0.000 {method 'decode' of 'bytes' objects}
236702 0.029 0.000 0.029 0.000 {built-in method builtins.isinstance}
118351 0.024 0.000 0.024 0.000 {method 'rpartition' of 'str' objects}
118351 0.019 0.000 0.019 0.000 {built-in method builtins.len}
118351 0.017 0.000 0.017 0.000 typing.py:1149(cast)
118351 0.016 0.000 0.016 0.000 {method 'items' of 'dict' objects}
118351 0.012 0.000 0.012 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 cProfile.py:133(__exit__)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
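For a quick comparison of the two runs, the arithmetic below uses only figures copied from the profiles above (total time, cumulative time of serialize, and the _abc_instancecheck call counts). The variable names are illustrative, and the ratios come from single runs, so treat them as rough.

```python
# Figures copied from the two cProfile outputs above.
json_calls = 118351

before = {"total_s": 25.275, "serialize_s": 21.671, "instancechecks": 12781908}
after = {"total_s": 7.623, "serialize_s": 4.479, "instancechecks": 710106}

# Speedup of the serialize step alone, and of the whole test
# (the whole test also includes construct, which the mixin does not touch).
serialize_speedup = before["serialize_s"] / after["serialize_s"]
total_speedup = before["total_s"] / after["total_s"]

# Average number of instance checks per json() call.
checks_per_call_before = before["instancechecks"] // json_calls
checks_per_call_after = after["instancechecks"] // json_calls

print(f"serialize speedup: {serialize_speedup:.1f}x")
print(f"total speedup: {total_speedup:.1f}x")
print(f"instance checks per json() call: {checks_per_call_before} -> {checks_per_call_after}")
```

By these numbers, serialize is roughly 4.8 times faster and the instance checks per json() call drop from 108 to 6, which matches the explanation that the per-element handling of "coordinates" was the dominant cost.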