jupyter_client
jupyter_client copied to clipboard
perf: speedup date parsing using ciso8601
This is a cutout of running the profiler on voila, where it handles a single comm msg:

As can be seen, the green 'parse_date' is a significant part of the cpu usage.
With this PR, is shrinks this quite a bit:

This uses ciso8601 to do the datetime parsing, which is significantly faster
Note that is not a requirements, it will only be used when installed. However, 1 test regarding timezones fails (it should fail to convert and it does not, I'm not sure how important that is)
Running it in a benchmark
Before:
--------------------------------------------------------- benchmark: 1 tests --------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance 226.2670 556.7280 272.2735 58.3634 249.3170 19.2865 291;378 3.6728 2251 1
-------------------------------------------------------------------------------------------------------------------------------------
After:
-------------------------------------------------------- benchmark: 1 tests -------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance 59.9780 4,237.0830 76.7926 60.7056 67.9120 4.8000 190;974 13.0221 5717 1
-----------------------------------------------------------------------------------------------------------------------------------
Relying on the regex gives about the same performance, but should let the tests pass:
------------------------------------------------------- benchmark: 1 tests ------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance 57.6940 597.6290 71.3625 32.5174 61.1040 3.7240 413;661 14.0130 3586 1
---------------------------------------------------------------------------------------------------------------------------------
In fact, I think we can assume we know where the date fields are right?
That gives us the following performance:
------------------------------------------------------- benchmark: 1 tests ------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance 27.2420 201.6740 32.8866 11.3219 29.4945 1.3950 520;798 30.4076 5266 1
---------------------------------------------------------------------------------------------------------------------------------
And when using orjson instead of json, we can get:
-------------------------------------------------------- benchmark: 1 tests -------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance 21.9820 3,148.4180 35.3785 76.8630 23.5860 5.6985 50;395 28.2658 1939 1
-----------------------------------------------------------------------------------------------------------------------------------
This is 10x faster.
Will you include your changes on orjson in this PR? Or a separate one?
Thanks for working on this!
Well, jupyter_client users from zmq.utils import jsonapi which seems like a smart idea, but orjson does not support seperator, which is used in that module. And I'm also not sure if orjson supports ensure_ascii=False, allow_nan=False,
Well, jupyter_client users from zmq.utils import jsonapi
This is a legacy from when we supported Pythons that didn't even have json in the standard library and there were fiddly issues with str vs bytes. There's no need to keep that, as long as we use something that's producing valid utf8 JSON bytes.
Hi @maartenbreddels! We're pushing for a 7.0 release, do you want to pick this back up?