jupyter_client icon indicating copy to clipboard operation
jupyter_client copied to clipboard

perf: speedup date parsing using ciso8601

Open maartenbreddels opened this issue 5 years ago • 6 comments

This is a cutout of running the profiler on voila, where it handles a single comm msg: image

As can be seen, the green 'parse_date' is a significant part of the cpu usage.

With this PR, is shrinks this quite a bit: image

This uses ciso8601 to do the datetime parsing, which is significantly faster

Note that is not a requirements, it will only be used when installed. However, 1 test regarding timezones fails (it should fail to convert and it does not, I'm not sure how important that is)

Running it in a benchmark

Before:

--------------------------------------------------------- benchmark: 1 tests --------------------------------------------------------
Name (time in us)                     Min       Max      Mean   StdDev    Median      IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance     226.2670  556.7280  272.2735  58.3634  249.3170  19.2865   291;378        3.6728    2251           1
-------------------------------------------------------------------------------------------------------------------------------------

After:

-------------------------------------------------------- benchmark: 1 tests -------------------------------------------------------
Name (time in us)                    Min         Max     Mean   StdDev   Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance     59.9780  4,237.0830  76.7926  60.7056  67.9120  4.8000   190;974       13.0221    5717           1
-----------------------------------------------------------------------------------------------------------------------------------

maartenbreddels avatar Nov 17 '20 12:11 maartenbreddels

Relying on the regex gives about the same performance, but should let the tests pass:

------------------------------------------------------- benchmark: 1 tests ------------------------------------------------------
Name (time in us)                    Min       Max     Mean   StdDev   Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance     57.6940  597.6290  71.3625  32.5174  61.1040  3.7240   413;661       14.0130    3586           1
---------------------------------------------------------------------------------------------------------------------------------

maartenbreddels avatar Nov 17 '20 12:11 maartenbreddels

In fact, I think we can assume we know where the date fields are right?

That gives us the following performance:

------------------------------------------------------- benchmark: 1 tests ------------------------------------------------------
Name (time in us)                    Min       Max     Mean   StdDev   Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance     27.2420  201.6740  32.8866  11.3219  29.4945  1.3950   520;798       30.4076    5266           1
---------------------------------------------------------------------------------------------------------------------------------

And when using orjson instead of json, we can get:

-------------------------------------------------------- benchmark: 1 tests -------------------------------------------------------
Name (time in us)                    Min         Max     Mean   StdDev   Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------
test_deserialize_performance     21.9820  3,148.4180  35.3785  76.8630  23.5860  5.6985    50;395       28.2658    1939           1
-----------------------------------------------------------------------------------------------------------------------------------

This is 10x faster.

maartenbreddels avatar Nov 17 '20 15:11 maartenbreddels

Will you include your changes on orjson in this PR? Or a separate one?

Thanks for working on this!

martinRenou avatar Nov 19 '20 11:11 martinRenou

Well, jupyter_client users from zmq.utils import jsonapi which seems like a smart idea, but orjson does not support seperator, which is used in that module. And I'm also not sure if orjson supports ensure_ascii=False, allow_nan=False,

maartenbreddels avatar Nov 19 '20 11:11 maartenbreddels

Well, jupyter_client users from zmq.utils import jsonapi

This is a legacy from when we supported Pythons that didn't even have json in the standard library and there were fiddly issues with str vs bytes. There's no need to keep that, as long as we use something that's producing valid utf8 JSON bytes.

minrk avatar Dec 09 '20 08:12 minrk

Hi @maartenbreddels! We're pushing for a 7.0 release, do you want to pick this back up?

blink1073 avatar Jul 30 '21 09:07 blink1073