Encoding speed issues
Great project, exactly what I'm looking for. One issue I run into, though, is encoding speed.
I don't expect the same speed as the stdlib json module, given that yours does much more formatting. But in my testing compact_json is about 50x slower: encoding 4 MB of JSON data takes 164 ms with stdlib json and 7.85 s with compact_json. That is a huge difference, and it makes compact_json unsuitable for large objects.
- Is there anything that can be done with the current version to improve speed? Disabling certain options, overriding checks, etc.
- Are there any code improvements that could be made in future versions to improve speed?
> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", "r")) ; fmt = compact_json.Formatter ()' -c 'json.dumps (data)'
2 loops, best of 5: 164 msec per loop
> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", "r")) ; fmt = compact_json.Formatter ()' -c 'fmt.serialize (data)'
1 loop, best of 5: 7.85 sec per loop
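The same comparison can also be run from a script with the timeit module instead of the command line. This is a minimal sketch: the synthetic data list is a hypothetical stand-in for test.json, best_of and compare_encoders are hypothetical helpers, and compact_json is only timed if it is installed (using the Formatter().serialize call from the commands above).

```python
import json
import timeit
from typing import Any, Callable, Dict


def best_of(func: Callable[[], Any], repeat: int = 5, number: int = 1) -> float:
    """Best per-call time in seconds, similar to the 'best of 5' timeit reports."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def compare_encoders(data: Any, repeat: int = 5) -> Dict[str, float]:
    """Time json.dumps, plus compact_json's Formatter.serialize if installed."""
    results = {"json.dumps": best_of(lambda: json.dumps(data), repeat=repeat)}
    try:
        import compact_json
    except ImportError:
        return results  # compact_json not available: report stdlib only
    fmt = compact_json.Formatter()
    results["compact_json"] = best_of(lambda: fmt.serialize(data), repeat=repeat)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for test.json; use json.load(open("test.json")) for real data.
    data = [{"id": i, "tags": ["a", "b"], "pos": [i, 2 * i]} for i in range(10_000)]
    times = compare_encoders(data)
    for name, seconds in times.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
    if "compact_json" in times:
        print(f"slowdown: {times['compact_json'] / times['json.dumps']:.1f}x")
```

With number=1 each repetition times a single encode, which matches the "1 loop" case above for the slow encoder.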
One speedup I noticed immediately is in the calls to logger.debug. Even when debug output is turned off, Python still has to evaluate the arguments and call debug(), which then checks isEnabledFor() before discarding the message. According to cProfile, those debug calls alone take 2.06 seconds on my test data.
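Python evaluates a call's arguments before the call is made, so with debug output off every logger.debug(...) line still pays for its argument expressions plus the debug() and isEnabledFor() calls; this is consistent with debug and isEnabledFor showing identical call counts (3191214) in the profile. A minimal sketch of the effect, assuming a logger set to INFO; describe, work and work_guarded are hypothetical names, not compact_json code. Hoisting the isEnabledFor() check out of the loop assumes the logging level does not change while the loop runs.

```python
import logging

logger = logging.getLogger("debug_cost_demo")
logger.setLevel(logging.INFO)  # DEBUG records are filtered out

counter = {"describe": 0}


def describe(obj) -> str:
    """Hypothetical stand-in for building diagnostic text; counts its calls."""
    counter["describe"] += 1
    return repr(obj)


def work(n: int) -> None:
    for i in range(n):
        # describe(i) runs before logger.debug() is entered, and debug()
        # then calls isEnabledFor() only to discard the record.
        logger.debug("item %s", describe(i))


def work_guarded(n: int) -> None:
    enabled = logger.isEnabledFor(logging.DEBUG)  # checked once, outside the loop
    for i in range(n):
        if enabled:
            logger.debug("item %s", describe(i))


if __name__ == "__main__":
    work(1000)
    print("unguarded describe() calls:", counter["describe"])
    counter["describe"] = 0
    work_guarded(1000)
    print("guarded describe() calls:", counter["describe"])
```

Note that the %-style message is only formatted if the record is actually emitted; it is the argument expressions and the call overhead that are paid every time.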
In my code I do this:
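# top of module
import logging
debug = logging.getLogger (__name__).debug  # assumption: debug is a logger's debug method
DEBUG = False

def foo () :
    DEBUG and debug ('some diagnostic info')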
- When DEBUG is False, my debug statements never evaluate their arguments and are never called, thanks to boolean short-circuiting.
- I can turn debugging on or off for a single function or context by defining DEBUG = True locally.
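A self-contained version of this pattern, so the short-circuit can be checked directly. Here debug is a hypothetical stand-in that just records messages, and expensive_summary and bar are also hypothetical; bar shows the local DEBUG = True override.

```python
from typing import List

DEBUG = False  # module-wide switch, off by default

messages: List[str] = []  # collects debug output so the effect is visible


def debug(msg: str) -> None:
    """Hypothetical debug sink: records the message instead of logging it."""
    messages.append(msg)


def expensive_summary(items) -> str:
    # Only evaluated when the `and` does not short-circuit.
    return ", ".join(map(str, items))


def foo(items) -> int:
    DEBUG and debug("foo: " + expensive_summary(items))  # skipped: DEBUG is False
    return sum(items)


def bar(items) -> int:
    DEBUG = True  # local override: shadows the module flag in this function only
    DEBUG and debug("bar: " + expensive_summary(items))
    return sum(items)
```

`if DEBUG: debug(...)` behaves the same way; the `and` form just keeps each diagnostic on one line. Either way, a disabled statement costs a global name lookup and a truth test rather than a function call.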
Profiling results
Sat Jan 25 15:46:53 2025    prof/cjson.1

         18661266 function calls (18295664 primitive calls) in 14.658 seconds

   Ordered by: cumulative time
   List reduced from 1109 to 1038 due to restriction <'^(?!_bootstrap).*$'>
   List reduced from 1038 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    108/1    0.002    0.000   14.658   14.658 {builtin:exec}
        1    0.001    0.001   14.658   14.658 test.py:2(<module>)
        1    0.000    0.000   14.369   14.369 pypi:compact_json/formatter.py:315(serialize)
 318820/1    0.412    0.000   14.369   14.369 pypi:compact_json/formatter.py:335(format_element)
  45373/1    0.783    0.000   14.365   14.365 pypi:compact_json/formatter.py:399(format_dict)
    45243    0.224    0.000    8.264    0.000 pypi:compact_json/formatter.py:842(format_table_dict_list)
    90351    0.757    0.000    4.148    0.000 pypi:compact_json/formatter.py:619(format_list_table_row)
    90744    0.353    0.000    4.040    0.000 pypi:compact_json/formatter.py:370(format_list)
    45245    0.341    0.000    3.331    0.000 pypi:compact_json/formatter.py:1030(get_list_stats)
   182696    1.202    0.000    2.825    0.000 pypi:compact_json/formatter.py:146(format_value)
   182703    0.119    0.000    2.569    0.000 pypi:compact_json/formatter.py:372(<lambda>)
   182750    1.396    0.000    2.538    0.000 pypi:compact_json/formatter.py:100(update)
  3191214    1.163    0.000    2.065    0.000 std lib:logging/__init__.py:1412(debug)
   182703    0.559    0.000    2.020    0.000 pypi:compact_json/formatter.py:347(format_simple)
   318819    0.239    0.000    1.291    0.000 std lib:json/__init__.py:183(dumps)
   318819    0.342    0.000    1.052    0.000 std lib:json/encoder.py:182(encode)
    90354    0.606    0.000    0.947    0.000 pypi:compact_json/formatter.py:454(format_list_inline)
  3191214    0.902    0.000    0.902    0.000 std lib:logging/__init__.py:1677(isEnabledFor)
   182703    0.580    0.000    0.580    0.000 std lib:json/encoder.py:204(iterencode)
    45243    0.333    0.000    0.571    0.000 pypi:compact_json/formatter.py:922(format_dict_expanded)
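To regenerate a report in this shape (for example after changing the debug calls), something like the following should work with only the standard library. It is a sketch: profile_call is a hypothetical helper, and the "pypi:" / "std lib:" labels in the report above appear to come from a custom path-shortening step, so this uses pstats' own strip_dirs() instead. The two restrictions passed to print_stats mirror the ones listed in the report.

```python
import cProfile
import io
import json
import pstats
from typing import Any, Callable, Optional


def profile_call(func: Callable[..., Any], *args: Any, top: int = 20,
                 dump_path: Optional[str] = None) -> str:
    """Profile func(*args) and return a cumulative-time report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    if dump_path is not None:
        profiler.dump_stats(dump_path)  # reload later with pstats.Stats(dump_path)
    buf = io.StringIO()
    # strip_dirs() must come before sort_stats(), since it resets the ordering.
    stats = pstats.Stats(profiler, stream=buf).strip_dirs().sort_stats("cumulative")
    # Restrictions are applied in order: first keep only rows whose
    # "file:line(func)" text does not start with "_bootstrap", then the top rows.
    stats.print_stats(r"^(?!_bootstrap).*$", top)
    return buf.getvalue()


if __name__ == "__main__":
    data = [{"id": i, "pos": [i, i + 1]} for i in range(1000)]
    print(profile_call(json.dumps, data, top=10))
```

Passing dump_path writes a stats file like prof/cjson.1 that can be reopened with pstats.Stats for further sorting.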