Encoding speed issues

Open • ed2050 opened this issue 1 year ago • 1 comment

Great project, exactly what I'm looking for. One issue I run into though is encoding speed.

I don't expect the same results as stdlib json, given that your module does much more formatting. But compact_json is about 50x slower in my testing. Encoding 4 MB of JSON data takes 164 ms with stdlib and 7.85 s with compact_json. That's a huge difference, and it makes the module unsuitable for large objects.

  1. Is there anything that can be done with the current version to improve speed? Disabling certain options, overriding checks, etc.

  2. Are there any code improvements that can be made to future versions to improve speed?

> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", "r")) ; fmt = compact_json.Formatter ()' 'json.dumps (data)'
2 loops, best of 5: 164 msec per loop

> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", "r")) ; fmt = compact_json.Formatter ()' 'fmt.serialize (data)'
1 loop, best of 5: 7.85 sec per loop
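
For anyone who wants to repeat the comparison from a script instead of the shell, here is a minimal sketch. The helper name best_time and the generated sample data are made up for illustration (my real test used test.json), and the compact_json branch is skipped if the package is not installed.

```python
import json
import timeit


def best_time(func, data, repeat=5, number=1):
    """Return the best per-call wall time in seconds over `repeat` runs."""
    timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(timings) / number


# Hypothetical sample data; substitute json.load(open("test.json")) for a real test.
data = {"rows": [{"id": i, "name": f"item{i}"} for i in range(1000)]}

encoders = {"json.dumps": json.dumps}

try:
    import compact_json  # third-party; only timed when available

    encoders["compact_json"] = compact_json.Formatter().serialize
except ImportError:
    pass

results = {name: best_time(func, data) for name, func in encoders.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Taking the minimum over several repeats mirrors what "best of 5" reports on the command line.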

ed2050 • Jan 25 '25 15:01

One speedup I noticed immediately is all the calls to logger.debug. Even when debug output is turned off, Python still has to evaluate the args and pass them to debug(). Those debug calls alone take 2.06 seconds on my test data according to cProfile.

In my code I do this:

# top of module
import logging

debug = logging.getLogger (__name__).debug
DEBUG = False

def foo () :
    DEBUG and debug ('some diagnostic info')

  • When DEBUG is False, my debug statements never evaluate their args and are never called, thanks to boolean short-circuiting
  • I can turn debugging on/off for a single function or context by defining DEBUG = True locally
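
To make the cost difference concrete, here is a small sketch that counts how often an argument gets built under three styles: an unguarded call, the DEBUG flag, and the stdlib's own isEnabledFor check. The logger name and the expensive_repr helper are invented for the demo and are not part of compact_json.

```python
import logging

logger = logging.getLogger("debug_cost_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)               # debug output is turned off

calls = {"expensive": 0}


def expensive_repr(obj):
    """Stand-in for a costly argument, such as formatting a large structure."""
    calls["expensive"] += 1
    return repr(obj)


data = list(range(10))

# 1. Unguarded: the argument is built even though nothing gets logged.
logger.debug("value: %s", expensive_repr(data))
unguarded_calls = calls["expensive"]

# 2. Module-level flag: short-circuiting skips both the argument and the call.
DEBUG = False
DEBUG and logger.debug("value: %s", expensive_repr(data))
flag_calls = calls["expensive"] - unguarded_calls

# 3. Stdlib guard: skips the argument, but still pays for one method call.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("value: %s", expensive_repr(data))
guarded_calls = calls["expensive"] - unguarded_calls - flag_calls

print(unguarded_calls, flag_calls, guarded_calls)
```

The isEnabledFor guard avoids building the argument but is itself a function call, so over millions of invocations the plain flag should be the cheaper of the two.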

Profiling results

Sat Jan 25 15:46:53 2025    prof/cjson.1

         18661266 function calls (18295664 primitive calls) in 14.658 seconds

   Ordered by: cumulative time
   List reduced from 1109 to 1038 due to restriction <'^((?!_bootstrap).)*$'>
   List reduced from 1038 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    108/1    0.002    0.000   14.658   14.658 {builtin : exec}
        1    0.001    0.001   14.658   14.658 test.py:2(<module>)
        1    0.000    0.000   14.369   14.369 pypi : compact_json/formatter.py:315(serialize)
 318820/1    0.412    0.000   14.369   14.369 pypi : compact_json/formatter.py:335(format_element)
  45373/1    0.783    0.000   14.365   14.365 pypi : compact_json/formatter.py:399(format_dict)
    45243    0.224    0.000    8.264    0.000 pypi : compact_json/formatter.py:842(format_table_dict_list)
    90351    0.757    0.000    4.148    0.000 pypi : compact_json/formatter.py:619(format_list_table_row)
    90744    0.353    0.000    4.040    0.000 pypi : compact_json/formatter.py:370(format_list)
    45245    0.341    0.000    3.331    0.000 pypi : compact_json/formatter.py:1030(get_list_stats)
   182696    1.202    0.000    2.825    0.000 pypi : compact_json/formatter.py:146(format_value)
   182703    0.119    0.000    2.569    0.000 pypi : compact_json/formatter.py:372(<lambda>)
   182750    1.396    0.000    2.538    0.000 pypi : compact_json/formatter.py:100(update)
  3191214    1.163    0.000    2.065    0.000 std lib : logging/__init__.py:1412(debug)
   182703    0.559    0.000    2.020    0.000 pypi : compact_json/formatter.py:347(format_simple)
   318819    0.239    0.000    1.291    0.000 std lib : json/__init__.py:183(dumps)
   318819    0.342    0.000    1.052    0.000 std lib : json/encoder.py:182(encode)
    90354    0.606    0.000    0.947    0.000 pypi : compact_json/formatter.py:454(format_list_inline)
  3191214    0.902    0.000    0.902    0.000 std lib : logging/__init__.py:1677(isEnabledFor)
   182703    0.580    0.000    0.580    0.000 std lib : json/encoder.py:204(iterencode)
    45243    0.333    0.000    0.571    0.000 pypi : compact_json/formatter.py:922(format_dict_expanded)
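
A listing in this shape can be produced with cProfile and pstats, roughly as sketched below. The workload function is a stand-in (my run profiled fmt.serialize on the test data), and the path prefixes in my listing came from extra post-processing that this sketch does not reproduce.

```python
import cProfile
import io
import json
import pstats


def workload():
    """Stand-in workload; swap in the call being profiled, e.g. fmt.serialize(data)."""
    data = [{"id": i, "tags": ["a", "b", "c"]} for i in range(2000)]
    return json.dumps(data)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep only the first 20 entries.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(20)

report = buf.getvalue()
print(report)
```

Sorting by cumulative time puts the top-level entry points first, which is why serialize and format_element lead my listing while the logging calls show up further down.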

ed2050 • Jan 25 '25 15:01