aeson

How to take advantage of good input?

Open ethercrow opened this issue 3 years ago • 3 comments

Suppose you have some guarantees about the input JSON (because you produced it yourself or already checked it somehow):

  • Only ASCII
  • No control characters inside strings
  • Whitespace is only spaces or newlines (no tabs or more exotic things)
  • No escape sequences

It would be good to be able to tell aeson to use this knowledge and parse significantly faster.

ethercrow avatar Jan 12 '22 09:01 ethercrow

I doubt there will be a significant speedup. E.g. text unescaping is already essentially that fast: you can only be faster by memcpying directly without any validation. I'm not excited about that, but it is an option (not validating text at all can cause very weird errors when constructing object maps, so post-validation isn't a good option).
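To make the trade-off concrete, here is a minimal sketch of the two string paths being compared. It is not aeson's actual code: `stringChecked` and `stringTrusted` are hypothetical names, both assume the input starts just after the opening quote, and the checked variant simply rejects escapes instead of decoding them.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical checked path: every byte of the string body is inspected.
-- Control characters, non-ASCII bytes and backslashes are rejected.
stringChecked :: BS.ByteString -> Either String (T.Text, BS.ByteString)
stringChecked bs =
  let (body, rest) = BS.span ok bs
  in case BS.uncons rest of
       Just (0x22, rest') -> Right (TE.decodeLatin1 body, rest')
       Just (w, _)        -> Left ("unsupported byte in string: " ++ show w)
       Nothing            -> Left "unterminated string"
  where
    -- printable ASCII, excluding '"' (0x22) and '\' (0x5c)
    ok w = w >= 0x20 && w < 0x80 && w /= 0x22 && w /= 0x5c

-- Hypothetical trusted path: find the closing quote and copy the bytes.
-- ASSUMPTION: the caller guarantees ASCII-only input with no escapes,
-- so decoding as Latin-1 yields the right text without validation.
stringTrusted :: BS.ByteString -> Either String (T.Text, BS.ByteString)
stringTrusted bs =
  case BS.elemIndex 0x22 bs of
    Just i  -> Right (TE.decodeLatin1 (BS.take i bs), BS.drop (i + 1) bs)
    Nothing -> Left "unterminated string"

-- Sample input: a string body followed by the rest of the document.
sample :: BS.ByteString
sample = BC.pack "name\": 1}"
```

On input that honours the guarantees both functions return the same result. On input that breaks them, only the checked path reports an error; the trusted path silently produces a wrong key or value, which is the kind of weird failure mentioned above.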

The only thing you propose that isn't about string literals is whitespace. I don't think that checking for just two characters rather than four (space, \t, \r, \n) would matter. Even having no whitespace at all doesn't speed things up, as JSON parsing is LL(1) anyway.
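As a rough illustration of the whitespace point, the sketch below shows a four-character skipper next to a two-character one, followed by the single-byte dispatch that makes JSON LL(1). All names (`skipSpaceFull`, `skipSpaceTrusted`, `Kind`, `valueKind`) are hypothetical and not taken from aeson.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Skip all four JSON whitespace bytes: space, \n, \r, \t.
skipSpaceFull :: BS.ByteString -> BS.ByteString
skipSpaceFull = BS.dropWhile (\w -> w == 0x20 || w == 0x0a || w == 0x0d || w == 0x09)

-- Skip only space and \n.
-- ASSUMPTION: the input is guaranteed to contain no \t or \r.
skipSpaceTrusted :: BS.ByteString -> BS.ByteString
skipSpaceTrusted = BS.dropWhile (\w -> w == 0x20 || w == 0x0a)

data Kind = KObject | KArray | KString | KNumber | KLiteral | KInvalid
  deriving (Eq, Show)

-- LL(1) dispatch: one byte of lookahead decides which value follows.
valueKind :: BS.ByteString -> Kind
valueKind bs = case BS.uncons (skipSpaceFull bs) of
  Just (w, _)
    | w == 0x7b                             -> KObject   -- '{'
    | w == 0x5b                             -> KArray    -- '['
    | w == 0x22                             -> KString   -- '"'
    | w == 0x2d || (w >= 0x30 && w <= 0x39) -> KNumber   -- '-' or digit
    | w == 0x74 || w == 0x66 || w == 0x6e   -> KLiteral  -- true, false, null
    | otherwise                             -> KInvalid
  Nothing -> KInvalid

-- Sample input with leading whitespace of several kinds.
example :: BS.ByteString
example = BC.pack " \n\t {\"a\": 1}"
```

The two skippers differ by two comparisons per whitespace byte, and the dispatch examines one byte either way, so little work is saved by restricting the whitespace set.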

Feel free to make a PoC and convince me there would be significant difference.

phadej avatar Jan 12 '22 13:01 phadej

POC: https://github.com/ethercrow/aeson/commit/3c43bcdd2c7ef16172a030a591c4b54be739c349

>> ./bench.sh compare cabal-plan-master cabal-plan-fast-and-loose
INFO: Comparing runs: cabal-plan-master cabal-plan-fast-and-loose
RUN: criterion-cmp .bench-results/cabal-plan-master.csv .bench-results/cabal-plan-fast-and-loose.csv
Benchmark                          cabal-plan-master  cabal-plan-fast-and-loose
Examples/decode/cabal-plan/lazy    1.054e-3           0.898e-3 -14.84%
Examples/decode/cabal-plan/strict  1.075e-3           0.890e-3 -17.16%
Geometric mean                     1.065e-3           0.894e-3 -16.01%
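
For reference, the percentage and geometric-mean columns can be recomputed from the means printed above. This is a sketch with hypothetical helper names (`relChange`, `geomMean`). Because the printed means are rounded to four significant digits, the recomputed per-benchmark percentages (about -14.8% and -17.2%) differ slightly from the ones in the table.

```haskell
-- Relative change of a new timing against a baseline, in percent.
relChange :: Double -> Double -> Double
relChange base new = (new - base) / base * 100

-- Geometric mean of a non-empty list of positive values.
geomMean :: [Double] -> Double
geomMean xs = exp (sum (map log xs) / fromIntegral (length xs))

-- Mean times in seconds (lazy, strict), copied from the table above.
master, fastAndLoose :: [Double]
master       = [1.054e-3, 1.075e-3]
fastAndLoose = [0.898e-3, 0.890e-3]

-- Change in the geometric mean between the two runs, in percent.
overallChange :: Double
overallChange = relChange (geomMean master) (geomMean fastAndLoose)
```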

ethercrow avatar Jan 12 '22 14:01 ethercrow

OK, good to know. I wonder if the benefit is multiplicative if the parser is rewritten, like in #768.

My plan (which I should write down and publish, in the interest of transparency) is to do something like #768. Then we can try this idea again.

phadej avatar Jan 12 '22 15:01 phadej