crystal icon indicating copy to clipboard operation
crystal copied to clipboard

Optimize JSON parsing a bit

Open asterite opened this issue 11 months ago • 12 comments

I saw this thread, so...

Two things here:

  1. Read string/IO byte per byte instead of char per char
  2. When trying to parse a string in the IO case, try to use the peek buffer if there's one

I used this Ruby/Crystal file to generate a big JSON:

json = %({"foo": 1, "bar": 2, "hello": "this will have an escape in it\\\"oh well", "banana": true, "array": [1, 2, 3, "a string that is a bit long"], "hash": {"a": 1, "b": 2, "c": 3}})
file = "[" + ([json] * 100000).join(",") + "]"
File.write("json.json", file)

Here's the benchmark:

require "json"
require "benchmark"

file_io = File.open("json.json")
file_string = File.read("json.json")

Benchmark.ips do |x|
  x.report("JSON.parse (string)") do
    JSON.parse(file_string)
  end
  x.report("JSON.parse (IO)") do
    file_io.pos = 0
    JSON.parse(file_io)
  end
end

Before:

JSON.parse (string)   8.41  (118.97ms) (± 0.86%)  117MB/op        fastest
    JSON.parse (IO)   3.64  (274.72ms) (± 1.99%)  117MB/op   2.31× slower

After:

JSON.parse (string)  11.98  ( 83.50ms) (± 7.70%)  117MB/op        fastest
    JSON.parse (IO)   9.06  (110.41ms) (± 6.26%)  117MB/op   1.32× slower

I think the difference between before and after will be bigger if there are more strings to parse.

A note for the forum thread: I tried parsing that same big file with Ruby 3.1 and Ruby was (slightly) slower: 16 seconds in Crystal vs. 19 seconds in Ruby. This is on a Mac. So I don't know why it was slower in Crystal for OP (maybe in Linux it's different?)

Regarding memory: Crystal requires 117MB to load that entire data into memory. But in Ruby it's the same. So I'm not sure how memory can be further optimized...

Please review carefully! I think tests for when the peek buffer is incomplete or unavailable might not exist right now. Feel free to continue working on top of this PR (pushing commits to this branch or creating another PR from this code).

asterite avatar Mar 15 '24 22:03 asterite

I found another optimization that improved both speed of parsing and the memory it allocates.

This PR made parsing slower by always allocating a string to parse integers and floats from. It also made it consume more memory. I was concerned about it at that time exactly because of that, but I agreed that correctness is more important than performance. However, we can have both! For "small" integers (less than 19 digits, never floats) we can compute the int value and always know that we are doing it right.

Before the last commit:

JSON.parse (string)  10.76  ( 92.92ms) (± 8.00%)  125MB/op        fastest
    JSON.parse (IO)   7.51  (133.21ms) (± 5.99%)  125MB/op   1.43× slower

After the last commit:

JSON.parse (string)  12.09  ( 82.69ms) (± 1.28%)  105MB/op        fastest
    JSON.parse (IO)   8.06  (124.08ms) (± 2.37%)  105MB/op   1.50× slower

So a bit faster and 20MB less, so about 16% percent less in this case.

But the memory allocated here depends on the amount of numbers and how long they are. For example, using this file:

json = "[1234234, 2982374, 3982734, 49827344, 592834, 65825, 723498, 82348, 9239847324, 1082348, 1123498, 122348, 132348, 142347, 152348, 16234283]"
file = "[" + ([json] * 10000).join(",") + "]"
File.write("json.json", file)

running the benchmark, before this PR:

JSON.parse (string)  95.86  ( 10.43ms) (± 1.44%)  13.9MB/op        fastest
    JSON.parse (IO)  42.03  ( 23.79ms) (± 3.17%)  13.9MB/op   2.28× slower

after this PR:

JSON.parse (string) 205.35  (  4.87ms) (± 1.11%)  9.06MB/op        fastest
    JSON.parse (IO)  76.83  ( 13.02ms) (± 1.18%)  9.06MB/op   2.67× slower

so 5 MB less than before, from a total of 14MB, that's about 30% less memory!

asterite avatar Mar 16 '24 10:03 asterite

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/17

crysbot avatar Mar 16 '24 10:03 crysbot

I think one of the "peek" branches is wrong. I'll see if I can merge it. I don't have a test to reproduce it yet, just one scenario I ran into.

asterite avatar Mar 16 '24 17:03 asterite

A note for the forum thread: I tried parsing that same big file with Ruby 3.1 and Ruby was (slightly) slower: 16 seconds in Crystal vs. 19 seconds in Ruby. This is on a Mac. So I don't know why it was slower in Crystal for OP

Let's make the JSON data contain some long strings. They are Base64-encoded in my case, but that doesn't matter.

gen-json.rb:

json = %|{"a_base64":#{("a" * 5000).inspect},"b_base64":#{("a" * 10000).inspect}}|
json = "[" + ([json] * 2500).join(",") + "]"
File.write("json.json", json)

benchmark-pk.cr:

require "json"
require "benchmark"

file_io = File.open("json.json")
file_string = File.read("json.json")

Benchmark.bm do |x|
  x.report("JSON.parse (string)") do
    JSON.parse(file_string)
  end
end

benchmark-pk.rb:

require "json"
require "benchmark"

file_io = File.open("json.json")
str = File.read("json.json")

Benchmark.bm(19) do |x|
  x.report("JSON.parse(str)") do
    JSON.parse(str)
  end
end
$ crystal build --release benchmark-pk.cr
$ /usr/bin/time -l ./benchmark-pk
                          user     system      total        real
JSON.parse (string)   0.174718   0.005743   0.180461 (  0.180668)
        0,20 real         0,18 user         0,02 sys
            94814208  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                5877  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  39  involuntary context switches
          1868924524  instructions retired
           591412605  cycles elapsed
            93389632  peak memory footprint

That's before your changes, on Apple aarch64/arm64.

$ /usr/bin/time -l ruby ./benchmark-pk.rb
                          user     system      total        real
JSON.parse(str)       0.063432   0.003553   0.066985 (  0.067092)
        0,12 real         0,10 user         0,01 sys
            90554368  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                5642  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  43  involuntary context switches
          1390183537  instructions retired
           367126927  cycles elapsed
            86574272  peak memory footprint

That's using Ruby 3.3.0. The Ruby version is faster by a factor of 0.181/0.067 = 2.70.

philipp-kempgen avatar Mar 17 '24 00:03 philipp-kempgen

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/20

crysbot avatar Mar 17 '24 00:03 crysbot

Starting with Ruby 3.3 you can enable YJIT at runtime. It would be nice to see the w/wo YJIT times for this.

I now put the following Ruby snippet at the top of Ruby files to automatically enable YJIT when using Ruby >=3.3.

# Enable YJIT if using CRuby >= 3.3"
RubyVM::YJIT.enable if RUBY_ENGINE == 'ruby' and RUBY_VERSION.to_f >= 3.3

jzakiya avatar Mar 17 '24 01:03 jzakiya

Starting with Ruby 3.3 you can enable YJIT at runtime. It would be nice to see the w/wo YJIT times for this.

I get the same times with or without YJIT.

philipp-kempgen avatar Mar 17 '24 01:03 philipp-kempgen

And just for the record, here's my benchmark in Ruby with Oj:

require "benchmark"
require "json"
require "oj"

file_io = File.open("json.json")
str = File.read("json.json")

Benchmark.bm(19) do |x|
  x.report("JSON.parse(str)") do
    JSON.parse(str)
  end
  x.report("Oj.load(str)") do
    Oj.load(str)
  end
end
                          user     system      total        real
JSON.parse(str)       0.063371   0.003690   0.067061 (  0.067182)
Oj.load(str)          0.028192   0.003248   0.031440 (  0.031523)

i.e. Oj is faster than the Crystal version by a factor of 0.181/0.032 = 5.66.

philipp-kempgen avatar Mar 17 '24 01:03 philipp-kempgen

There are more improvements to be made here. I'd like to send them one by one in small PRs. If I put them all together here chances of this being merged are very small.

asterite avatar Mar 17 '24 12:03 asterite

That said, I don't think we'll reach the level of optimization of Oj. The C code is pretty hand-crafted. We could try to do the same but that file has copyright...

asterite avatar Mar 18 '24 14:03 asterite

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/25

crysbot avatar Mar 21 '24 01:03 crysbot

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/stringpool-make-the-hash-key-part-of-the-public-api/6766/5

crysbot avatar Apr 17 '24 19:04 crysbot