simd-json
using this for small JSONs
Hi, I was benchmarking this against a very simple small JSON
{
"id": "60a6965e5e47ef8456878326",
"index": 0,
"guid": "cfce331d-07f3-40d3-b3d9-0672f651c26d",
"isActive": true,
"picture": "http://placehold.it/32x32",
"age": 22
}
Now my use case is: parse a small JSON as fast as possible just ONCE.
The results for me were (1 parse): serde_json = 3 microseconds, simd_json = 10 microseconds.
I was wondering if it's normal for serde_json to be faster on smaller JSONs, or am I getting incorrect results?
Here is a very rough benchmark, but the differences are big enough...
#![allow(warnings)]
use std::time::Instant;
use serde::Deserialize;
use serde_json;
use simd_json;

#[derive(Deserialize)]
struct Person {
    id: String,
    index: i32,
    guid: String,
    isActive: bool,
    picture: String,
    age: u32
}

fn main() {
    let json_bytes = br#"{
"id": "60a6965e5e47ef8456878326",
"index": 0,
"guid": "cfce331d-07f3-40d3-b3d9-0672f651c26d",
"isActive": true,
"picture": "http://placehold.it/32x32",
"age": 22
}"#.to_vec();

    let json_bytes_1 = json_bytes.clone();
    let now_1 = Instant::now();
    for _ in 0..100 {
        let p: Person = serde_json::from_slice(&json_bytes_1).unwrap();
    }
    println!("serde {:?}", now_1.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..100 {
        let p2: simd_json::OwnedValue = simd_json::to_owned_value(&mut json_bytes_2).unwrap();
    }
    println!("simd_json {:?}", now_2.elapsed());
}
[dependencies]
serde = { version = "*", features = ["derive"] }
serde_json = "*"
simd-json = { version = "*", features = ["allow-non-simd"]}
That's a bit complicated to answer - one of those "it depends" situations 😭
simd gets 'better' for medium and larger files; for extremely small ones (i.e. smaller than the registers) it's quite bad. I don't think that's the case for you, but there is some overhead.
So first of all, for small data serde-json can absolutely be faster than simd-json!
That said there are a few things:
The biggest issue in the benchmark is that it's comparing struct deserialization against DOM deserialization. The DOM deserialization is quite a bit slower. To make a fair comparison, and one that makes sense for users, you have to compare either DOM deserialization for both or struct deserialization for both.
For benchmarks like this it's usually good to use a benchmark library, as the compiler sometimes optimizes things away when it notices a value isn't used. For example, the black_box function in criterion is one way to prevent that. (Not sure if that applies here, but for a good measurement it's a nice tool.)
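To illustrate what black_box guards against, here is a minimal stdlib-only sketch (using std::hint::black_box, which criterion's black_box wraps on recent toolchains; the checksum function is made up for the example):

```rust
use std::hint::black_box;
use std::time::Instant;

// A computation the optimizer could otherwise const-fold or eliminate
// entirely if the result is never observed.
fn checksum(data: &[u64]) -> u64 {
    data.iter().sum()
}

fn main() {
    let input = vec![1u64, 2, 3, 4];
    let now = Instant::now();
    for _ in 0..1_000 {
        // black_box on both the input and the result tells the optimizer
        // to assume the value is read, so the loop body is not removed
        // as dead code and the timing stays meaningful.
        black_box(checksum(black_box(&input)));
    }
    println!("1000 checksums took {:?}", now.elapsed());
}
```

Without the black_box calls, a release build is free to hoist or delete the loop entirely, which makes the measured time meaningless.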
The third thing that will make a difference is using simd-json-derive for deserialization via simd-json instead of the serde compatibility layer. Serde's deserialization logic is slower since it has to be more generic (and does a darn good job at that); with simd-json-derive it is possible to optimize for exactly one format, which gets quite a bit faster. It is as simple as:
#[derive(Deserialize)] -> #[derive(Deserialize, simd_json_derive::Deserialize)]
let p2: simd_json::OwnedValue = simd_json::to_owned_value(&mut json_bytes_2).unwrap(); -> let p2 = Person::from_slice(&mut json_bytes_2).unwrap();
Next, and this depends a bit on your use case: you can optimize by pre-allocating and re-using buffers. If your program starts, reads one small JSON, and exits, this won't help, but if it is long-running it might do you good:
let mut json_bytes_2 = json_bytes.clone();
let now_2 = Instant::now();
let mut string_buffer = Vec::with_capacity(2048);
let mut input_buffer = simd_json::AlignedBuf::with_capacity(1024);
for _ in 0..100 {
    let p2 = Person::from_slice_with_buffers(&mut json_bytes_2, &mut input_buffer, &mut string_buffer).unwrap();
}
Last but not least, and again this depends on your use case: you can avoid allocating strings, as simd-json is quite good at borrowing when deserializing structs. (This works with serde too, I think, so I'll add the serde-related code in this example after all; got to compare apples and apples!)
#[derive(Deserialize)]
struct Person<'ser> {
    #[serde(borrow)]
    id: &'ser str,
    index: i32,
    #[serde(borrow)]
    guid: &'ser str,
    isActive: bool,
    #[serde(borrow)]
    picture: &'ser str,
    age: u32
}
Also, I noticed you're using the allow-non-simd feature, which will always be slower than serde as it disables all the SIMD optimisations.
So I updated your benchmark a bit:
#![allow(warnings)]
use std::time::Instant;
use serde::Deserialize;
use simd_json_derive::Deserialize as SimdDeserialize;
use serde_json;
use simd_json;

#[derive(Deserialize, SimdDeserialize)]
struct Person {
    id: String,
    index: i32,
    guid: String,
    isActive: bool,
    picture: String,
    age: u32
}

#[derive(Deserialize, SimdDeserialize)]
struct PersonBorrowed<'ser> {
    #[serde(borrow)]
    id: &'ser str,
    index: i32,
    #[serde(borrow)]
    guid: &'ser str,
    isActive: bool,
    #[serde(borrow)]
    picture: &'ser str,
    age: u32
}

const N: usize = 100000;

fn main() {
    let json_bytes = br#"{
"id": "60a6965e5e47ef8456878326",
"index": 0,
"guid": "cfce331d-07f3-40d3-b3d9-0672f651c26d",
"isActive": true,
"picture": "http://placehold.it/32x32",
"age": 22
}"#.to_vec();

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2: simd_json::OwnedValue = simd_json::to_owned_value(&mut json_bytes_2).unwrap();
    }
    println!("simd_json {:?}", now_2.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2: Person = simd_json::serde::from_slice(&mut json_bytes_2).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (struct) {:?}", now_2.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2 = Person::from_slice(&mut json_bytes_2).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (simd-struct) {:?}", now_2.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2 = PersonBorrowed::from_slice(&mut json_bytes_2).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (simd-struct borrowed) {:?}", now_2.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    let mut string_buffer = Vec::with_capacity(2048);
    let mut input_buffer = simd_json::AlignedBuf::with_capacity(1024);
    for _ in 0..N {
        let p2 = PersonBorrowed::from_slice_with_buffers(&mut json_bytes_2, &mut input_buffer, &mut string_buffer).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (simd-struct borrowed buffered) {:?}", now_2.elapsed());

    let json_bytes_1 = json_bytes.clone();
    let now_1 = Instant::now();
    for _ in 0..N {
        let p: Person = serde_json::from_slice(&json_bytes_1).unwrap();
        criterion::black_box(p);
    }
    println!("serde {:?}", now_1.elapsed());

    let json_bytes_1 = json_bytes.clone();
    let now_1 = Instant::now();
    for _ in 0..N {
        let p: PersonBorrowed = serde_json::from_slice(&json_bytes_1).unwrap();
        criterion::black_box(p);
    }
    println!("serde (borrowed) {:?}", now_1.elapsed());
}
[package]
name = "simd-bench-why"
version = "0.1.0"
authors = ["Heinz N. Gies <[email protected]>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
serde = { version = "*", features = ["derive"] }
serde_json = "*"
simd-json = { version = "*" }
simd-json-derive = "*"
criterion = "*"
I would recommend running that on your local system, but here are the results I get on a laptop (so variance is quite high), and serde is consistently faster:
simd_json 283.399304ms
simd_json (struct) 169.342152ms
simd_json (simd-struct) 168.756464ms
simd_json (simd-struct borrowed) 134.981265ms
simd_json (simd-struct borrowed buffered) 107.723584ms
serde 80.380321ms
serde (borrowed) 42.684127ms
wow, thanks for the detailed response!
I ran your updated benchmark:
simd_json 102.0061ms
simd_json (struct) 62.7426ms
simd_json (simd-struct) 61.5463ms
simd_json (simd-struct borrowed) 48.9708ms
simd_json (simd-struct borrowed buffered) 40.3499ms
serde 42.1693ms
serde (borrowed) 27.4827ms
I actually had no idea you could use str slices for string fields with serde; that could definitely speed up my program. Thanks!
:+1: So the bottom line looks like this is a case where serde is faster :) Just for giggles, I'd recommend giving it a spin in the app; switching between simd/serde on a feature flag is fairly simple given that they both have derive mechanics. I don't expect this to change, but I'm still curious :D
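The feature-flag switch could be wired up roughly like this in Cargo.toml (the feature name simd is made up for illustration, not an existing flag of either crate):

```toml
# Cargo.toml — hypothetical feature wiring, names are illustrative
[features]
default = ["simd"]
simd = ["simd-json", "simd-json-derive"]

[dependencies]
serde = { version = "*", features = ["derive"] }
serde_json = "*"
simd-json = { version = "*", optional = true }
simd-json-derive = { version = "*", optional = true }
```

In the code, gate the two parse paths with #[cfg(feature = "simd")] / #[cfg(not(feature = "simd"))] so the same call site dispatches to Person::from_slice or serde_json::from_slice, and flip between them with cargo run --no-default-features.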
Also, if you're looking at processing newline-delimited JSON, #194 might be something to keep an eye on; if we get to implementing that, the negative effects of small JSONs on newline-delimited readers will be negated.
On modern CPUs simd seems faster when using structs and borrowed:
Model name: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
CPU family: 6
Model: 165
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 78%
CPU max MHz: 5000.0000
CPU min MHz: 800.0000
BogoMIPS: 5202.65
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_windo
With N = 10M instead of 100k:
simd_json 4.543799873s
simd_json (struct) 3.315843013s
simd_json (simd-struct) 3.113119353s
simd_json (simd-struct borrowed) 2.739245207s
simd_json (simd-struct borrowed buffered) 2.505824834s
serde 3.59520201s
serde (borrowed) 3.151620592s
That's a cool insight thank you!
There have been a number of updates to simd-json's performance; with 0.13, simd-json is now significantly faster when taking full advantage of it:
simd_json 72.086096ms
simd_json (struct) 60.104834ms
simd_json (simd-struct) 58.718949ms
simd_json (simd-struct borrowed) 53.066635ms
simd_json (simd-struct borrowed buffered) 23.85362ms
serde 41.25703ms
serde (borrowed) 32.312223ms
I'll close this for now