simd_json::from_reader (and from_slice) end in error when trying to deserialize to a serde_json::RawValue

Open bassmanitram opened this issue 1 year ago • 4 comments

The error is "invalid type: newtype struct, expected any valid JSON value".

The use case is demonstrated by this GIST. Changing the version of lambda_http to 0.8 makes the use case work.

The difference is that the AWS code now tries to deserialize to a RawValue, where it previously deserialized to a serde::__private::de::Content. The new code is far cleaner, but it doesn't work with the simd deserializer.

Any help would be appreciated.

(Equivalent issue in AWS world - https://github.com/awslabs/aws-lambda-rust-runtime/issues/792)

bassmanitram, Jan 24 '24 18:01

I'm currently on vacation, so I won't have time to look deeply into it for a while, but at first glance I think RawValue is the issue. Unrelated to that, this approach is going to be fairly slow, so there is not much benefit to be had from the faster decode.

The slowness comes from the bytes first going into a nested structure (RawValue), which is expensive, and then being translated into a struct using the serde traits (expensive again).

If you are after performance, I'd say use to_tape, try to introspect the content for the type, then turn that into a struct. That would add no overhead compared to translating it directly to a struct.
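As a rough illustration of the "introspect, then build the struct" idea, here is a minimal sketch on a toy tape. The `Node`, `Request`, `field`, and `decode` names are hypothetical and are not simd_json's real tape types or API; the sketch only assumes a flat object whose key/value pairs directly follow a header node.

```rust
// Toy model only: these types are hypothetical and are NOT simd_json's real tape.
#[derive(Debug, Clone, PartialEq)]
enum Node<'a> {
    /// Start of an object; `len` is the number of key/value pairs that follow.
    Object { len: usize },
    Str(&'a str),
    Int(i64),
}

/// The target types we want to end up with (also hypothetical).
#[derive(Debug, PartialEq)]
enum Request<'a> {
    Ping { id: i64 },
    Echo { text: &'a str },
}

/// Look up a top-level key in a flat object (scalar values only in this toy).
fn field<'t, 'a>(tape: &'t [Node<'a>], key: &str) -> Option<&'t Node<'a>> {
    let len = match tape.first()? {
        Node::Object { len } => *len,
        _ => return None,
    };
    // Pairs are laid out as key, value, key, value, ... right after the header.
    tape.get(1..1 + 2 * len)?
        .chunks_exact(2)
        .find(|pair| matches!(&pair[0], Node::Str(k) if *k == key))
        .map(|pair| &pair[1])
}

/// Inspect the tape to pick the target type, then build it directly,
/// without going through an intermediate value tree.
fn decode<'a>(tape: &[Node<'a>]) -> Option<Request<'a>> {
    match field(tape, "kind")? {
        Node::Str("ping") => match field(tape, "id")? {
            Node::Int(id) => Some(Request::Ping { id: *id }),
            _ => None,
        },
        Node::Str("echo") => match field(tape, "text")? {
            Node::Str(text) => Some(Request::Echo { text: *text }),
            _ => None,
        },
        _ => None,
    }
}
```

In this sketch a tape for `{"kind":"ping","id":7}` would be the five nodes `Object { len: 2 }`, `Str("kind")`, `Str("ping")`, `Str("id")`, `Int(7)`, and `decode` would turn it into `Request::Ping { id: 7 }`.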

Licenser, Jan 24 '24 18:01

Hey - have a good vacation. Yeah, I too spotted the slowness of the approach and am talking to the AWS guys about it - thanks for the tape tip!

I have narrowed the problem down to this basic code that fails (yes it is RawValue):

	use serde::Deserialize;
	use serde_json::value::RawValue;

	let mut payload3 = "{}".to_string();
	// simd_json parses in place, so it needs a mutable byte slice.
	let payload3 = unsafe { payload3.as_bytes_mut() };
	//let mut de = serde_json::Deserializer::from_slice(payload3);
	let mut de = simd_json::Deserializer::from_slice(payload3).unwrap();
	let raw_value: Box<RawValue> = Box::deserialize(&mut de).expect("Boxed RawValue");

As above, it fails. Uncomment serde_json and comment out simd_json and it works.

bassmanitram, Jan 25 '24 09:01

With respect to speed - you'd be surprised - well, I was! OK, this isn't using "tape", but String->RawValue->&str->type is by FAR the fastest approach with serde_json. I compared it against AWS v0.8, which used serde::__private::de::Content::deserialize, and against an attempt I made to speed things up, which just used serde_json::Value; the RawValue route beat both. Using the simd_json deserializer in the two of those three contexts where it works also proved to be slower than serde_json ... back to the drawing board, then.

bassmanitram, Jan 25 '24 16:01

Ok, I've got me a tape and it should be able to do what I want ... but I don't see how to deserialize that into the target struct - I'm hoping there is a bridge to serde_json here?

What it seems I need is to reconstruct a Deserializer from the tape. Since the Deserializer is simply a tape and an index, would it be possible to add a from_tape constructor function? Even if that's unsafe (i.e. I have to be certain the tape is valid before using it - which, in my case, I would be - 'cause I just got the tape from a deserializer)?
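To make the request concrete, here is a minimal sketch of what "a tape and an index" with a from_tape constructor could look like. This is a toy model: `Node`, `TapeCursor`, `advance`, and `read_ints` are hypothetical names, not simd_json's real Deserializer, and unlike the unsafe variant asked about above, this version checks the tape as it reads.

```rust
// Toy model only: hypothetical types, not simd_json's real `Deserializer` or tape.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Node<'a> {
    /// Start of an array; `len` is the number of elements that follow.
    Array { len: usize },
    Str(&'a str),
    Int(i64),
}

/// A "deserializer" reduced to its essentials: a borrowed tape plus a read index.
struct TapeCursor<'t, 'a> {
    tape: &'t [Node<'a>],
    idx: usize,
}

impl<'t, 'a> TapeCursor<'t, 'a> {
    /// Hypothetical constructor: start reading an already-built tape at index 0.
    fn from_tape(tape: &'t [Node<'a>]) -> Self {
        Self { tape, idx: 0 }
    }

    /// Return the node under the index and move the index forward by one.
    fn advance(&mut self) -> Option<Node<'a>> {
        let node = *self.tape.get(self.idx)?;
        self.idx += 1;
        Some(node)
    }

    /// Read an array of integers starting at the current index.
    /// Returns `None` if the tape does not hold such an array here.
    fn read_ints(&mut self) -> Option<Vec<i64>> {
        let len = match self.advance()? {
            Node::Array { len } => len,
            _ => return None,
        };
        (0..len)
            .map(|_| match self.advance()? {
                Node::Int(i) => Some(i),
                _ => None,
            })
            .collect()
    }
}
```

With a tape holding `Array { len: 3 }` followed by three `Int` nodes, `TapeCursor::from_tape(&tape).read_ints()` yields the three integers. A real bridge would implement serde's Deserializer trait on top of such a cursor, which this sketch does not attempt.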

Any help will be appreciated.

bassmanitram, Jan 26 '24 17:01