Very high memory usage with `serde_json::Value`
Unfortunately, due to `Value` being completely public, I don't know how much can be done about this without breaking changes. However, a couple of times I've run into problems with exceptionally high memory usage when using a `Value`.
I don't think there's a bug here, just that common uses seem to be much more memory intensive than similar code in dynamic languages, where this kind of data is already heavily optimised.
I think it comes from several factors:
- Each `Value` is 32 bytes on a 64-bit system, even though the majority of `Value`s will be leaf nodes (numbers, strings, nulls, etc.) which don't need that much space (see the size check after this list). If `Value` were more highly optimized for leaf nodes, I think this could easily be halved.
- Maps are optimized for access time rather than space efficiency, and this is made worse because there are lots of "empty" `Value` slots, each of which is another 32 bytes.
- Strings are owned. When converting from a struct with `to_value`, object keys will all be known statically, and those strings will already be embedded in the program as static data, so using a `Cow<'static, str>` could dramatically reduce memory usage.
- Strings are exclusively owned. When deserializing into a `Value`, it's likely that there will be lots of duplicate strings, but there is no possibility for them to be shared with the current `Value` representation.
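The 32-byte figure is easy to check. On a typical 64-bit target the program below prints 32 and 24: the enum must be big enough for its largest variants (a `String` or `Vec` payload is 24 bytes) plus a discriminant, rounded up to alignment, so even `Value::Null` occupies the full 32 bytes.

```rust
use serde_json::Value;

fn main() {
    // 24-byte String/Vec payload + discriminant, padded to 32 bytes.
    println!("size_of::<Value>()  = {}", std::mem::size_of::<Value>());
    // A String on its own is 24 bytes (pointer, capacity, length).
    println!("size_of::<String>() = {}", std::mem::size_of::<String>());
}
```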
I think a more space-efficient `Value` type could be introduced. Keys could be stored as a pointer-sized union of `&'static str` and `Arc<String>`, using a tag in the low bits to differentiate. The deserializer could automatically intern strings as they are deserialized. `Value` could be shrunk to 16 bytes and store short strings inline. `Map`s could use a simple `Vec` representation for small numbers of elements to avoid any wasted space. The improved cache coherency could also improve performance. All access to "compact values" should be done via methods to allow further optimisations in the future. There would also need to be a version of the `json!()` macro that produced this compact type.
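As a rough sketch of the small-map idea (not serde_json code, just an illustration): a plain `Vec` of entries with linear search wastes no space on hash buckets and stays cache-friendly for the handful of keys most JSON objects have, with promotion to a real map possible past some size threshold.

```rust
/// Sketch of a Vec-backed map for small objects. Linear search is
/// fast in practice when objects have only a few keys, and there is
/// no unused bucket capacity.
struct SmallMap<K: Eq, V> {
    entries: Vec<(K, V)>,
}

impl<K: Eq, V> SmallMap<K, V> {
    fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }

    fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in &mut self.entries {
            if *k == key {
                // Key already present: replace and return the old value.
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }
}
```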
@dtolnay I started working on a crate to address these issues:
https://github.com/Diggsey/ijson https://docs.rs/ijson
It is functionally complete but needs a lot more testing, etc. to get to a point where I can recommend people actually use it. That said, it demonstrates that significant improvements are possible.
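A quick sketch of what usage looks like, based on the docs linked above (the exact API may have changed since):

```rust
// Hedged example based on the ijson docs; details may differ by version.
use ijson::{ijson, IValue};

fn main() {
    // `ijson!` mirrors serde_json's `json!` macro but builds an IValue.
    let v: IValue = ijson!({
        "name": "example",
        "tags": ["a", "b"]
    });

    // An IValue is a single pointer-sized handle rather than a 32-byte
    // enum, which is where much of the memory saving comes from.
    println!("size_of::<IValue>() = {}", std::mem::size_of::<IValue>());
    println!("{}", serde_json::to_string(&v).unwrap());
}
```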
Is this something you'd be interested in bringing into `serde_json` some time down the line?
This came to me as a bit of an unpleasant surprise when my AWS Lambdas started running out of memory. I was sizing them based on what is being retrieved from the DB. For example, ElasticSearch returns an 8,683KB document; I deserialize it into `Value`, and the next RAM reading gives me a delta of 98,484KB of RAM use. That's more than 10x the original size.
@dtolnay, David, is this high memory consumption a necessary price to pay for speed?
Is 561ms using `from_slice()` on an 8.6MB JSON string considered fast?
@rimutaka serde_json is much more efficient at deserializing into structs compared to the `Value` type, so if that is possible for your use case, then that's the best option.
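For instance (the field names below are made up for illustration, not your actual ElasticSearch schema), deserializing straight into a typed struct never builds the intermediate `Value` tree:

```rust
use serde::Deserialize;

// Hypothetical document shape, for illustration only.
#[derive(Deserialize)]
struct Doc {
    id: u64,
    title: String,
    score: f64,
}

fn parse(bytes: &[u8]) -> serde_json::Result<Vec<Doc>> {
    // Allocates only the String fields and the Vec itself, instead of
    // a 32-byte Value node for every value in the JSON document.
    serde_json::from_slice(bytes)
}
```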
@Diggsey, thanks for the suggestion. Do you know if `Value` is more compact if I deserialize into a struct and then convert it into `Value`?
It would only be more compact if some fields are dropped as part of the deserialization into the struct (if, say, they are not required).
Memory allocation log for processing 10MB of JSON data:
- JSON as String of 10,812,199 bytes => +10,150KB allocated
- JSON converted into struct => +63,996KB allocated
- struct converted into Value => 188,162KB allocated
I can understand high memory consumption when JSON is converted into `Value`, because the size of collections is not known, so more is allocated than needed to make it faster. When a struct is converted into `Value`, the size of collections is known in advance. Why do we still get such a large memory overhead? Is it inevitable, or can it be improved?
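For reference, allocation deltas like the ones in the log above can be taken in-process with a counting global allocator; a minimal sketch using only the standard library, with serde_json assumed as a dependency:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicIsize, Ordering};

// Wraps the system allocator and keeps a running count of live bytes.
struct Counting;

static LIVE_BYTES: AtomicIsize = AtomicIsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        LIVE_BYTES.fetch_add(layout.size() as isize, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        LIVE_BYTES.fetch_sub(layout.size() as isize, Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOC: Counting = Counting;

fn main() {
    let before = LIVE_BYTES.load(Ordering::Relaxed);
    let v: serde_json::Value =
        serde_json::from_str(r#"{"a": [1, 2, 3], "b": "hello"}"#).unwrap();
    let after = LIVE_BYTES.load(Ordering::Relaxed);
    println!("Value tree holds ~{} bytes", after - before);
    drop(v);
}
```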