jansson
Add function for querying the deep byte size of a value
It would be convenient to have a function called something like `json_memory_size()` or `json_deep_bytes()` that computes the total memory in bytes consumed by a given value and all of its descendants. The implementation could be similar to `json_deep_copy()`, except that a `size_t` is returned instead.
Applications include using jansson values in a garbage collector, a memory-limited undo history stack, a cache, etc.
If approved, I can send a PR.
I wouldn't strongly object, but I have to ask: what is the use case? Approval really can't happen without seeing code, and I would require full test coverage for any new feature.
A couple of potential issues to look out for in an implementation:
- You'd need a check to protect against circular references causing an infinite loop; see `loop_check()` in `dump.c`. This is apparently a bug in `json_deep_copy`: if `root->subobj->obj == root`, it will enter an infinite loop.
- It's possible for one object to contain multiple references to the same object. The following example JSON can be constructed so that `field1` and `field2` both point to a single object:
```json
{
  "field1": {"name": "something", "description": "etc"},
  "field2": {"name": "something", "description": "etc"}
}
```
So in that case you'd have to count the space used by the main object, both keys, and only one sub-object.
- It would be impossible to know about overhead introduced by custom allocation routines; this would just need to be mentioned in the documentation for the new function.
> what is the use case
My own use case is to limit undo history by memory size, based on the memory size of all undo "actions", which may contain `json_t` values. Other hypothetical applications are given in the first post.
> Approval really can't happen without seeing code
I meant if an API and behavior is approved.
> This is apparently a bug in `json_deep_copy`
If it were documented, then it wouldn't be a bug. :) IMO an infinite loop on cycles is normal behavior, since naive user code that iterates the `json_t` tree (e.g. to deserialize an object) would also loop forever. I.e. for performance reasons, the library should not have safety nets, and the user should be responsible for not creating cycles.
> It's possible for one object to contain multiple copies of the same object.
IMO it would be okay to double-count these shared values as long as that behavior is documented as well. Since `json_deep_copy` would produce two copies of the shared value in your example, `json_deep_bytes` would correctly return the exact memory size of its deep-cloned value.
> This would just need to be mentioned in the documentation for the new function.
I suppose the language would be something like: "Counts the total number of bytes requested via `malloc_fn` calls for the allocation of the given value and its descendants."
I was curious what jansson does for loop detection when serializing to a string, and found that it adds the addresses of arrays and objects to a "parents" hash table as it encounters them, and removes them from the hash table when it is finished with them (so a value can be used in multiple places in a tree, as long as it does not contain itself).
https://github.com/akheron/jansson/blob/v2.12/src/dump.c#L199-L206