
Add function for querying the deep byte size of a value

Open AndrewBelt opened this issue 5 years ago • 3 comments

It would be convenient to have a function, called something like json_memory_size() or json_deep_bytes(), that computes the total memory in bytes consumed by a given value and all of its descendants.

The implementation could be similar to json_deep_copy(), except a size_t is returned instead.
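The traversal could look roughly like the following. This is a self-contained sketch, not jansson code: the `toy` node type, its two variants, and `toy_deep_bytes()` are all hypothetical stand-ins, since jansson's concrete json_t layouts live in a private header and vary by type. The point is only the json_deep_copy-style recursion that sums allocation sizes instead of cloning.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type standing in for jansson's internal json_t
 * variants; the real layouts are private and differ per type. */
typedef enum { TOY_STRING, TOY_ARRAY } toy_type;

typedef struct toy {
    toy_type type;
    char *string;        /* TOY_STRING: owned copy of the text     */
    struct toy **items;  /* TOY_ARRAY: owned vector of child nodes */
    size_t count;
} toy;

/* Walk the tree the way json_deep_copy() would, but sum the bytes
 * each node requested from malloc instead of cloning the node. */
static size_t toy_deep_bytes(const toy *v) {
    size_t total = sizeof(*v);                 /* the node itself */
    if (v->type == TOY_STRING) {
        total += strlen(v->string) + 1;        /* the string buffer */
    } else {
        total += v->count * sizeof(toy *);     /* the item vector   */
        for (size_t i = 0; i < v->count; i++)
            total += toy_deep_bytes(v->items[i]); /* recurse */
    }
    return total;
}

/* Small constructors so the sketch can be exercised. */
static toy *toy_string(const char *s) {
    toy *v = malloc(sizeof(*v));
    v->type = TOY_STRING;
    v->items = NULL;
    v->count = 0;
    v->string = malloc(strlen(s) + 1);
    memcpy(v->string, s, strlen(s) + 1);
    return v;
}

static toy *toy_array(toy **items, size_t count) {
    toy *v = malloc(sizeof(*v));
    v->type = TOY_ARRAY;
    v->string = NULL;
    v->count = count;
    v->items = malloc(count * sizeof(toy *));
    memcpy(v->items, items, count * sizeof(toy *));
    return v;
}
```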

Applications could use this when combining jansson with a garbage collector, a memory-limited undo history stack, a cache, etc.

If approved, I can send a PR.

AndrewBelt avatar Jan 25 '19 10:01 AndrewBelt

I wouldn't strongly object, but I have to ask: what is the use case? Approval really can't happen without seeing code; I would require full test coverage for any new features.

A couple of potential issues to look out for in an implementation:

  1. You'd need to have a check to protect against circular references causing an infinite loop. See dump.c:loop_check(). This is apparently a bug in json_deep_copy(): if root->subobj->obj == root, it will enter an infinite loop.
  2. It's possible for one object to contain multiple references to the same object. With the following example JSON, it's possible to construct the value so that field1 and field2 both point to a single object:
{
  "field1": {"name": "something", "description": "etc"},
  "field2": {"name": "something", "description": "etc"}
}

So in that case you'd have to count the space used by the main object, both keys, and only one sub-object.

  3. It would be impossible to know about overhead introduced by custom allocation routines. This would just need to be mentioned in the documentation for the new function.
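Point 2 above can be sketched with plain pointers. Everything here is hypothetical illustration (the `node` struct, `naive_bytes()`, `unique_bytes()`): a two-slot container whose slots alias one child, counted naively once per reference (matching what a json_deep_copy-style traversal would duplicate) versus deduplicated with a visited-pointer set so the shared child is counted once:

```c
#include <stddef.h>

/* Hypothetical two-slot container standing in for the example object
 * whose "field1" and "field2" alias a single child. */
typedef struct node {
    struct node *slot[2];
    size_t self_bytes;   /* pretend allocation size of this node */
} node;

/* Naive accounting: an aliased child is visited once per reference,
 * which is exactly what cloning via json_deep_copy would produce. */
static size_t naive_bytes(const node *n) {
    if (!n)
        return 0;
    return n->self_bytes + naive_bytes(n->slot[0]) + naive_bytes(n->slot[1]);
}

/* Deduplicated accounting: remember visited pointers so a shared
 * child is counted once. A fixed linear array keeps the sketch
 * short; real code would use a hash table. */
static int seen_before(const node *n, const node **seen, size_t *nseen) {
    for (size_t i = 0; i < *nseen; i++)
        if (seen[i] == n)
            return 1;
    seen[(*nseen)++] = n;
    return 0;
}

static size_t unique_bytes(const node *n, const node **seen, size_t *nseen) {
    if (!n || seen_before(n, seen, nseen))
        return 0;
    return n->self_bytes
         + unique_bytes(n->slot[0], seen, nseen)
         + unique_bytes(n->slot[1], seen, nseen);
}
```

Which of the two behaviors is "correct" is an API decision; as discussed below, double-counting is defensible as long as the documentation says so.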

coreyfarrell avatar Jan 26 '19 15:01 coreyfarrell

what is the use case

My own use case is to limit undo history by memory size, based on the memory size of all undo "actions", which may contain json_t values. Other hypothetical applications are given by the first post.

Approval really can't happen without seeing code

I meant if an API and behavior is approved.

This is apparently a bug of json_deep_copy

If it was documented, then it wouldn't be a bug. :) IMO an infinite loop for cycles is normal behavior, since naive code written by the user to iterate the json_t tree (e.g. to deserialize an object) would also have an infinite loop. I.e. for performance reasons, the library should not have safety nets, and the user should be responsible for not creating cycles.

It's possible for one object to contain multiple copies of the same object.

IMO it would be okay to double-count these shared values as long as that behavior is documented as well. Since json_deep_copy would produce two copies of the shared value in your example, json_deep_bytes would correctly return the exact memory size of its deep-cloned value.

This would just need to be mentioned in the documentation for the new function.

I suppose the language would be something like "Counts the total number of bytes requested via malloc_fn for the allocation of the given value and its descendants."

AndrewBelt avatar Jan 26 '19 16:01 AndrewBelt

I was curious what jansson does for loop detection when serializing to a string, and found that it adds the addresses of arrays and objects to a "parents" hash table as it encounters them, and removes them from the hash table when it is finished with them (so a value can appear in multiple places in a tree as long as it does not contain itself).

https://github.com/akheron/jansson/blob/v2.12/src/dump.c#L199-L206
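That push-on-entry / pop-on-exit pattern can be shown in miniature. This is a hypothetical sketch, not jansson's code: the `cnode` type and `walk()` are invented, jansson uses its internal hashtable rather than a linear stack, and each node here has a single child to keep things short. The shape of the check mirrors dump.c's loop_check(): fail if the current container is already on the ancestor stack, otherwise record it and recurse.

```c
#include <stddef.h>

/* Hypothetical container node standing in for a jansson array/object. */
typedef struct cnode {
    struct cnode *child;   /* single child keeps the sketch short */
} cnode;

#define MAX_DEPTH 64

/* Returns 0 on success, -1 if a cycle (or excessive depth) is found.
 * parents[] holds the addresses of the containers currently being
 * visited; because entries are only valid up to `depth`, the pop on
 * exit is implicit when the recursion unwinds, so non-cyclic sharing
 * of a value in several places still passes. */
static int walk(const cnode *n, const cnode **parents, size_t depth) {
    if (!n)
        return 0;
    for (size_t i = 0; i < depth; i++)
        if (parents[i] == n)
            return -1;             /* already an ancestor: cycle */
    if (depth == MAX_DEPTH)
        return -1;
    parents[depth] = n;            /* the loop_check()-style "push" */
    return walk(n->child, parents, depth + 1);
}
```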

ploxiln avatar Jan 26 '19 22:01 ploxiln