fastjson
Issues regarding the new Arena API
Some parts of the new Arena API that I find hard to use:
- Object only provides an O(n) implementation of Get/Set, which is fine in most cases, but it makes building very large objects impractical. For a faster Set it might make sense to add another method called Append which skips checking for equal keys and appends to kvs directly. For Get I see two options: create a map[string]*Value on the first Get and use that map on subsequent Get calls, or add a ToMap convenience function that converts the Object to a map[string]*Value for faster gets (a rough sketch follows this list).
- The only way to create an Array is to call Arena.NewArray() and then call SetArrayItem for each index. This requires the user to maintain the array length himself and prevents using the more idiomatic append(). The API also makes it impossible to shrink an array or to take a slice from another []*Value directly. Consider adding a []*Value parameter to NewArray (a breaking API change), or adding a Replace method that replaces everything in the array.
- Commit https://github.com/valyala/fastjson/commit/c188f766b4b45af57d230cc9d0291130a8ae796b might cause memory leaks. For example, when a very large array is reused as an object, the array is kept in memory. In fact, this was the behavior before the commit as well. Consider emptying out everything, including slice pointers.
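For the Get side of the first point, the map-on-first-Get / ToMap idea can already be approximated outside the library via Object.Visit. A minimal sketch, using only the public fastjson API; the helper name toMap is mine, not an existing method:

package example

import "github.com/valyala/fastjson"

// toMap sketches the proposed ToMap convenience function: it copies an
// *fastjson.Object into a map[string]*fastjson.Value so that repeated
// lookups become O(1) map accesses instead of O(n) scans over kvs.
func toMap(o *fastjson.Object) map[string]*fastjson.Value {
    m := make(map[string]*fastjson.Value, o.Len())
    o.Visit(func(key []byte, v *fastjson.Value) {
        m[string(key)] = v
    })
    return m
}

The proposed Append that skips the equal-key check would have to live inside the library, since kvs is unexported.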
BTW, it might not be related to this, but I don't think optimizations like https://github.com/valyala/fastjson/commit/fb293dfd043e3f46ebab89b83b157c2ee376728c are doing anything, since there is only one memory read anyway and the CPU is clever enough to execute the checks in parallel.
Agreed on all the remarks.
Commit c188f76 might cause memory leaks. For example, when a very large array is reused as an object, the array is kept in memory. In fact, this was the behavior before the commit as well. Consider emptying out everything, including slice pointers.
I'm aware of the possible memory leak, but I couldn't find a good solution yet. Straightforward zeroing of all the pointers on Reset slows down the Parser on JSONs with many items (>10K). Probably a separate ReleaseRefs method should be added to Parser for zeroing all the pointers.
BTW, any object pooling has a possible memory leak. For instance, if the Parser was used for parsing a huge JSON and is then re-used for parsing small JSONs, it will keep the memory allocated during the huge JSON parsing. Distinct ParserPools may be used for parsing differently sized JSON types in order to minimize the possibility of such a memory leak.
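To make the last suggestion concrete, a minimal sketch of keeping distinct ParserPools for differently sized inputs; the pool names and the helper function are mine, only fastjson.ParserPool, Get, Put, ParseBytes and GetInt are real API:

package example

import "github.com/valyala/fastjson"

// Separate pools: a parser that grew its internal buffers on a huge document
// never ends up being wasted on tiny ones, and vice versa.
var (
    smallPool fastjson.ParserPool // e.g. small API payloads
    hugePool  fastjson.ParserPool // e.g. multi-megabyte documents
)

func smallInt(data []byte, key string) (int, error) {
    p := smallPool.Get()
    defer smallPool.Put(p)
    v, err := p.ParseBytes(data)
    if err != nil {
        return 0, err
    }
    // Extract what we need before the parser goes back to the pool,
    // since the Value is only valid until the parser is reused.
    return v.GetInt(key), nil
}

A parallel helper for huge documents would use hugePool instead.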
I don't think optimizations like fb293df are doing anything since there is only one memory read anyway, and the CPU is clever enough to execute them in parallel.
Mostly agree, but the CPU cannot execute these checks in parallel due to Go's guarantees: conditions joined with && are evaluated sequentially (short-circuit evaluation). Otherwise something like the following would break:
type A struct { n int }

func foo(a *A) bool {
    // This line would break on a == nil if the conditions were evaluated in parallel
    return a != nil && a.n > 0
}
I'm aware of the possible memory leak, but I couldn't find a good solution yet. Straightforward zeroing of all the pointers on Reset slows down the Parser on JSONs with many items (>10K). Probably a separate ReleaseRefs method should be added to Parser for zeroing all the pointers.
I'm afraid that the current Parser benchmarks are actually flawed, which makes them appear faster than they would be in practice. The same Parser is used to parse the exact same JSON over and over, which makes reusing arrays much more likely than usual. Consider the following JSON:
{"a":"hello", "b": <a large array>}
When the same parser is used for:
{"b": <a large array>, "a": "hello"}
the array cannot be reused, causing memory leaks and slowdowns.
Such reordering may happen frequently in practice (especially given Go's randomized map key ordering), but it is not covered by the benchmarks. You should remove benchPool from the benchmarks, since it is deceptive.
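For illustration, a sketch of the kind of benchmark I think is missing, where a single Parser sees the same fields in alternating key order; the benchmark name and the sample documents are mine, and it is meant to live in a _test.go file next to the existing benchmarks:

package fastjson_test

import (
    "testing"

    "github.com/valyala/fastjson"
)

// BenchmarkReordered feeds the same fields to one Parser in alternating key
// order, so per-iteration reuse of previously parsed values is far less
// effective than in the existing benchPool-style benchmarks.
func BenchmarkReordered(b *testing.B) {
    docs := [][]byte{
        []byte(`{"a":"hello","b":[0,1,2,3,4,5,6,7,8,9]}`),
        []byte(`{"b":[0,1,2,3,4,5,6,7,8,9],"a":"hello"}`),
    }
    var p fastjson.Parser
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        if _, err := p.ParseBytes(docs[i%2]); err != nil {
            b.Fatal(err)
        }
    }
}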
With my humble algorithmic knowledge, I did find a solution, though, involving only a logarithmic number of allocations, but it is tedious to implement and few other JSON parsers use such techniques. You create a parser-wide kvPool slice and an analogous item slice to store all key-value pairs and array elements in the JSON. You maintain two temporary stacks during parsing, called kvStack and itemStack, for objects and arrays respectively. When parsing an object, its key-value pairs are pushed onto kvStack; after } is seen, pop those pairs from kvStack, append them to the parser-wide kvPool, and take the corresponding sub-slice of kvPool as the object's kv. A similar technique applies to arrays. Note that this prevents appending to kv or item in place; you need to create another array to do that.
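If it helps, a heavily simplified sketch of the idea with toy types; none of the names below are fastjson's internals:

package sketch

// value is a stand-in for a parsed JSON value; objects and arrays keep
// sub-slices that point into the parser-wide pools below.
type value struct {
    obj []kv
    arr []*value
}

type kv struct {
    k string
    v *value
}

type parser struct {
    // Final, contiguous storage for all objects and arrays in the document.
    kvPool   []kv
    itemPool []*value

    // Temporary stacks holding the pairs/elements of the objects and arrays
    // that are still open while parsing.
    kvStack   []kv
    itemStack []*value
}

// endObject is called when '}' is seen; start is the kvStack length recorded
// at the matching '{'. The pairs are moved from the stack to the pool, and
// the object keeps a sub-slice pointing into the pool. Only the occasional
// growth of kvPool/kvStack allocates, hence the logarithmic allocation count.
func (p *parser) endObject(start int) []kv {
    poolStart := len(p.kvPool)
    p.kvPool = append(p.kvPool, p.kvStack[start:]...)
    p.kvStack = p.kvStack[:start] // pop the object's pairs
    return p.kvPool[poolStart:]
}

// endArray is the same technique applied to itemStack/itemPool.
func (p *parser) endArray(start int) []*value {
    poolStart := len(p.itemPool)
    p.itemPool = append(p.itemPool, p.itemStack[start:]...)
    p.itemStack = p.itemStack[:start] // pop the array's elements
    return p.itemPool[poolStart:]
}

One caveat of this sketch: when kvPool grows and reallocates, sub-slices handed out earlier keep referencing the old backing array, which is memory-safe but means a Reset would have to drop both the pools and those references to actually release the memory.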
This sounds interesting... I need some time to evaluate the proposed approach in the PR.