BSON.jl
BSON.jl copied to clipboard
improve performance
After benchmarking BSON to figure out if it's a good replacement for JSON, I noticed that its by far the slowest. This PR fixes the performance partly:
@time BSON.bson("test.bson_master", Dict(:a=>x)) # 3.2s
@time open(io-> JSON.print(io, x), "test.json", "w") # 1.6s
@time open(io-> JSON2.write(io, x), "test.json2", "w") # 1.1s
@time BSON.bson("test.bson_this_pr", Dict(:a=>x)) # 0.26s
@time open(io-> CBOR.encode(io, x), "test.cbor", "w") # 0.16
It's still slower than CBOR, which also isn't doing the fastest it could - so I guess there are still some other low hanging fruits in here ;)
This appears to give ~10% on my benchmarks, but it would be way higher for more numerical data I guess. Added it to my test branch, see #34
This doesn't seem safe in general? What if x goes out of scope before the unsafe_wrap Array.
I tried this on VGG19 from Metalhead and got
julia> @time Metalhead.weights("vgg19.bson")
0.662478 seconds (1.65 M allocations: 631.681 MiB, 15.95% gc time)
Dict{Symbol,Any} with 38 entries:
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x1f353f3d -- getindex at .\array.jl:730 [inlined]
getindex at .\subarray.jl:228 [inlined]
_show_nonempty at .\arrayshow.jl:386
in expression starting at no file:0
getindex at .\array.jl:730 [inlined]
getindex at .\subarray.jl:228 [inlined]
_show_nonempty at .\arrayshow.jl:386
#383 at .\arrayshow.jl:404
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2219
show_nd at .\arrayshow.jl:294
_show_nonempty at .\arrayshow.jl:403 [inlined]
show at .\arrayshow.jl:421
Guess it's unsafe indeed :P I was assuming, it's only used just before sending it to write, but it's a bit hard to tell where things go. Maybe it'd be better to implement a write that works with an reinterpret array efficiently.
Maybe it'd be better to implement a write that works with an reinterpret array efficiently.
Yeah, probably. https://github.com/MikeInnes/BSON.jl/pull/46 is at least a lot better than what we have now.
Yup, I was thinking to just submit something like #46, but then I got greedy :D
I'd say just merge #46 and leave this unmerged until we find something safer ;)
Interestingly, just defining
function reinterpret_(::Type{T}, x) where T
reinterpret(T, x)
end
I can load VGG19 pretty damn fast
julia> @time Metalhead.weights("vgg19.bson")
0.269944 seconds (1.97 k allocations: 548.196 MiB, 37.98% gc time)
Dict{Symbol,Any} with 38 entries:
but it crashes when testing BSON itself.
@KristofferC probably because reinterpret returns ReinterpretArray which wraps the original array and lazily casts its contents when accessed.