FlatBuffers.jl
Building FlatBuffer bytes yields a corrupted array
When building FlatBuffer bytes, the array sometimes ends up in one of the following states:
- Just showing the array throws a segfault in `getindex`:
```
signal (11): Segmentation fault: 11
in expression starting at no file:0
getindex at ./array.jl:732 [inlined]
alignment at ./arrayshow.jl:68
unknown function (ip: 0x12063304e)
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1829
print_matrix at ./arrayshow.jl:186
print_matrix at ./arrayshow.jl:159 [inlined]
print_array at ./arrayshow.jl:308 [inlined]
show at ./arrayshow.jl:345
...
```
- Showing the array or accessing indices at the end of the array yields:
```
76287316-element Array{UInt8,1}:
Error showing value of type Array{UInt8,1}:
ERROR: ReadOnlyMemoryError()
Stacktrace:
 [1] getindex at ./array.jl:732 [inlined]
 [2] alignment(::IOContext{REPL.Terminals.TTYTerminal}, ::Array{UInt8,1}, ::Array{Int64,1}, ::UnitRange{Int64}, ::Int64, ::Int64, ::Int64) at ./arrayshow.jl:68
 [3] print_matrix(::IOContext{REPL.Terminals.TTYTerminal}, ::Array{UInt8,1}, ::String, ::String, ::String, ::String, ::String, ::String, ::Int64, ::Int64) at ./arrayshow.jl:186
 [4] print_matrix at ./arrayshow.jl:159 [inlined]
 [5] print_array at ./arrayshow.jl:308 [inlined]
 [6] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::Array{UInt8,1}) at ./arrayshow.jl:345
...
```
- No exception is thrown and the array can be read back.
I originally suspected a segfault in CodecZlib with libz; however, it was simply the first code that touched the array after the FlatBuffer bytes were built, which is why `getindex` was called there.
The definition of the FlatBuffers that I use is the following (I have renamed the entries, but the types and structure are the same):
```julia
mutable struct Content
    name::String
    os::Vector{String}
    types::Vector{UInt8}
    ss::Vector{String}
end

mutable struct MyFB
    id::String
    s_id::Vector{Int64}
    b_id::Vector{Int64}
    c::Vector{Content}
end

mutable struct MyFBGrouped
    group::Vector{MyFB}
end
```
In my code I extract data to create `MyFB`s and afterwards group them into a `MyFBGrouped` to reduce the number of files that are created.
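To make the workflow concrete, the grouping step is roughly the following (a sketch; `extract_myfb` and `paths` stand in for my actual extraction code):

```julia
using FlatBuffers

# Sketch of the grouping step; extract_myfb and paths stand in for the
# real extraction code that produces one MyFB per input file.
files = [extract_myfb(p) for p in paths]
grouped = MyFBGrouped(files)
bytes = FlatBuffers.bytes(FlatBuffers.build!(grouped))  # one buffer for the whole group
```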
I have been seeing a few crashes when building `MyFB`s; however, I could not isolate the issue to one specific file, because it can simply work on the second try, and running it over 1M files multiple times is expensive.
I was able to reliably reproduce the behavior with the grouping into `MyFBGrouped`: I found 100 `MyFB`s for which the segfault is triggered in almost 100% of cases (the stacktraces above were generated by it). Unfortunately I am not able to share them, but I am eager to help debug this issue, as it is a really high priority for me.
Julia v0.7.0
FlatBuffers v0.4.0
Could you try the flatc branch and let me know if the issue is still present?
I've just tried it out on the flatc branch and it seems to resolve the problem. I will do some further tests once I get to work tomorrow.
After some tests, I have two notes about the branch and the issue:
- Firstly, I have unfortunately found another segfault case, around the same size as last time. The interesting thing I have noticed is that if I group `files[1:end-1]` instead of `files[1:end]`, where `files` are the `MyFB` FlatBuffers, it does not crash. This led me to trying just the last two, `files[end-2:end]`, which however does not crash either, so it somehow needs the preceding bytes to crash... The same behavior can be reproduced on the last tagged version v0.4.0, where in the first case I have to take 4 files from the back of the array (`files[1:end-5]`) just to get it working, and only one file in the second case. Note that in both of these cases the resulting size of the FlatBuffer bytes drops under 75MB after the last removal, however that might be just a coincidence.
- Secondly, there is a performance regression in the flatc branch, where reading a `MyFB` takes 6x longer than on v0.4.0 and building takes 10x longer. This is mainly caused by the increased number of allocations and should hopefully be reproducible with a general test (I personally have not yet tested on dummy data). The table below has been generated using one sample of about 2MB in size and the `@btime` macro; a sketch of the measurement follows the table.
| | load one MyFB | build one MyFB |
| --- | --- | --- |
| flatc branch | 0.6s (4.3M allocations) | 0.7s (6.3M allocations) |
| v0.4.0 | 0.09s (1.9M allocations) | 0.07s (1M allocations) |
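For reproducibility, the measurements were of roughly this shape (a sketch; `bytes` is assumed to hold the serialized ~2MB sample and `myfb` the corresponding in-memory value):

```julia
using BenchmarkTools  # provides the @btime macro

# Sketch of the two measurements in the table above; `bytes` holds the
# serialized ~2MB sample and `myfb` the corresponding in-memory MyFB.
@btime FlatBuffers.read(MyFB, $bytes)  # "load one MyFB"
@btime FlatBuffers.build!($myfb)       # "build one MyFB"
```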
Hey, thanks for the detailed analysis. It's a good point, and we should probably have some performance tests. The performance issue should be easy to reproduce. As for the segfault issue, give me some time to reproduce on my end and I'll see if I can get to the bottom of it.
I found and fixed an issue which may be the cause of lots of unnecessary allocations. If it's easy for you to run your performance test again on that branch, could you do so and post the results here? Thanks!
Thanks for the effort; unfortunately there are still a lot of allocations. I have made a trace of the allocations for v0.4.0 and for the new flatc branch, separately for both read and build. I hope you can make something out of it.
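For reference, such a trace can be produced with Julia's built-in allocation tracking (a sketch; `myfb_types.jl` and `sample.bin` are placeholders for the struct definitions above and an actual sample file):

```julia
# Run as: julia --track-allocation=user trace_read.jl
# Afterwards every source file gains a .mem sibling with per-line allocation counts.
using FlatBuffers, Profile

include("myfb_types.jl")              # placeholder: the struct definitions from above
bytes = Base.read("sample.bin")       # placeholder: a previously built sample
FlatBuffers.read(MyFBGrouped, bytes)  # warm-up run, so compilation allocations are excluded
Profile.clear_malloc_data()           # reset the counters after warm-up
FlatBuffers.read(MyFBGrouped, bytes)  # only this run ends up in the .mem files
```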
Also, there is one thing bugging me. Every time I pull the flatc branch and run it on 0.7.0, there is an error with the `type` keyword in src/macros.jl. I forked it and made some changes, though maybe changing `type` to `typ` might be more desirable than the change that I made.
Fair! I haven't tested compatibility with Julia 0.7. I'll make sure it works before merging this pull request.
This thread is turning into a bit of a discussion about PR #33 rather than a discussion of the issue itself. I ran some of my own profiling this time and I think I managed to improve the performance a bit more. However, since this change adds a lot of new functionality (vectors of unions, support for deprecated fields, flatc), some regression in performance is inevitable. The question of how much regression is tolerable is subjective, but if it makes the package completely unusable for you then I'd say that's unacceptable. So the question now is: would you be happy if the package were released from the flatc branch?
The follow-up question is: as it stands, I can't reproduce this issue. I tried replicating the given types and populating them with some data. Can you provide a minimal reproduction example? If that is not possible, can you give approximate numbers/sizes for each of the fields of the given types?
I am sorry for the late response; I spent the last few weeks working around it using Julia's built-in serialization, however that was also too slow for my taste. The issue there was that the structure is probably too nested for the serialization to work faster than FlatBuffers on either of the branches in question. As for the performance regression, I would like to see how well the package performs against implementations in other languages, for I understand the urge to bring it closer to the flatc functionality, and this might be a good measure to weigh in on the decision.
As for the issue at hand, while benchmarking the different methods I managed to create a reproducible example of a crash, although this time during the reading of the FlatBuffer bytes. Here is the sample script, which populates an array with copies of the same `MyFB`, builds the FlatBuffer bytes, and then reads them back:
```julia
using FlatBuffers
using Random

## definition of FlatBuffers from the first post here
mutable struct Content
    name::String
    os::Vector{String}
    types::Vector{UInt8}
    ss::Vector{String}
end

mutable struct MyFB
    id::String
    s_id::Vector{Int64}
    b_id::Vector{Int64}
    c::Vector{Content}
end

mutable struct MyFBGrouped
    group::Vector{MyFB}
end

function createMyFB(n)
    function randContent()
        name = randstring(rand(2:7))
        m = rand(1:3)
        os = [randstring(rand(3:20)) for i in 1:m]
        types = UInt8.(rand(0:10, m))
        ss = [randstring(rand(4:20)) for i in 1:rand(0:2)]
        Content(name, os, types, ss)
    end
    id = randstring(10)
    s_id = rand(1:n, n)
    b_id = rand(1:n, n)
    c = [randContent() for i in 1:n]
    MyFB(id, s_id, b_id, c)
end

myfb = createMyFB(10000);
group = MyFBGrouped(fill(myfb, 90));
bytes = FlatBuffers.bytes(FlatBuffers.build!(group));
length(bytes) # the bigger the file the easier it is to crash
readBack = FlatBuffers.read(MyFBGrouped, bytes);
```
This crashes Julia every time, both on v0.4.0 and flatc; if it doesn't, just repeat the read two or more times until it does. The crash is one of the least informative I've ever seen:
```
signal (11): Segmentation fault: 11
in expression starting at no file:0
_platform_memmove$VARIANT$Haswell at /usr/lib/system/libsystem_platform.dylib (unknown line)
Allocations: 70860507 (Pool: 70858006; Big: 2501); GC: 161
```
Additionally, on the flatc branch, just before the segfault, the reading function sometimes starts to return the following error:
```
ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type String
Closest candidates are:
  convert(::Type{String}, ::Symbol) at deprecated.jl:53
  convert(::Type{String}, ::Array{UInt8,1}) at deprecated.jl:53
  convert(::Type{String}, ::AbstractArray{Char,1}) at deprecated.jl:53
  ...
Stacktrace:
 [1] Content(::Nothing, ::Nothing, ::Nothing, ::Nothing) at ./REPL[3]:2
 [2] read(::FlatBuffers.Table{Content}, ::Type{Content}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:319
 [3] getvalue(::FlatBuffers.Table{MyFB}, ::Int64, ::Type{Content}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:262
 [4] getarray(::FlatBuffers.Table{MyFB}, ::Int64, ::Int32, ::Type{Content}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:229
 [5] getvalue(::FlatBuffers.Table{MyFB}, ::Int16, ::Type{Array{Content,1}}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:239
 [6] read(::FlatBuffers.Table{MyFB}, ::Type{MyFB}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:314
 [7] getvalue(::FlatBuffers.Table{MyFBGrouped}, ::Int64, ::Type{MyFB}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:262
 [8] getarray(::FlatBuffers.Table{MyFBGrouped}, ::Int64, ::Int32, ::Type{MyFB}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:229
 [9] getvalue(::FlatBuffers.Table{MyFBGrouped}, ::Int16, ::Type{Array{MyFB,1}}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:239
 [10] read(::FlatBuffers.Table{MyFBGrouped}, ::Type{MyFBGrouped}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:314
 [11] read at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:294 [inlined]
 [12] read(::Type{MyFBGrouped}, ::Array{UInt8,1}) at /Users/jfrancu/.julia/packages/FlatBuffers/KWFpP/src/FlatBuffers.jl:325
 [13] top-level scope at none:0
```
The reason I think this is relevant to the building issue is that if I dump those bytes into a file right after reading and try to deserialize them in another Julia session, everything is okay. This leads me to believe that even when accessing the array does not trigger a segfault, the array still seems to be somehow corrupted after its creation, or there is some internal state that affects the deserialization.
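Concretely, that round-trip check was of this shape (a sketch; `group.bin` is a placeholder file name and `bytes` comes from the script above):

```julia
# Dump the freshly built bytes to disk...
write("group.bin", bytes)

# ...then, in a fresh Julia session with the struct definitions loaded:
bytes2 = Base.read("group.bin")
readBack = FlatBuffers.read(MyFBGrouped, bytes2)  # deserializes fine here
```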
Sweet! Thanks for the reproduction case. I was able to prevent it from crashing by doing this:
```julia
using FlatBuffers
using Random
# rjkat: import julia's garbage collector
import Base.GC

## definition of FlatBuffers from the first post here
mutable struct Content
    name::String
    os::Vector{String}
    types::Vector{UInt8}
    ss::Vector{String}
end

mutable struct MyFB
    id::String
    s_id::Vector{Int64}
    b_id::Vector{Int64}
    c::Vector{Content}
end

mutable struct MyFBGrouped
    group::Vector{MyFB}
end

function createMyFB(n)
    function randContent()
        name = randstring(rand(2:7))
        m = rand(1:3)
        os = [randstring(rand(3:20)) for i in 1:m]
        types = UInt8.(rand(0:10, m))
        ss = [randstring(rand(4:20)) for i in 1:rand(0:2)]
        Content(name, os, types, ss)
    end
    id = randstring(10)
    s_id = rand(1:n, n)
    b_id = rand(1:n, n)
    c = [randContent() for i in 1:n]
    MyFB(id, s_id, b_id, c)
end

myfb = createMyFB(10000);
group = MyFBGrouped(fill(myfb, 90));
bytes = FlatBuffers.bytes(FlatBuffers.build!(group));
length(bytes) # the bigger the file the easier it is to crash

# rjkat: temporarily disable garbage collection
GC.enable(false)
readBack = FlatBuffers.read(MyFBGrouped, bytes);
# rjkat: turn garbage collection back on again
GC.enable(true)
```
I suspect it is a bug in Julia's garbage collector, or in the way we are interacting with it in FlatBuffers. For now, use this as a workaround; I'll look at disabling garbage collection as part of the FlatBuffers `read()` and `deserialize()` functions in a later release.
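If you want to apply the workaround without sprinkling `GC.enable` calls around call sites, a small wrapper along these lines should work (a sketch, not part of the package; `read_nogc` is a made-up name):

```julia
# Hypothetical helper: disable the GC only for the duration of the read,
# restoring it even if the read throws.
function read_nogc(::Type{T}, bytes::Vector{UInt8}) where {T}
    GC.enable(false)
    try
        return FlatBuffers.read(T, bytes)
    finally
        GC.enable(true)
    end
end

readBack = read_nogc(MyFBGrouped, bytes)
```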
Thanks for the quick response. Disabling the GC is quite a radical workaround; however, I can confirm that it also remedies the main issue, i.e. I am not able to reproduce the corrupted arrays with the GC disabled for the whole script, so that's somewhat good news. I hope this can be fixed soon.