FlatBuffers.jl icon indicating copy to clipboard operation
FlatBuffers.jl copied to clipboard

An attempt to create flatc support, and using the Arrow.jl FlatBuffer submodule as the 'official' implementation

Open jonalm opened this issue 3 years ago • 11 comments

I'm opening this issue to discuss the prospect of transitioning into using the Arrow.jl FlatBuffer submodule as the 'official' julia FlatBuffer implementation. The main motivation for doing so is two fold. First, the current implementation isn't really 'flat' in the sense that you need to parse the buffer by mapping the data into julia structs, rather than querying the buffer directly, which is a major motivation for using FlatBuffers in the first place. Second, as the Arrow.jl implementation is based on the official go implementation of FlatBuffers, which makes it easy to implement flatc support by modifying existing code. Having flatc support means that buffer access functionality is code generated. This is needed to be granted "official supported language" status, and to avoid having to manually implement julia code to match any given FlatBuffer schema.

I have a WIP fork of the flatc code here, which generates Arrow.jl FlatBuffer compatible julia code for fbs schemas. Some major current limitations is that it doesn't handle union types, and that all the generated code is put in a single module scope (I haven't figured out how to deal with the schema namespace yet).

I also have a WIP fork of FlatBuffer.jl here, containing the Arrow.jl code, and some updated tests from the official Monster.fbs example schema. In particular, the code generation is made here, and the generated code example is here.

jonalm avatar Dec 19 '21 23:12 jonalm

@jonalm I've used your code--both flatc and FlatBuffer.jl--to update Flatgeobuf.jl.

Most things are smooth, but hit issues on definitions that are field: [double]. They generate the following code FlatBuffers.Array{Float64}(x, o) [^1], and on parsing I get:

ERROR: ArgumentError: unsafe_wrap: pointer 0x7fb952ec2d0c is not properly aligned to 8 bytes
Stacktrace:
 [1] #unsafe_wrap#89
   @ ./pointer.jl:89 [inlined]
 [2] unsafe_wrap
   @ ./pointer.jl:89 [inlined]
 [3] (FlatBuffers.Array{Float64})(t::FlatGeobuf.Gen.Geometry, off::UInt16)
   @ FlatBuffers ~/.julia/packages/FlatBuffers/3YdKu/src/table.jl:107
 [4] getproperty(x::FlatGeobuf.Gen.Geometry, field::Symbol)
   @ FlatGeobuf.Gen ~/code/FlatGeobuf.jl/src/schema/generated.jl:35
 [5] top-level scope
   @ REPL[72]:1

The new code seems a huge improvement in terms of performance, as I don't need to allocate thousands of (empty) structs anymore. 👍🏻

[^1]: https://github.com/evetion/FlatGeobuf.jl/blob/feat/new-flatbuffers/src/schema/generated.jl#L561 and https://github.com/evetion/FlatGeobuf.jl/blob/feat/new-flatbuffers/src/schema/generated.jl#L35

evetion avatar Apr 26 '22 14:04 evetion

Thanks for the bug report @evetion! I added a test to the branch, using an array of doubles, which currently fails with the same error message that you describe. May have a chance to look a bit more into it later this week.

jonalm avatar Apr 26 '22 21:04 jonalm

https://github.com/jonalm/FlatBuffers.jl/blob/39132c368416d019c7875de34a3308d85b6b1e77/test/runtests.jl#L102

jonalm avatar Apr 26 '22 21:04 jonalm

I've been digging a bit. It seems to be an issue with all arrays of native types, I've added tests for reading and writing arrays of int32 floats and doubles, see here https://github.com/jonalm/FlatBuffers.jl/blob/af5a137609912f3ca46ca7fc2773239835b4fa17/test/runtests.jl#L114. Next step would be to investigate if the issue stems from writing or reading the buffer, could perhaps store the modified 'moster' buffer used in tests and parse it in python?

I suspect that it is related to parsing, and that the issue is here somewhere https://github.com/jonalm/FlatBuffers.jl/blob/af5a137609912f3ca46ca7fc2773239835b4fa17/src/table.jl#L103. Note that this code has been copied from the Arrow.jl repo, and that particular function was last modified here https://github.com/apache/arrow-julia/pull/234 , I got that to work almost by accident so I would not be surprised if it contains some buggy behaviour. @quinnj do you have any input here? I believe that this issue would be relevant for Arrow.jl.

jonalm avatar Apr 27 '22 12:04 jonalm

FWIW I'm parsing "official" flat(geo)buffer files, so the issue will at least be in reading/parsing.

evetion avatar Apr 27 '22 13:04 evetion

Well, a complete hack like this for L106 makes it work.

# ptr = convert(Ptr{S}, pointer(bytes(t), a + 1))
# data = unsafe_wrap(Base.Array, ptr, vectorlen(t, off))
data = reinterpret(S, bytes(t)[a+1:a+sizeof(S)*vectorlen(t, off)])

In this case, one avoids the bytes(t) with a length of 604 and the offset a+1 of 85, which indeed doesn't align at all to 8 bytes...

evetion avatar Apr 27 '22 14:04 evetion

@evetion does that give correct numerical values for your official buffer files? I've updated the branch with your temporary solution, with tests. The test now run, but the fail in writing/reading the same data. Do you have access to some official buffers that are public? If so I can set up some more tests.

jonalm avatar Apr 27 '22 18:04 jonalm

They do. ~~This temporary solution indicates that the data and the found offset is correct, but that the bytes have been shifted and are now misaligned? Writing them will produce corrupt data as you've found.~~. edit: bytes are not shifted, just aligned to 4 bytes (and the 85 threw me off, but we do +1 indexing), so unsafe_warp won't work for types > 4 bytes such as doubles. Reinterpret is the way.

Tomorrow I'll check the original file and scan for the correct data to find the actual offset, to see how this compares to the internal buffer.

evetion avatar Apr 28 '22 07:04 evetion

@jonalm In your tests, you should replace prependoffset! by prepend! for the vectors. And if you then replace the ... === floats_[1] by ... === floats[1], you're golden.

evetion avatar Apr 28 '22 21:04 evetion

Thanks, @evetion, I'll update this when I get to it. Also I moved this fork to it's own repo https://github.com/jonalm/FlatBuffers2.jl , this will allow us to track issues independently (I expect more), and to compare it with the current FlatBuffer.jl implementation.

jonalm avatar Apr 29 '22 07:04 jonalm

@jonalm, I'm happy to do a complete overhaul of the package code here and make you an admin if that would end up cleaner. We can just hold off on tagging the new breaking major release until you think things are ready.

quinnj avatar May 04 '22 03:05 quinnj