flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

Possible design for 64-bit sized buffer support in FlatBuffers

Open aardappel opened this issue 1 year ago • 28 comments

"2GB should be enough for anybody!" I said some 9 years ago when first coming up with FlatBuffers. Of course, even back then OSes were already 64-bit and machines had memory for bigger buffers than that, but because of the nature of FlatBuffers (largely immutable data written linearly) it seemed that if people would store >2GB of FlatBuffer data, that would be most efficient in multiple files (packets / buffers / ...) anyway. And of course 32-bit offset made the format a whole lot more compact.

Turns out, people nowadays have actual legit use-cases for storing more than 2GB in a single consecutive FlatBuffer:

  • A typical use case is some form of in-memory streaming of objects / records / logs coming in over the network, which get written into a FlatBuffer. While each record is typically <2GB, it is kind of artificial to have partition this log of them at 2GB boundaries, when many server machines could accumulate much more than that before running out of memory.
  • Some people have binary data coming from outside of FlatBuffers that they'd like to store inside of a FlatBuffer vector, and 2GB is sometimes restrictive there too.

Now, what would seem simple is to simply fork the entirety of FlatBuffers and have new files where every offset is now 64-bit. Given the amount of language implementations that we have, and the amount of code locations that would need touching to become 32/64 agnostic this would actually be a huuuge amount of work, and unrealistic to expect to be able to implement with volunteer maintainers. Also, a fork is not desirable, for many use case a buffer that may contain 64-bit sized items but is still largely compatible with the 32-bit format is much nicer.

So, here's a possible design that aims to enable the above use cases while causing the absolute least amount of churn for FlatBuffers implementations, and making support as easy as possible. This may come at some cost of generality, but since 64-bit buffers are still a bit niche I think this is the right tradeoff.


The idea is to only have 1 new "type variant": 64-bit sized scalar vectors.

Such a vector is designated by using the vector64 attribute that may only be used on table fields. That table field will be implemented using a 64-bit offset to a vector with a 64-bit size field.

The entire rest of FlatBuffers will stay 32-bit as before.

In terms of serialization, think of any buffer having a head and a tail. The head is the traditional FlatBuffer, the tail (what you serialize first) may ONLY consist of 64-bit vectors, thru FlatBufferBuilder::CreateVector64 or similar. The builder can easily keep track that the tail is only such vectors (and assert if not), and that all the rest of the data (the head) still fits in 2GB (so all traditional offsets work). The offsets created in the tail may only be stored in vector64 fields in the head.

Now you may wonder how this helps with the first use case, where each record may have a variety of data in it (nested tables/strings..). The idea is that these will be stored as a nested_flatbuffer. This solves several problems, first it means that this record is forced to be self-contained, meaning it doesn't share strings or anything with the main buffer, which would now also need to become 64-bit, complicating the design significantly. It also allows us to restrict the supported 64-bit types to just scalar vectors.

As an example:

table Root {
  my_records:[SubWrap];
  my_gigantic_buffer:[ubyte] (vector64);
  my_normal_vector:[float];
}

table SubWrap {
  rec:[ubyte] (nested_flatbuffer: Sub, vector64);
}

table Sub {
  // may contain absolutely anything.
}

So here, my_records indicates the first use case, and needs a table to wrap the vector. That doesn't seem ideal, but since we're typically referring to large vectors, this overhead should be minimal. my_gigantic_buffer is the 2nd use case, where we simply want to store a binary blob >2GB.

To serialize, you'd first create vectors for each of these [ubyte], followed by a normal buffer storing all the references to it. Fields like my_normal_vector would need their vector to be serialized after the 64-bit vectors.

In terms of forwards/backwards compatibility, we'd modify the schema parser such that initially occurrences of vector64 errors for any language backend that doesn't have support for it, much like we've done for new union related features in the past. Since using vector64 means adding a new field, we can thus never have the situation where an old implementation accidentally tries to read a 64-bit vector offset as 32-bit.

Then, language can incrementally add support for these new 64-bit vectors, which should be relatively straightforward, e.g. for C++ this CreateVector64 function does most of the work of tracking the use of these vectors, would return a new offset type incompatible with 32-bit offsets, and serialization functions for these new fields would only accept that offset type. Similarly, when reading, we will now have a Vector64 type separate from Vector for access.

A language that can't spend the time for a full implementation but wants to be compatible with schemas that use vector64 could simply add a quick check for the vector64 attribute and then omit accessors for that field, and would thus generate code that can still access the rest of that buffer (or simply buffers that don't happens to actually store any 64-bit vectors).

Related:

  • Original notes on 64-bitness: https://github.com/google/flatbuffers/projects/10
  • Discussion in "FlatBuffers2" binary format: https://github.com/google/flatbuffers/issues/5875

WDYT?

@rw @mikkelfj @vglavnyy @mzaks @mustiikhalil @dnfield @dbaileychess @lu-wang-g @stewartmiles @alexames @paulovap @AustinSchuh @maxburke @svenk177 @jean-airoldie @krojew @iceb0y @evanw @KageKirin @llchan @schoetbi @evolutional

aardappel avatar Sep 19 '22 20:09 aardappel

This would help with #7536, but I think it seems a little "special cased" for a language feature with changes to the schema language and everything. I think it's better to provide this as a library, we can provide some BigFlatbuffer<Root, Sub> type with its own builder, and reader classes. Doing this as a library would also decrease the maintenance and implementation burden.

CasperN avatar Sep 19 '22 20:09 CasperN

@CasperN I am not sure how you'd "do it as a library", given that this needs features in the schema parser, the code generators, the runtime code, etc. It be possible to do it in the runtime only but it be very unergonomic, and work differently in every language.

And yes that linked issue is pretty much my use case #1

aardappel avatar Sep 19 '22 20:09 aardappel

I am not sure how you'd "do it as a library"

So the binary format would be

[size prefixed table][length and vector of 64bit offsets][many tables]

The first size prefixed table on the left would correspond to Root (but without the my_gigantic_buffer field) while the many tables at the end correspont to Sub. This is almost the same as the binary format that you're proposing. The difference is that the big vector is not "inside" of the root table.

Without knowing the types, a BigFlatbufferReader can find the root table, read the vector of offsets, and find each sub table. Though we should parameterize it with Root and Sub to offer typed access to the root and sub tables. This does not require changes to code-gen or the schema language.

Making a BigFlatbufferWriter is tricker and would require modifications to flatbuffer builders, but similar modifications are required in the builders no matter how we choose to implement this.

CasperN avatar Sep 19 '22 21:09 CasperN

Basically I'm suggesting adding a new kind of implicit "root", there's

  • the normal root (offset to the first table)
  • size prefixed root
  • size prefixed root but with 64bit vector to more tables afterwards

CasperN avatar Sep 19 '22 22:09 CasperN

Which brings up the question, we use size prefixed flatbuffers quite a bit. How would those work with your proposal?

Since 2 GB means that the highest bit is available, maybe use that to signal that this is a 64bit flatbuffer inside the size prefixed flatbuffer?

AustinSchuh avatar Sep 19 '22 22:09 AustinSchuh

@CasperN vector of 64bit offsets with many tables afterwards was essentially my original suggestion in the linked "projects" page. But that design has many problems, in particular around avoiding sharing of vtables, and any nested/shared objects, which would generate more 64-bit offsets needed inside the many tables. It also makes it harder to force ordering in the buffer. To that end, I simplified the design to only support nested_flatbuffer which eliminates all those problems in one go.

I also don't understand the relevance of size prefixed.

@AustinSchuh yes, sadly, the buffer as a whole would need a new kind of "64-bit size prefixed".. it may be good to set the high bit, but users were already required to statically differentiate between size prefixed and not, so this new type is no different. And for the nested buffers there is no problem, since they were always sized anyway, and will now be automatically 64-bit sized.

aardappel avatar Sep 19 '22 22:09 aardappel

I agree that the many tables should be independent (no sharing between elements of the big vector) to avoid those problems. The relevance of size-prefixed is that it tells you when the “root” flatbuffer ends and when the table of offsets begin. The “size” refers to the size of the table of type Root. The byte after that is the start of the vector of offsets.

The only difference between my suggestion and the original post is the observation that if the big vector is “top level” and not inside of the table, Root, then this can be implemented with even less effort.

Actually the simpler version is to not distinguish between the root table and the sub tables.  Consider the following format:

• the first 8 bytes is the uint64 length in bytes of everything • Second 8 bytes are the unit64 number of elements in the vector • Then the uint64 offsets to the nested flatbuffers • Then the nested flatbuffers

Applications can choose to make the 0th element a special “Root” table and all other elements a “Sub” table On Sep 19, 2022, 18:49 -0400, Wouter van Oortmerssen @.***>, wrote:

@CasperN vector of 64bit offsets with many tables afterwards was essentially my original suggestion in the linked "projects" page. But that design has many problems, in particular around avoiding sharing of vtables, and any nested/shared objects, which would generate more 64-bit offsets needed inside the many tables. It also makes it harder to force ordering in the buffer. To that end, I simplified the design to only support nested_flatbuffer which eliminates all those problems in one go. I also don't understand the relevance of size prefixed. @AustinSchuh yes, sadly, the buffer as a whole would need a new kind of "64-bit size prefixed".. it may be good to set the high bit, but users were already required to statically differentiate between size prefixed and not, so this new type is no different. And for the nested buffers there is no problem, since they were always sized anyway, and will now be automatically 64-bit sized. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

CasperN avatar Sep 19 '22 23:09 CasperN

We needed >2gb support and ended up doing something very similar. Our format was: [4 byte file identifier] [number of offsets] [vector of 64 bit offsets] [regular flatbuffer]

The regular flatbuffer had normal int fields with an index into the offset table.

We used the file identifier to detect if this was a plain flatbuffer or our custom format.

BrianHarris avatar Sep 20 '22 00:09 BrianHarris

@CasperN yes, but this makes it into a very "custom" format.. my original design tries to keep things as similar to existing FlatBuffers as possible, and gives some flexibility as to where references to these big objects/vectors are stored (doesn't have to be a in a single top level vector). That little flexibility comes at almost no cost.

@BrianHarris cool, yes, that is essentially a container format (PAK file? :) around smaller FlatBuffers. It differs from what I am proposing that I want to use regular FlatBuffers for the "directory" part (and also allow larger than 2GB constituent parts).

aardappel avatar Sep 20 '22 17:09 aardappel

but this makes it into a very "custom" format.. my original design tries to keep things as similar to existing FlatBuffers as possible, and gives some flexibility as to where references to these big objects/vectors are stored (doesn't have to be a in a single top level vector). That little flexibility comes at almost no cost.

I actually disagree whether the cost/flexibility trade off is worth it. I think the only flexibility benefit is the ability to access the big vector via the root table (or one of the tables you can access via the root table). It's also nice that there isn't a proliferation of "root access types" (we'd have get_root<T>, get_size_prefixed_root<T>, and now get_big_vector<T>). However those ergonomic benefits are imo not worth the cost of multi-language code generation because it's equivalent to @BrianHarris's solution with a little sugar to how the big vector is accessed.

I think the cost would be worth it if this feature comes with a roadmap for more 64bit flatbuffers features: e.g. any table can have a 64bit offset to a big vector, nested flatbuffer, string, or a big string. In that context, I can absolutely see why the codegen must be aware of 64bit offsets and we'd want a uniform design for the different use cases. However, we're not going for such a large project.

CasperN avatar Sep 20 '22 18:09 CasperN

I agree with Casper that this can, and probably should, be done as a library/documentation/sample code. At least until "FlatBuffers2"

One of the problems with the approach I took is writing out the data requires knowing about all the data blobs ahead of time, so we can pre-allocate the vector at the beginning of the buffer. It also requires us going back and filling in the offsets once we know the final size of the flatbuffer. This wasn't a problem for us because we never streamed this data out while writing it, so seeking on disk wasn't a concern. A nice feature of flatbuffers though is it's written depth-first, so you never have to seek backwards when writing, nor do you have to scan your source data to pre-calculate anything. Moving the vector to the end of the buffer would solve this problem though: [flatbuffer] [large data] [vector of 64 bit offsets] [number of offsets]

You could find the root table by looking at the first offset (or if "number of offsets" is 0 then just looking immediately before it).

BrianHarris avatar Sep 20 '22 18:09 BrianHarris

Just chiming in - I'd love to see this implemented. At 2 different companies now I've seen custom hacks added specifically to support >4gb binary blobs contained within flatbuffers streams. One just wrote a metadata header that contained the size of the blob & wrote it directly to the stream after the flatbuffers message, leaving it to the streaming code to do the right thing. The other forked flatbuffers and made it possible to enable 64bit offsets on a per-schema basis, which is wasteful, but functional.

yeswalrus avatar Oct 14 '22 19:10 yeswalrus

I think 64 bit should just be the same format compiled with uoffset_t and soffset_t as 64 bit offsets. It relatively trivial to do, offers complete freedom for the end user, and the file format would at most be twice as much as you need to, which probably is of less concern at those scales (although it will affect cache locality and thus performance).

Starting to add custom partially large data structures quickly gets complex both for the library and the end user. I have looked into a number of ways of doing it and never really found anything satisfying. It is similar to the jump from 8/16 bit memory banks, then x86 segmented 16 bit memory then 32-bit memory, then 64 bit memory. We never had segmented 32 bit memory but jumped straight to 64 bits (albeit CPU's don't actually address the full 64-bit range on the internal bus).

Should we add an option for 64 bit, there would be need to be a format identifier bit and I would not expect the same generated code to work with both 32- and 64-bit formats. In fact, flatcc has prepared namespaces to generate code for different formats.

I also suggest looking into StreamBuffers at the same time so at least readers are guaranteed to be able to read front to back generated buffers. This is a bigger change, but more important for larger buffers. It means you can write and transport buffers without keeping all in memory.

https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#streambuffers

mikkelfj avatar Oct 15 '22 08:10 mikkelfj

Also, I can add that I do work with a large format read only database in FlatBuffers where the index is generated in a separate FlatBuffer linking to a number of on-disk buffers each limited to 2GB and mapped into memory as needed. This works fine, and I'm not sure a single 64-bit buffer would easily replace it due to file size, cache, memory and transport concerns, but it also isn't something you build in one day.

mikkelfj avatar Oct 15 '22 08:10 mikkelfj

I think 64 bit should just be the same format compiled with uoffset_t and soffset_t as 64 bit offsets. It relatively trivial to do,

I don't think we've invested into testing this very thoroughly, unfortunately. As @aardappel noted, "Given the amount of language implementations that we have, and the amount of code locations that would need touching to become 32/64 agnostic this would actually be a huuuge amount of work, and unrealistic to expect to be able to implement"

One of the problems with the approach I took is writing out the data requires knowing about all the data blobs ahead of time, so we can pre-allocate the vector at the beginning of the buffer. It also requires us going back and filling in the offsets once we know the final size of the flatbuffer.

If we're writing logs to disk in a flatbuffer format, it'd be nice if we wrote things front-to-back so we can end the file with the vector of offsets and metadata. This probably can't be integrated in the normal flatbuffer builder though, and probably would be bad for streaming settings.

Some people have binary data coming from outside of FlatBuffers that they'd like to store inside of a FlatBuffer vector, and 2GB is sometimes restrictive there too.

@aardappel I just thought about this one. How do you expect having a large flatbuffer vector to help with this? Presumably one of the big vector's elements would be a >2GB blob of external-to-flatbuffers data?

Also, I just realized in @aardappel's original proposal, there may be more than one 64bit vector, so long as all of them are at the end. That is indeed different from the solution I and @BrianHarris were suggesting which only supports one large vector.

I think we need more consensus on canonical use cases before we can shrink the design space...

CasperN avatar Oct 17 '22 18:10 CasperN

@CasperN

if we wrote things front-to-back

Yup, totally see that.. but a difficult problem to solve and likely independent of 64-bit support.

Presumably one of the big vector's elements would be a >2GB blob of external-to-flatbuffers data?

One 64-bit vector would be such a blob (as opposed to an element), see my_gigantic_buffer in the original post.

there may be more than one 64bit vector, so long as all of them are at the end

Yes, I think that is crucial for some level of flexibility in what kind of things can be implemented with this.

aardappel avatar Oct 17 '22 20:10 aardappel

I have a suggestion for a more aggressive, but IMHO also more valuable change. Replace offsets from being i32 to a special type of LEB. This special type of LEB encodes the size of the number in the first two bits of the first/leading byte. Where:

0 -> 1byte
1 -> 2bytes
2 -> 4 bytes
3 -> 8 bytes

The decoding logic looks something like following:

let b = reader.readByte()
let byteLenght = b & 3
if byteLenght == 0 {
  offset = b >> 2
} else if byteLenght == 1 {
  offset = reader.readShort() >> 2
} else if byteLenght == 2 {
  offset = reader.readInt() >> 2
} else {
  offset = reader.readLong() >> 2
}

This LEB type let us work with padding and memory alignment, as the values are always 1, 2, 4, or 8 bytes long.

The length of buffers and strings could also be encoded with such LEB.

Same for vector of scalar values.

For vector of offsets, it would be better to use the same pattern, but shift it by another 2 bits. First two bits encode the byte size of the vector values, next two bits encode the byte size of the vector length. The offsets in the vector I stored as u8/u16/u32/u64, based on the max value in the vector.

A flag in the schema could identify if the new LEB values are used throughout the buffer and not only in some special places. This change will reduce buffer size for small buffer and will make almost infinite size buffers (62 bit pointers length and 60 bit vector size) possible.

mzaks avatar Nov 06 '22 14:11 mzaks

Not saying yes or no, but if you want to do LEB, you might want to follow QUIC https://www.rfc-editor.org/rfc/rfc9000.html#integer-encoding I think it is roughly as you suggest, but endian concerns might have to be addressed.

mikkelfj avatar Nov 06 '22 17:11 mikkelfj

Not saying yes or no, but if you want to do LEB, you might want to follow QUIC https://www.rfc-editor.org/rfc/rfc9000.html#integer-encoding I think it is roughly as you suggest, but endian concerns might have to be addressed.

Ah right, QUIC was my initial inspiration. I think in case of FlatBuffers where we use little endian representation explicitly, my suggestion is a bit more ergonomic. As we consume the bytes left to right, the byte size encoding should be in the first (less significant) byte. And encoding the byte size in the left side of the bit pattern of the first byte, will make the reading part of the value more cumbersome. Please let me know if I am mistaken.

mzaks avatar Nov 07 '22 14:11 mzaks

I'm a big fan of variable length encodings in general, but the problem is that pretty much all such formats, no matter how clever, as still significantly slower than a simple 32/64-bit read, to the point that it would not be suitable for FlatBuffers.

It be great for a FlatBuffers-like format that was optimized for size but still random access, but not for all offsets in FlatBuffers.

FlexBuffers of course has them, and their I tried to mitigate some of the inefficiency of decoding the payload size in every value by only doing it once for every vector, but even there the problem is that there is just no fast CPU instructions for reading a variable length integer with a dynamic size, beyond movsb on some platforms: https://github.com/google/flatbuffers/blob/master/include/flatbuffers/flexbuffers.h#L164 Would need some benchmarking to see if that can be made fast on all platforms (does ARM have something similar?).

Even the slow conditional variable size offset decoding is appropriate for FlexBuffers since it is a slower to access format anyway (field access goes through a binary search!), but not so much for FlatBuffers.

On top of that of course is that the motivation for this issue (unlike the "FlatBuffers2" thread) is to extend the existing format in a backwards compatible way, which this wouldn't do.

aardappel avatar Nov 07 '22 19:11 aardappel

Actually the reading of a variable length offset can be quite efficient as we have a simplified problem on our hands.

The runtime value can be UInt64, the buffer is aligned and padded, so decoding of the value can look something like this:

let masks = [0xff, 0xffff, 0xffffffff, 0xffffffffffffffff]
let offset = reader.readUInt64()
return (offset & masks[offset & 3]) >> 2

I also think that it would be possible to introduce the feature in a backwards compatible way. Forward compatibility is a different story though.

mzaks avatar Nov 17 '22 07:11 mzaks

I have tested variable length encodings in various contexts, including a variant of robin hood hashing where an offset needed to be stored. In general the performance is rather poor if it is in a critical loop.

mikkelfj avatar Nov 17 '22 10:11 mikkelfj

@mikkelfj most variable length encodings rely on loops and conditions for decoding. My proposal is branch less. It is still definitely slower than just direct four byte read, but as performance penalties go, it should be not too bad. Having smaller length and offset values will produce smaller buffer and increase data locality, which in term might speed up overall performance. That sad, it is all theoretical until someone actually implements and profiles it.

mzaks avatar Nov 17 '22 15:11 mzaks

but as performance penalties go, it should be not too bad.

All I say is test it. I was surprised. I scrapped an entire optimized algorithm design and implementation because it wouldn't fly. It goes from sub-ns to 10 ns or so depending.

I another test I found that nested branching is much faster than lookup if you have a choice.

mikkelfj avatar Nov 17 '22 17:11 mikkelfj

Totally agree regarding testing.

And I am also not so sure regarding lookup performance.

That sad, the mask can also be computed purely arithmetically:

mask = 0xffffffffffffffff >> ((8 - (1 << (offset & 3))) << 3)

So one can chose between branching, lookup and arithmetics.

mzaks avatar Nov 18 '22 06:11 mzaks

Note that I also have concerns regarding compatibility and implementation effort, but I'm focusing on the technical / performance aspect in the above, given that this is what is primarily differentiating flatbuffers from msgpack and protocol buffers.

mikkelfj avatar Nov 18 '22 08:11 mikkelfj

There is also a technical concern regarding unaligned access, and reading "too much ahead" which could be necessary for performance. In other contexts, when parsing, I often add some zero bytes at the end of the buffer for reading ahead, but this isn't really practical here.

In terms of encoding, I followed and participated in QUIC v1 design and the change to using a single uniform VLE representation greatly simplified many aspects compared to various bit flags in many header fields that were in the early design. But here you also have to carefully deal with buffer overrun and unaligned access.

mikkelfj avatar Nov 18 '22 08:11 mikkelfj

Conditional assignment and arithmetic would usually beat any alternative unless the overhead is excessive. Sequential dependency chains like a = 1 | x; b = a >> 1 | 1; c = b >> 1 | 1; ... (just random pointless logic) are typically faster than alternatives. This is used in integer log2 algorithms.

C does not have (branchless) conditional assignment, but a = x ? 1 : a; will typically optimize to a cmov like instruction.

mikkelfj avatar Nov 18 '22 08:11 mikkelfj

I would like to prioritize this to be the top feature to work on for 2023.

I think the ideal end state would be what @mikkelfj suggests: uoffset_t and soffset_t as 64 bit offsets, with additional types for strings/vectors size parameters to support large sizes as well. This would be the most flatbuffer-like implementation and mostly agnostic to the end user (since these are internal details). We could even explore the variable length encoding with this as @mzaks brings up, if we are concerned with buffer-bloat due to unused bits in oversized offsets. Though, I see variable encoding as a potential optimization that isn't strictly needed. This would involve the most changes to everything and would take a lot of work to do correctly.

However, that being said, @aardappel OP design seems straightforward and does avoid many complications with the nested_flatbuffers. Though this approach does involve collusion of the end user to be explicit in defining their schema for larger buffer support, as well as serializing the data in the correct order. I do like the option that non-supported languages can simply skip these 64-bit aware fields and still provide some level of compatibility.

Lastly, we have @CasperN and @BrianHarris idea of implementing this as a library, with some sort of meta-format file that manages multiple independent flatbuffers. This might have the least amount of changes necessary, but also in the least flexible and most rigid of the three proposals.

Things to think about that I am just laying out.

  1. Every vtable has a uint16_t "size of the referring table" field that is unused by all implementations. Can we use this to encode some information we can use to dynamic switch from 32/64-bit mode?
  2. Do we intend to make Tables support offsets larger than 2^16-1 bits? Currently the vtable uses VOffset16 to indicate the field offset relative to the table start offset.
  3. The first 4 bytes of all flatbuffers is the UOffset32 to the root table, however, it is impossible to have this point to an offset at 0xFFFFFFFF, since Tables require their first 4 bytes be the offset to the vtable (SOffset32). That leaves the two upper bits as always set to 0 (for valid, non-malicious buffers). Can we use these bits as a way to version the flatbuffer to switch between 32/64-bit modes?

dbaileychess avatar Dec 22 '22 21:12 dbaileychess

The first 4 bytes of all flatbuffers is the UOffset32 to the root table, however, it is impossible to have this point to an offset at 0xFFFFFFFF, since Tables require their first 4 bytes be the offset to the vtable (SOffset32). That leaves the two upper bits as always set to 0 (for valid, non-malicious buffers). Can we use these bits as a way to version the flatbuffer to switch between 32/64-bit modes?

I think we could use this as an escape, but not for deciding the final format. I think we need an extra field for that. Also, the ambiguity regarding presence or absence of file identifiers is a bit of a nuisance. So I suggest we make that more explicit. With the extended format identifier we could have a format for the classical FB 32 bit format which would also be the default without a format identifier. Also, this would be a good place to enable StreamBuffers if they ever are to exist.

EDIT: the optional size prefix is also a point of confusion.

This leads to:

Do we intend to make Tables support offsets larger than 2^16-1 bits? Currently the vtable uses VOffset16 to indicate the field offset relative to the table start offset.

I think this shouldn't be a big problem, but large vtables could be a problem, also for CPU cache. However, if we have a format identifier, we could sneak in support for this if we really need to. I.e. I think this is an orthogonal problem we do not need to address at the same time.

Every vtable has a uint16_t "size of the referring table" field that is unused by all implementations. Can we use this to encode some information we can use to dynamic switch from 32/64-bit mode?

I have a long running idea of a) extending flatbuffers with Mix-In inheritance. There might be some optimizations that could use that field. b) Proper NULL values could be encoded with an optional bitmap that could be encoded there. In general such options (I know that proposal isn't popular, but it is an example). In general we could have flags to encode certain encoding extensions. It could also be used for dynamically extending the representation as you suggest. In either case I'm not really in favor of conflating this with 64-bit global format - both due to the complexity of it, and for performance reasons in the common case. But I do like the idea of reserving this field for flags of some sort. Again, this could be enabled via my proposed format identifier such the vtable field is only special in special FB formats.

Finally, it is worth noting that the Arrow format also exists. It uses FB for metadata but a Google Drill like mix of row/column data that is better suitable for more homogenous (but potentially sparse) data in large volumes.

mikkelfj avatar Dec 22 '22 23:12 mikkelfj