flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

Binary schema compilation results different on Windows

Open relativityspace-jsmith opened this issue 3 weeks ago • 2 comments

Working on #8008 , I found that flatc compiles monster.fbs into a different binary schema file on Windows than other platforms.

Here's the Windows version: monster-windows.bfbs.zip

And here's the normal version: https://github.com/google/flatbuffers/blob/master/samples/monster.bfbs

This issue seems like it has never been noticed till now because the CMake build system for flatc did not regenerate monster.bfbs, so the check for whether any generated files are different after building has not been catching this.

relativityspace-jsmith avatar Dec 02 '25 02:12 relativityspace-jsmith

I took a moment to annotate these two bfbs files and have attached the outputs.

it seems that windows is possibly doing a better job of vtable consolidation? I haven't had the change to do a full analysis yet.

@aardappel or @dbaileychess, you guys will probably find this difference interesting.

monster.txt monster-windows.txt

jtdavis777 avatar Dec 03 '25 04:12 jtdavis777

It seems the original has more fields present

Image

jtdavis777 avatar Dec 03 '25 04:12 jtdavis777

There is only a single difference in that diff: the attributes field present or not. The other fields are simply there because it saves the vtable until the last non-0 value.

They both have the same vtable apart from that one field.

This is the code that serializes that field:

static flatbuffers::Offset<
    flatbuffers::Vector<flatbuffers::Offset<reflection::KeyValue>>>
SerializeAttributesCommon(const SymbolTable<Value>& attributes,
                          FlatBufferBuilder* builder, const Parser& parser) {
  std::vector<flatbuffers::Offset<reflection::KeyValue>> attrs;
  for (auto kv = attributes.dict.begin(); kv != attributes.dict.end(); ++kv) {
    auto it = parser.known_attributes_.find(kv->first);
    FLATBUFFERS_ASSERT(it != parser.known_attributes_.end());
    if (parser.opts.binary_schema_builtins || !it->second) {
      auto key = builder->CreateString(kv->first);
      auto val = builder->CreateString(kv->second->constant);
      attrs.push_back(reflection::CreateKeyValue(*builder, key, val));
    }
  }
  if (attrs.size()) {
    return builder->CreateVectorOfSortedTables(&attrs);
  } else {
    return 0;
  }
}

I wouldn't be surprised if the two platforms have different values for parser.opts.binary_schema_builtins causes this.

aardappel avatar Dec 04 '25 05:12 aardappel

Linux encodes the 'deprecated' field, windows does not.

table (reflection.Field):
  +0x04E4 | 18 00 00 00             | SOffset32  | 0x00000018 (24) Loc: 0x04CC    | offset to vtable
  +0x04E8 | 00 00 00                | uint8_t[3] | ...                            | padding
  +0x04EB | 01                      | uint8_t    | 0x01 (1)                       | table field `deprecated` (Bool)
  +0x04EC | 04 00                   | uint16_t   | 0x0004 (4)                     | table field `id` (UShort)
  +0x04EE | 0C 00                   | uint16_t   | 0x000C (12)                    | table field `offset` (UShort)
  +0x04F0 | 48 00 00 00             | UOffset32  | 0x00000048 (72) Loc: 0x0538    | offset to field `name` (string)
  +0x04F4 | 34 00 00 00             | UOffset32  | 0x00000034 (52) Loc: 0x0528    | offset to field `type` (table)
  +0x04F8 | 04 00 00 00             | UOffset32  | 0x00000004 (4) Loc: 0x04FC     | offset to field `attributes` (vector)

vector (reflection.Field.attributes):
  +0x04FC | 01 00 00 00             | uint32_t   | 0x00000001 (1)                 | length of vector (# items)
  +0x0500 | 04 00 00 00             | UOffset32  | 0x00000004 (4) Loc: 0x0504     | offset to table[0]

table (reflection.KeyValue):
  +0x0504 | DC FD FF FF             | SOffset32  | 0xFFFFFDDC (-548) Loc: 0x0728  | offset to vtable
  +0x0508 | 10 00 00 00             | UOffset32  | 0x00000010 (16) Loc: 0x0518    | offset to field `key` (string)
  +0x050C | 04 00 00 00             | UOffset32  | 0x00000004 (4) Loc: 0x0510     | offset to field `value` (string)

string (reflection.KeyValue.value):
  +0x0510 | 01 00 00 00             | uint32_t   | 0x00000001 (1)                 | length of string
  +0x0514 | 30                      | char[1]    | 0                              | string literal
  +0x0515 | 00                      | char       | 0x00 (0)                       | string terminator

padding:
  +0x0516 | 00 00                   | uint8_t[2] | ..                             | padding

string (reflection.KeyValue.key):
  +0x0518 | 0A 00 00 00             | uint32_t   | 0x0000000A (10)                | length of string
  +0x051C | 64 65 70 72 65 63 61 74 | char[10]   | deprecat                       | string literal
  +0x0524 | 65 64                   |            | ed
  +0x0526 | 00                      | char       | 0x00 (0)                       | string terminator

jtdavis777 avatar Dec 04 '25 14:12 jtdavis777

Yes, deprecated is a built-in / recognized attribute. It's status is already encoded in the deprecated field in reflection::Field. Thus, Windows does the right thing, parser.opts.binary_schema_builtins is at the default false. Linux apparently uses --bfbs-builtins to turn it to true, which it shouldn't do.

aardappel avatar Dec 05 '25 06:12 aardappel

confirmed on linux and windows that I get a matching file, containing the builtin attribute, with ./build/flatc -b --schema --bfbs-comments --bfbs-builtins samples/monster.fbs -- which is the list of options present in generate_code.py -

BINARY_OPTS = ["-b", "--schema", "--bfbs-comments", "--bfbs-builtins"]

I'm not sure where the original issue discrepancy is coming from just yet.

jtdavis777 avatar Dec 06 '25 00:12 jtdavis777

@relativityspace-jsmith I think I've found (part of) your issue -- in your PR you generate monster.bfbs in cmake here

  flatbuffers_generate_headers(TARGET flatsample
      SCHEMAS samples/monster.fbs
      BINARY_SCHEMAS_DIR samples/
      FLAGS ${FLATC_OPT_COMP})

however, the bfbs file that is checked in is generated with python3 scripts/generate_code.py -- these two commands have different options passed to them :)

jtdavis777 avatar Dec 06 '25 00:12 jtdavis777

I think you may run into an actual "real" discrimination between the two -- windows and linux paths are added to comments differently.

jtdavis777 avatar Dec 06 '25 00:12 jtdavis777