flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

Parsing from `proto` should keep field ID.

Open dbaileychess opened this issue 3 years ago • 2 comments

When parsing from a proto that have explicit field identifiers, flatbuffers need to keep those identifiers stable by explicitly using the id attribute. This is because protos allow new field placement anywhere in their message, even before other fields, as long as they use a new field ID.

Example:

message Foo {
  optional int32 baz = 2;
  optional int32 bar = 1;
}

dbaileychess avatar Nov 16 '22 03:11 dbaileychess

would you mind if I do it ?

enum-class avatar Nov 18 '22 03:11 enum-class

@enum-class Sure, I always welcome PRs

dbaileychess avatar Nov 18 '22 04:11 dbaileychess

Ok, I have given this a bit more thought.

We should map proto-field-ids to flatbuffer-ids in the following way:

  • Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
  • Generate a flatbuffer-id for each of those fields, starting at 0 and going up consecutively
  • Be mindful of union types that have a hidden type field that needs to be placed.
  • We should (if possible) emit the flatbuffer fields in the same order that the proto fields are ordered within the file, disregarding the field-id. This keeps the logical grouping of fields consistent.

We will support gaps in proto-field-ids by just ignoring the skipped values.

For conformity-sake, if someone adds a proto-field-id in an existing gap, flatc will generate a valid .fbs schema with the above rules. However, this schema won't be a proper evolved schema from its prior history. The user is expected to check this with the --conform test of flatc on the two generated .fbs files. No explicit support for protos within--conform is needed; users are expected to compare the generated .fbs schema instead.

To aid the user at this possibility of non-conformity, a new flag should be added: --disallow-proto-field-gaps that defaults to true. When this flag is true, if there are any gaps in the proto-field-ids an error will be emitted and no .fbs will be generated. When this flag is false, only an warning is generated and the .fbs will be generated. (I'm open to making this flag an enum with warn, error, no-op or something, if we want more control).

@enum-class Please see above for edits to your PR.

@aardappel for any opinions.

dbaileychess avatar Dec 05 '22 18:12 dbaileychess

This generally seems ok, though I wonder if it is worth it to (by default) require the previous .fbs schema, i.e. it is effectively going to run --conform for you (with a way to disable that of course). I fear if you don't, this idea of having to run --conform separately is going to get forgotten, and people will end up with subtle bugs of old serialized data being read with an incompatible schema.

Alternatively, some clear message saying .proto converted to .fbs, please check with --conform before using this schema or similar?

Or, if you specify --proto, that it works with you specifying --conform in the same command-line, and if you don't, big shouty warning --proto ran without --conform, new .fbs not guaranteed to be compatible with past conversions! or similar.

aardappel avatar Dec 05 '22 18:12 aardappel

  • Sort all the proto-field-ids by their numerical value, and stack from low-to-high.

What about oneof (union) that does not have id ?

enum-class avatar Dec 06 '22 00:12 enum-class

--disallow-proto-field-gaps

As we sort and map the ids, what is the point of this config flag ?

enum-class avatar Dec 06 '22 00:12 enum-class

Sort all the proto-field-ids by their numerical value, and stack from low-to-high. What about oneof (union) that does not have id ?

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.

--disallow-proto-field-gaps As we sort and map the ids, what is the point of this config flag ?

This is to alert the user that they could introduce issues in the future. If you have:

field_a = 1;
field_b = 1000;

And later do

field_a = 1;
field_b = 1000;

field_c = 10

That would lead to the following flatbuffer schema:

a (id = 0)
b (id = 1)

and

a (id = 0)
c (id = 1)
b (id = 2)

Those two schema are "valid" on their own, but are not conforming to each other. So the flag is alerting the user that this potential could happen in the future.

dbaileychess avatar Dec 06 '22 00:12 dbaileychess

though I wonder if it is worth it to (by default) require the previous .fbs schema, i.e. it is effectively going to run --conform for you (with a way to disable that of course). I fear if you don't, this idea of having to run --conform separately is going to get forgotten, and people will end up with subtle bugs of old serialized data being read with an incompatible schema.

Well, on the first conversion from proto->fbs, you wouldn't have a previous schema. That could probably be fixed with a flag to explicitly state "--first-proto-conversion" or whatnot.

But, the later about forgetting to include --conform, is that we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform .

Alternatively, some clear message saying .proto converted to .fbs, please check with --conform before using this schema or similar?

Yes, but this is kind of the purpose of the --disallow-proto-field-gaps flag. We only have to warn in situations where there is a potential for making incompatible schema changes.

dbaileychess avatar Dec 06 '22 00:12 dbaileychess

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.

What do you mean by ignore them ? I mean what id should we assign to union (oneof) type ? As they do not have id I already consider them "0", so they will be assigned to the minimum possible id in the struct. Do you agree with this approach ?

enum-class avatar Dec 06 '22 01:12 enum-class

This is to alert the user that they could introduce issues in the future. If you have:

It sounds reasonable. And what should we do for reserved id ? In protobuf, if reserved id be used, it will alert. should we alert something similar ?

enum-class avatar Dec 06 '22 01:12 enum-class

we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform

I think its a lot easier to run into incompatibility issues when the a .proto is the source of truth than when a .fbs is.

aardappel avatar Dec 06 '22 06:12 aardappel

While I am firmly in favour of addressing this issue, I realize that the changes to take proto field numbers into account may actually break binary backwards compatibility for existing users of the --proto flag, by generating a .fbs schema file that doesn't conform with the .fbs schema file generated by previous versions of flatbuffers.

So I think we may need to have a --legacy-ignore-proto-field-numbers flag to preserve binary backwards compatibility in those cases?

fergushenderson avatar Dec 06 '22 11:12 fergushenderson

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing. What do you mean by ignore them ? I mean what id should we assign to union (oneof) type ? As they do not have id I already consider them "0", so they will be assigned to the minimum possible id in the struct. Do you agree with this approach ?

@enum-class

The fields within a one-of are just normal fields with their respective ids. Protobufs just ensures if that if one is set, the other fields are not-set. We don't have that mechanism in flatbuffers. So by ignoring that type, I just mean process the inner types as normal fields and disregard that they are wrapped in a oneof.

we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform I think its a lot easier to run into incompatibility issues when the a .proto is the source of truth than when a .fbs is.

@aardappel

True, i'm fine having --conform be required with --proto, as long as their is a flag to disable this.

While I am firmly in favour of addressing this issue, I realize that the changes to take proto field numbers into account may actually break binary backwards compatibility for existing users of the --proto flag, by generating a .fbs schema file that doesn't conform with the .fbs schema file generated by previous versions of flatbuffers. So I think we may need to have a --legacy-ignore-proto-field-numbers flag to preserve binary backwards compatibility in those cases?

@fergushenderson

Yeah, we have to make sure that existing users are not broken. I think the legacy flag is appropriate, though I would want to default to off.

dbaileychess avatar Dec 06 '22 18:12 dbaileychess

True, i'm fine having --conform be required with --proto, as long as their is a flag to disable this.

Rather than requiring it, with a flag to turn the error off, I suggested a clear warning (that you can ignore at your peril).

aardappel avatar Dec 06 '22 19:12 aardappel

In conclusion:

  1. Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
  2. As oneof elements in protobuf which is equal to union in flatbuffer, do not have id we assign them the least possible id to them. (Be mindful of union types that have a hidden type field that needs to be placed.)
  3. Check the assigned id in the proto file not to be used twice in a message, or not to be missed.
  4. Check the reserved id in proto not to be used
  5. Map fields id to new id that is consecutive from 0.
  6. Keep flatbuffer fields in the same order that the proto fields are ordered within the file, disregarding the field-id. This keeps the logical grouping of fields consistent.
  7. Need to add --disallow-proto-field-gaps flag (default to no-op). if there are any gaps in the ids In warning mode it will emit a warning and generate fbs and in Error mode show error messages and not generate fbs.
  8. Need to add –keep-proto-id (default to false) for backward compatibility
  9. And about --conform, I don’t know :)) ??! Maybe --conform be required with --proto

cc: @aardappel @dbaileychess @fergushenderson

enum-class avatar Dec 07 '22 04:12 enum-class