flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

Make property renaming optional

Open TJKoury opened this issue 2 years ago • 48 comments

I am integrating Flatbuffers with existing data standards, it is mandatory to use the exact property names / namespaces contained in these standards going forward, including all capitalization.

I've hacked in a quick fix to remove MakeCamel, MakeScreamingCamel, and MakeCamelCase, as well as disabling this warning. I can work on a pull that puts disabling these behind a flag, for users in my situation, if it is something that would be accepted.

TJKoury avatar Feb 20 '22 14:02 TJKoury

I would be open to this, but not as a flag, but an attribute in the schema file itself. Something like

SomeSchema.fbs

attribute "keep_naming"

table A {
soMeFunkyNam_ing: int (keep_naming)
}

This way the schema is explicitly opting out of the style guide and clearly states that.

If it was a flag option, I think it would be confusing.

dbaileychess avatar Feb 22 '22 04:02 dbaileychess

That is an interesting option, and makes the implementation more portable.

TJKoury avatar Feb 22 '22 09:02 TJKoury

I like the idea of an attribute, but I guess I'd need to decorate every struct/table/enum with this (hopefully not every field!). Are file or namespace-scope attributes possible?

Bklyn avatar Feb 24 '22 02:02 Bklyn

I think he’s talking about a schema-level flag, at the top of the file, stating that everything within it should not be altered. This is good, since it improves portability: if I give you a schema, you should be able to reproduce the code without knowing what flags I used.

TJKoury avatar Feb 24 '22 05:02 TJKoury

Wait, why are you closing this issue? Now there's no open issue for this feature request AFAIK since I closed #7128.

Attributes can be applied to fields or types (i.e. struct, table, enum). I don't think they can be attached to a schema presently, but I agree this would be the most useful scope for using this approach.

Bklyn avatar Feb 24 '22 13:02 Bklyn

I agree, let’s keep it open and flesh out an implementation.

TJKoury avatar Feb 24 '22 15:02 TJKoury

Our parser does handle file-level attributes, but that is mostly for use in reflection as the code generators won't know what to do with it. I would probably want to leave that as is.

We also have a special native_include identifier that is top-level, so we could do somethin similar if we had too.

I am thinking we should introduce a pragma concept, as a way to tell the flatc compiler how to generate the code. Thus this feature would be something like:

-- some.fbs

pragma keep_identifier_naming

table inTerestingNaMing {
  s-o-m-e-ODDBALLfield:int;
}

And everything under the pragma (lexical scope) would use the specified naming and not modified it at all.

Individual generators would be free to ignore the pragma command, that way we don't have to implement this feature for all languages at once.

We would also probably need to add a new attribute to override this, something like: default-naming

-- some.fbs

pragma keep_identifier_naming

table inTerestingNaMing {
  s-o-m-e-ODDBALLfield:int (default-naming);
}

And the field tagged with default-naming will always use the language specified default.

Thoughts?

dbaileychess avatar Feb 25 '22 05:02 dbaileychess

Thoughts?

Sounds good to me.

Bklyn avatar Feb 25 '22 15:02 Bklyn

I like it, though the “default naming” might be overkill / confusing. I envision this pragma to be used for compatibility purposes like my use case, tagging individual fields might lead to some confusion if the override pragma is missing.

TJKoury avatar Feb 25 '22 16:02 TJKoury

IMO, it would be better to just not support global "keep_naming" + opt out logic, since its such a niche use case... its not too hard for a minority of users to type (keep_naming) everywhere they need.

If we do go ahead with pragmas and stuff, the parser should contain this logic so that from the code generators' perspective, its as if the user typed (keep_naming) on all fields.

I also wonder if this feature will expand, maybe people would want to (keep_naming) for some languages and not others? Maybe they'll want to override our ideas of when to use CamelCase vs snake_case... I doubt fine grained configurability is worth the complexity, but if it turns out to be, we should do a little upfront design.

CasperN avatar Feb 25 '22 19:02 CasperN

In my opinion, having different conventions per-language sort of defeats the interoperability; language agnostic inputs / outputs like the IDL and the typed JSON Schema output should reflect what you expect no matter what language you use. Otherwise you are stuck trying to translate field names according to a rule set that’s only available inside the binary.

TJKoury avatar Feb 25 '22 19:02 TJKoury

In my opinion, having different conventions per-language sort of defeats the interoperability; language agnostic inputs / outputs like the IDL and the typed JSON Schema output should reflect what you expect no matter what language you use. Otherwise you are stuck trying to translate field names according to a rule set that’s only available inside the binary.

I disagree. The schema is the interoperable part, the individual languages should conform to their own conventions. You don't really have to translate field names with some obscure rules; it just simple formatting changes that should be self-evident.

IMO, it would be better to just not support global "keep_naming" + opt out logic, since its such a niche use case... its not too hard for a minority of users to type (keep_naming) everywhere they need.

That is also fine with me, since it's the easiest thing to do and rationalize.

dbaileychess avatar Feb 25 '22 20:02 dbaileychess

The global “keep_naming” is good, having to do it for each field would be a nightmare, I have over 4000 fields I need to support.

TJKoury avatar Feb 25 '22 20:02 TJKoury

@dbaileychess:

I disagree. The schema is the interoperable part, the individual languages should conform to their own conventions. You don't really have to translate field names with some obscure rules; it just simple formatting changes that should be self-evident.

They are simple rules, but this is extra cognitive load that developers shouldn't need to shoulder - they've got enough other stuff to worry about. My argument would be why not honor the schema as-written to the degree possible, at least optionally? I'd like to be able to refer to the nice, small schema definitions when writing code that interfaces with a particular message, and not the generated code to be sure what a field is called. And I'd like this to be consistent across as many languages as possible.

I agree with @TJKoury that having to decorate individual fields for this would be a nightmare. What if it were per-object (enum, struct, table)? This would be less bad, but I would prefer a file-scope setting.

This is largely water under the bridge, but having the code generator mangle names by default strikes me as wholly unnecessary. Are there actually languages which would forbid me from naming a structures with snake_case or fields in a class with CamelCase? If there are, some mangling seems unavoidable, but enforcing it by default, and inconsistently across all languages with no simple opt-out seems like is a choice that is worth revisiting.

Bklyn avatar Feb 25 '22 22:02 Bklyn

I think I am not following what this is needed for yet.

If this is largely about the Python generated code not following the language standards, we should just fix that.

Can someone explain in what context "data standards" would require generated APIs for all languages to ignore language standards and all look uniform? That sounds very niche and generally undesirable. To me, it seems pretty important to use a language's code standards, being able to write the same names across languages seems of much smaller importance to me.

There are lots of code generators out there.. is it common for any of them to allow overriding the language standard? Does Protobuf have an option for it?

If we must have this option, given that is so niche, to me a flatc flag makes more sense than trying to introduce schema-wide attributes (which is also problematic in terms of scoping). Also note, the current attribute keyword declares allowed attributes, it does not set them to any value globally. If for whatever reason we decide we must make it a schema attribute instead, I'd recommend simply making it a table attribute that applies to all fields, to save on typing, rather than "global".

aardappel avatar Feb 25 '22 22:02 aardappel

@aardappel With utmost respect, there is no such thing as universal “language standards”. There are language conventions, and those are at best contested, and not enforced except in some extreme edge cases by any compiler or interpreter.

On the other hand, we gigabytes of currently written code running in prod on mission critical systems that refer to the exact properties referenced in that IDL I have written.

Having the names unchanged is the difference between being able to use the generated code or not.

Congrats on the move BTW, excited to see what you are doing in your new venture!

TJKoury avatar Feb 25 '22 22:02 TJKoury

I'd disagree, with the exception of C/C++, all languages supported by FlatBuffers have a very specific standards for their identifiers, defined by the language definition or community. They are not always enforced, but using anything other than these standards would be very unwelcome by almost all users of these languages.

aardappel avatar Feb 25 '22 23:02 aardappel

I understand your perspective, which is why I think adding an opt-in pragma is the way to go, as it does not force anyone to change anything, no surprises for current users and no limitations for "special cases". It also allows flexibility for things that are "not always enforced" by "language definitions" or "community".

TJKoury avatar Feb 26 '22 00:02 TJKoury

Thanks for the discussion. I think I am leaning to one of two options:

  1. If you really want the naming to be applied globally, it might be best to do it as a flatc flag option. It would be similar to cpp-field-case-style, but obviously language agnostic.

  2. If you want more control, we could add a new attribute to the schema file that could be applied at the field- or table- level. I think we should hold off on the pragma idea for now, as I think that is adding too much machinery at the moment, and as @CasperN points out is for a niche feature. I think this is a fine compromise at the moment.

Let me know your thoughts.

dbaileychess avatar Feb 28 '22 16:02 dbaileychess

I have a slight preference for an attribute over a flag, but its not very strong.

I'm mostly concerned with maintainability.

A naive implementation would touch almost everywhere and be a nightmare to maintain. Imo, the "right" way of doing this would involve first refactoring every code generator. Everywhere where we interact with the name of a type, we should instead interact with an accessor class that can apply the local language policy. This class can also look up either a flag value or a magic attribute to override the policy. It would still touch every code generator but at least be more maintainable.

class Namer {
  // Config varies per language.
  struct Config {
    types: CapitalCamelCase;
    methods: SnakeCase;
    constants: ScreamingSnakeCase;
    // ... etc
  };
  Namer(Config);

  std::string Type(Definition&) {
    // Check for magic attribute or flag
    // apply Config
  }
  std::string Method(Definition&);
  std::string Constant(Definition&);
};

CasperN avatar Feb 28 '22 21:02 CasperN

I just recently upgraded to 2.0.0 and noticed the case style warnings. I'm not a major contributor to FlatBuffers yet, but I still wanted to comment as a user.

I think it's understandable that flatc defaults to Google code style. However, so many other projects with different C++ code styles are using FlatBuffers. Especially for projects that use the object API, table/field names strongly affect the overall code style of the project.

For example, our internal C++ style guide requires camelCase C++ member names. We also heavily utilize the object API. A normal access to table looks like this in our project: FlatbufferTableT->memberOne which suits our overall code style well.

I think it would be really great if flatc just didn't care about field or table name styles, or let the user specify it.

Alternatively, a global field-case-style flatc option would be really helpful.

anthony-ozdemir avatar Feb 28 '22 22:02 anthony-ozdemir

The flatc flag would probably be the easiest to implement technically, and transparent to current users. The issue as mentioned above is portability; two users would create different codebases depending on their flags. Adding a schema-wide, top level attribute makes it portable.

TJKoury avatar Feb 28 '22 23:02 TJKoury

A flatc flag has the additional advantage of allowing users to choose which languages to apply this to. There may be users that have use cases for overriding one particular language but not the others. They would not be served by having this feature as attributes.

aardappel avatar Feb 28 '22 23:02 aardappel

OK, sounds like a new flag would be ideal. I will deprecate the cpp-field-case-style and transition that over to a new field-case-style and object-case-style flags that can specify the intended output case.

@CasperN I did start on one part of the refactoring, making out our disparate MakeSnake MakeCamel functions use the same common ConvertCase. So your intended refactoring could build off of that.

dbaileychess avatar Mar 01 '22 01:03 dbaileychess

I will deprecate the cpp-field-case-style and transition that over to a new field-case-style and object-case-style flags that can specify the intended output case.

I think we still need to dive deeper into design

By choosing global overrides, we're telling users with weird casing preferences to invoke flatc one language at a time. This is fine by me but we should explicitly acknowledge the choice. We don't need to support per-field overrides right?

In addition to fields and objects, there are methods, constants, functions, enums, namespaces/modules/packages, and filenames. Maybe even macros and exceptions too, though I don't know any present usage. Presumably we'd eventually grow a flag for all of these?

(bike shedding: I'd prefer the term "type" over "object" since we have an "object API" and not all languages are object oriented)

What if these categories aren't fine grained enough for some style guides, e,.g. are there styles with different casing for global constants and class constants?

What are the arguments to these flags? is it

  • "" (Unspecified: the local code generator chooses)
  • keep (uses whatever is in the schema)
  • kCamelCase
  • smallCamelCase
  • CapitalCamelCase
  • snake_case
  • SCREAMING_SNAKE_CASE

CasperN avatar Mar 01 '22 17:03 CasperN

My recommendation is to simply have a preserve-property-name flag. No need for that level of granularity. Any user can simply follow their own conventions, whether or not those conventions are already present in flatc.

TJKoury avatar Mar 01 '22 17:03 TJKoury

My recommendation is to simply have a preserve-property-name flag. No need for that level of granularity. Any user can simply follow their own conventions, whether or not those conventions are already present in flatc.

IMO, preserve-property-name seems too niche to justify by itself. On the other hand, non-google style guides seem a lot more prevalent. "keep casing" can be a simple specialization of the general feature.

CasperN avatar Mar 01 '22 17:03 CasperN

I agree in principle, at the same time it is a lot more work since there are infinite style guides and “best practices” to pull from. Seems like a very large task to tackle to prevent a behavior not present in other schema parser/code generators.

TJKoury avatar Mar 01 '22 18:03 TJKoury

One of the more popular code generators (protobufs) has strict styling like we do (https://developers.google.com/protocol-buffers/docs/style#message_and_field_names). @TJKoury do you have examples of other code generators that output custom identifier names so we can see how they approached that problem?

One negative of having user-custom naming in the schema file is, it is practically impossible for us to convert it back to a standard casing (i.e., going from sOm_eWeirdNamE to some_werid_name). Thus all languages that generate code from that schema are forced to use the specified naming. This is the benefit of having a standard naming in the schema, we can always convert some_werid_name into a list of tokens (some weird name) and generate any output style we want (within reason).

@TJKoury what is the casing you want to use?

dbaileychess avatar Mar 02 '22 17:03 dbaileychess

@TJKoury what is the casing you want to use?

I'm not @TJKoury but what I want is the identifiers in my schema left as-is.

Bklyn avatar Mar 02 '22 17:03 Bklyn