pbtools icon indicating copy to clipboard operation
pbtools copied to clipboard

Non-conformance with protolint and protoc

Open ksemelka opened this issue 3 years ago • 6 comments

Something I've been having trouble with is not seeing the same syntax enforcement between protolint, protoc, and the pbtools C code generator.

For example, the following .proto definition will cause an error enforced by protolint and protoc, but not by pbtools:

Enum value defined multiple times in the same message

syntax = "proto3";

message foo
{
  enum bar 
  {
    UNSPECIFIED = 0;
  }
  enum zoo
  {
    UNSPECIFIED = 0;
  }
}

Error:

$ protoc --python_out=.  foo.proto
foo.proto:11:5: "UNSPECIFIED" is already defined in "foo".
foo.proto:11:5: Note that enum values use C++ scoping rules, meaning that enum values are siblings of their type, not children of it.  Therefore, "UNSPECIFIED" must be unique within "foo", not just within "zoo".

ksemelka avatar Mar 26 '21 08:03 ksemelka

Related follow-up to this:

The Protobuf style guide recommends prepending the enum type name to each enum value name. Protobuf Style Guide: Enums

For example:

syntax = "proto3";

message foo
{
  enum Bar 
  {
    BAR_UNSPECIFIED = 0;
  }
  Bar bar = 1;

  enum Zoo
  {
    ZOO_UNSPECIFIED = 0;
  }
  Zoo zoo = 2;
}

However this generates duplicate naming in the enum types which, as you can imagine, get really long if you have long enum type names.

/**
 * Enum foo.Bar.
 */
enum foo_bar_e {
    foo_bar_bar_unspecified_e = 0
};

/**
 * Enum foo.Zoo.
 */
enum foo_zoo_e {
    foo_zoo_zoo_unspecified_e = 0
};

I would imagine changing the project at this point to only generate the C enum value names from the .proto enum value names instead of both the enum type and value names might be too large of a change, but thought it would be good to mention this anyway since it's related to the original issue above.

I would like to follow the style guide in this way to prevent the issues with protolint and protoc not compiling my .proto files which have non-unique enum value names in them, but my C enums just come out way too lengthy and redundant to make me want to do it this way. Here's my actual usage showing how the verbosity of the enum value name generation can quickly get out of hand (61 characters for the longest one):

message RuntimeConfigs
{
  enum PowertrainType
  {
    POWERTRAIN_TYPE_UNSPECIFIED = 0;
    POWERTRAIN_TYPE_AWD = 1;
    POWERTRAIN_TYPE_FWD = 2;
  }
  PowertrainType powertrain_type = 1000;

  enum WheelType
  {
    WHEEL_TYPE_UNSPECIFIED = 0;
    WHEEL_TYPE_20_INCH = 1;
    WHEEL_TYPE_21_INCH = 2;
    WHEEL_TYPE_22_INCH = 3;
  }
  WheelType wheel_type = 1001;
}
/**
 * Enum RuntimeConfigs.PowertrainType.
 */
enum runtime_configs_powertrain_type_e {
    runtime_configs_powertrain_type_powertrain_type_unspecified_e = 0,
    runtime_configs_powertrain_type_powertrain_type_awd_e = 1,
    runtime_configs_powertrain_type_powertrain_type_fwd_e = 2
};

/**
 * Enum RuntimeConfigs.WheelType.
 */
enum runtime_configs_wheel_type_e {
    runtime_configs_wheel_type_wheel_type_unspecified_e = 0,
    runtime_configs_wheel_type_wheel_type_20_inch_e = 1,
    runtime_configs_wheel_type_wheel_type_21_inch_e = 2,
    runtime_configs_wheel_type_wheel_type_22_inch_e = 3
};

ksemelka avatar Mar 26 '21 08:03 ksemelka

Feel free to suggest an alternative way of generating names that are unique.

eerimoq avatar Mar 26 '21 09:03 eerimoq

pbtools does not aim to warn about all protubuf naming conventions. Other tools, like protoc and protolint that you mentioned, does this job well already so I didn't prioritize to make another tool with extensive checks.

eerimoq avatar Mar 26 '21 09:03 eerimoq

My proposal for shortening enum value name generation is to add a custom option which does not include the enum type name in the C enum generation. nanopb does this by adding a custom option, long_names, which is enabled by default. We could add a similar custom option to pbtools, although it would allow generating non-unique names unless there is a compiler check added to make sure no two enum value names in a message are the same.

Proposed option usage in a .proto file:

option long_names = false;    // Do not prefix the enum name to the enum value in definitions, i.e. enum_name_enum_value.

I think enforcing that all enum values in the same message are unique would be a positive step forward for the project, as using protolint alongside pbtools would be better supported. Compatibility between tools in a developer ecosystem can make or break a project's viability in a production environment. protoc is the compiler officially supported by Google. Shouldn't protoc be a benchmark for pbtools conformity to the Protobuf standard?

ksemelka avatar Mar 26 '21 09:03 ksemelka

Feel free to implement that option. My guess is that's it's easier to add an option to the command line tool than using protobuf options. It would also be nice with shorter struct, union and function names.

Normally both protoc and pbtools are used to generate code from the same proto-files (say protoc for server side and pbtools for client). That means protoc will catch all naming errors, and pbtools can ignore making such checks. Of course, if someone want to add more checks to pbtools I'm open to do so. Being complient to spec is good obviously.

eerimoq avatar Mar 26 '21 09:03 eerimoq

Sure, that makes sense, thanks. I’ll leave this issue open until I get around to making a PR for the option and extra syntax check.

ksemelka avatar Mar 26 '21 09:03 ksemelka