protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Message FieldNameEntry not allowed as map value type for field_name

Open ergl opened this issue 3 years ago • 22 comments

What version of protobuf and what language are you using? Version: stable 3.19.4 Language: N/A

What operating system (Linux, Windows, ...) and version?

macOS 11.6.4

What runtime / compiler are you using (e.g., python version or gcc version)

N/A, happens before language-specifc code generation

What did you do?

Steps to reproduce the behavior:

  1. Attempt to use a message with a name of FieldNameEntry as a map value in field field_name

  2. See protoc return the error: map_entry should not be set explicitly. Use map<KeyType, ValueType> instead.

Minimal declaration:

syntax = "proto3"; // also happens on proto2

message AEntry {}

message Container {
    map<uint32, AEntry> a = 1;
}

What did you expect to see

I expected the declaration above to compile cleanly and with no errors.

What did you see instead?

The compiler emits map_entry should not be set explicitly. Use map<KeyType, ValueType> instead. before reaching any language-specific generator.

It took me a bit to realize this has to do with backwards compatibility between maps and repeated values, but the error message did not make sense to me at first, and I did not expect protoc to forbid this usage.

If this is an expected error, perhaps it should be documented (apologies if it is already, and I didn't see it)

ergl avatar Feb 17 '22 15:02 ergl

Interesting. Do you have a complete reproducible test case, ideally as a failing PR here? I don't immediately see how the code corresponds to the report.

elharo avatar Feb 22 '22 14:02 elharo

@elharo I don't have a failing PR at hand, sorry, but take the following file (let's say, test.proto)

syntax = "proto3";

message FooEntry {}

message Container {
    map<uint32, FooEntry> foo = 1;
}

If we try to generate code with protoc -I=. --python_out . test.proto, it will error:

# protoc -I=. --python_out . test.proto
test.proto: map_entry should not be set explicitly. Use map<KeyType, ValueType> instead.

I put --python_out above but any other language would work.

I don't immediately see how the code corresponds to the report.

Sorry for any confusion. What I mean is that if a field named foo (or any other name) has a type like map<_, FooEntry> (notice the matching name between the field and the map value type), then this error will be raised.

In my original example, the field is named a and the type is named AEntry. In the above example, the field is named foo and the type is FooEntry.

For snake_case fields, like field_name, the problematic type would need to be named the same, but transformed to CamelCase (FieldNameEntry).

ergl avatar Feb 22 '22 15:02 ergl

The relevant checks happen here:

https://github.com/protocolbuffers/protobuf/blob/2f91da585e96a7efe43505f714f03c7716a94ecb/src/google/protobuf/descriptor.cc#L6905-L6906

ergl avatar Feb 22 '22 15:02 ergl

Thanks for the additional info. Spec say, "The value_type can be any type except another map." so this should be OK. Assuming I can reproduce this, it's probably a bug.

elharo avatar Feb 22 '22 21:02 elharo

The naming convention reserves *Entry for internal usage: https://github.com/protocolbuffers/protobuf/blob/2f91da585e96a7efe43505f714f03c7716a94ecb/src/google/protobuf/descriptor.cc#L6917 Closing this as this is WAD

esorot avatar Jun 14 '22 18:06 esorot

It appears that *Entry is a reserved field name. In the interim I'll add workaround in my local code to rename the field to *Entry_ . Please could we re-open this issue? It would be great to have a more elegant solution.

parthea avatar Jul 05 '22 11:07 parthea

I agree. This isn't documented in the spec and protoc should generate name mangling as needed to make this work.

elharo avatar Jul 05 '22 12:07 elharo

Moving the *Entry message definition inside the message that contains the map field gives a more clarifying error message.

syntax = "proto3";

message Container {
    message FooEntry {}
    map<uint32, FooEntry> foo = 1;
}

Trying to compile the file above prints the following error:

# protoc -I=. --python_out . test.proto
test.proto: "FooEntry" is already defined in "Container".
test.proto:3:9: Expanded map entry type FooEntry conflicts with an existing nested message type.

ergl avatar Jul 05 '22 13:07 ergl

At some point, we need to have a much more systematic review of how we handle keywords and conflicting names. This is another piece of that much larger puzzle

fowles avatar Jul 06 '22 12:07 fowles

I assume we want a documentation bandaid in the meantime? I can add something to the proto and proto3 topics.

Logofile avatar Jul 06 '22 13:07 Logofile

Yeah

fowles avatar Jul 06 '22 13:07 fowles

This isn't documented in the spec and protoc should generate name mangling as needed to make this work.

@elharo, no name mangling is even needed in this case. The generated entry would be nested inside the message, Container.FooEntry in the given example. But the message name in question is simply FooEntry. So the resulting descriptors from the example should behave "as if":

syntax = "proto3";

message FooEntry {}

message Container {
    message FooEntry {
        option map_entry = true;
        uint32 key = 1;
        .FooEntry value = 2;
    }
    repeated .Container.FooEntry foo = 1;
}

I haven't dug into the code, but I'm going to guess that the DescriptorProto.options.map_entry field was incorrectly set to true on the .FooEntry message, just based on the type's name (ending in Entry). But that seems easy to fix: it should instead only set the map_entry option for messages that are synthesized from map fields, not for all messages whose name ends in Entry.

jhump avatar Aug 15 '22 17:08 jhump

The problem here is that protoc is generating a message named FooEntry for the map field foo and that conflicts with your user-defined message FooEntry. protoc is not adding the map_entry option to your type based on the name.

Name mangling needs to be applied to the generated map entry type to allow a nested FooEntry type and a map<> foo to coexist.

googleberg avatar Aug 15 '22 17:08 googleberg

@googleberg, that does not make sense. There is no conflict because they are in different namespaces. Map entry messages are generated as nested inside the message that declares the map field. The user-defined type is not a nested type, so there is no conflict. For example, this is perfectly valid -- the two Foo messages do not conflict:

syntax = "proto3";

message Foo {}

message Container {
  message Foo {}
  repeated Foo foo = 1;
}

Also the error message tells the user not to use the map_entry option explicitly. But it is not used explicitly. Which implies it's being set somewhere else:

test.proto: map_entry should not be set explicitly. Use map<KeyType, ValueType> instead.

In the example above, the user did use the suggested map<KeyType, ValueType> formulation.

jhump avatar Aug 15 '22 17:08 jhump

Okay, I see the problem. Indeed, no option is being set based on the name.

But it's also not due to a conflict. The messages (user-defined one and synthetic one to represent map entry) co-exist just fine.

The issue is that when the reference to FooEntry in the map type is resolved, it is resolving to the generated message, not the user-defined one.,

So simply changing the reference to explicitly indicate the top-level message fixes this. So the following works:

syntax = "proto3";

message FooEntry {}

message Container {
    map<uint32, .FooEntry> foo = 1;
}

@ergl, take a look at how to resolve in your proto file ^^

In case it's not obvious, the change in the above vs. the original example is force a fully-qualified reference via a leading dot:

7c7
<   map<string, FooEntry> foo = 1;
---
>   map<string, .FooEntry> foo = 1;

jhump avatar Aug 15 '22 18:08 jhump

@jhump my apologies for not carefully reading the initial example.

You are entirely correct that the two message types are in different namespaces.

The problem is that the lookup is occurring from the nested scope. So the first candidate for AEntry that is found is the generated map entry type.

This can be solved by adding package qualifiers on the value type in the map field declaration.

For the example above add a '.' to get the global namespace message AEntry like this:

syntax = "proto3"; // also happens on proto2

message AEntry {}

message Container {
    map<uint32, .AEntry> a = 1;
}

Or, in an example with a declared package:

syntax = "proto3"; // also happens on proto2

package pkg;

message AEntry {}

message Container {
    map<uint32, pkg.AEntry> a = 1;
}

I would argue that this is very subtle and users shouldn't have to know about map implementation details to fix this issue. Name mangling would solve the original situation as well as the example given by @elharo (which has no namespacing workaround).

googleberg avatar Aug 15 '22 18:08 googleberg

the example given by @elharo (which has no namespacing workaround).

@googleberg, were you referring to this example? https://github.com/protocolbuffers/protobuf/issues/9520#issuecomment-1175080496

While that is a little annoying, at least the error message is comprehensible. I think the more egregious issue with the original example is that the error message is confusing and misleading (incorrect even).

Name mangling, like as is done with synthetic oneofs for proto3 optional, would indeed help. However, it isn't perfect since it would still be possible to name a user-defined type the same thing as the mangled name (even if highly unlikely). A more robust fix would be to alter the symbol resolution logic (when resolving relative references) to ignore synthetic map entry messages. A possible solution might be to use a mangled name with characters not otherwise allowed in an identifier. But this will mostly likely cause issues with runtimes that validate descriptors during initialization: when the embedded descriptors are initialized (such as to create "rich" descriptors around embedded descriptor proto data), they could choke on a name that appears to be invalid.

jhump avatar Aug 15 '22 18:08 jhump

I have encountered the same error. As a workaround, change MappingEntry to something like MappingE etc.

profcoconut avatar Oct 16 '23 12:10 profcoconut

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Jan 29 '24 10:01 github-actions[bot]

Unless someone has fixed this, it should remain active. legitimate bug reports shouldn't be closed because the team is overloaded.

elharo avatar Jan 29 '24 11:01 elharo

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Apr 30 '24 10:04 github-actions[bot]

This is still an issue

ergl avatar Apr 30 '24 10:04 ergl

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Jul 30 '24 10:07 github-actions[bot]

This is still an issue

ergl avatar Aug 12 '24 08:08 ergl

I've sent a documentation update out for review. I don't think that it will fully resolve the issue, but at least the behavior will be documented.

Logofile avatar Aug 12 '24 14:08 Logofile