Implementation of `has` macro is incomplete
The has macro is implemented by visiting the child nodes and returning true if there was not a CELEvalError. However, this makes it impossible for Protobuf messages to be handled properly.
The docstring describes the algorithm:
If e evaluates to a protocol buffers version 2 message and f is a defined field:
If f is a repeated field or map field, has(e.f) indicates whether the field is non-empty.
If f is a singular or oneof field, has(e.f) indicates whether the field is set.
If e evaluates to a protocol buffers version 3 message and f is a defined field:
If f is a repeated field or map field, has(e.f) indicates whether the field is non-empty.
If f is a oneof or singular message field, has(e.f) indicates whether the field is set.
If f is some other singular field, has(e.f) indicates whether the field's value is its default value (zero for numeric fields, false for booleans, empty for strings and bytes).
However, as far as I can tell, this is impossible to implement correctly no matter what. If you have a MessageType populated by omitting fields that are not present, you get incorrect behavior since unset fields are null instead of their default values. If you have a MessageType populated by setting default values, you get incorrect behavior because has returns true for fields that it shouldn't. What is needed is for has to return false while still populating the default value.
I don't currently have a minimal reproduction, in part because the library I am currently working on does not actually use the cel-python MessageType, but instead uses its own (though I did try to use cel-python MessageType and ran into basically the same issue, so I am pretty sure, as long as I am not missing anything, that it is similarly applicable.) That said, I hope this explanation is enough to either point me in the right direction: it seems like I am either misunderstanding how to use cel-python, or it is simply missing support for this. I think that there needs to be a way for has to be specialized for a given type; idiomatically it'd be best, in my opinion, if has could somehow call __contains__; then Python types, including MessageType, could just implement __contains__ and __getitem__ to do the right thing.
I would be willing to try to submit a PR to rectify this in the future if this sounds like a real issue and my solution would be OK, but I don't currently have a good enough understanding of how lark-parser works, so it may not be something I have time to approach for the moment.
Still very interested in this!
I'm still trying to understand the ask and the implications. Specifically, I don't know if What is needed is for has to return false while still populating the default value. is actually compatible with the CEL spec.
I'm unclear if this reflects a CEL feature or a ProtoBuf feature for unset fields.
Well, I'm not an expert spec reader so I could be misunderstanding the spec, but out of all of the different CEL interpreters we are using (cel-cpp, cel-python, projectnessie/cel-java and cel-go) so far cel-python is the only one behaving differently here, tested via our fairly comprehensive conformance tests (not for CEL, but the protovalidate library.) The cel-spec bit that's relevant to the has macro is quoted partially above. Here is the full thing again though:
To test for the presence of a field, the boolean-valued macro
has(e.f)can be used.
- If
eevaluates to a map, thenhas(e.f)indicates whether the stringfis a key in the map (note thatfmust syntactically be an identifier).- If
eevaluates to a message andfis not a declared field for the message,has(e.f)raises ano_such_fielderror.- If
eevaluates to a protocol buffers version 2 message andfis a defined field:
- If
fis a repeated field or map field,has(e.f)indicates whether the field is non-empty.- If
fis a singular or oneof field,has(e.f)indicates whether the field is set.- If
eevaluates to a protocol buffers version 3 message andfis a defined field:
- If
fis a repeated field or map field,has(e.f)indicates whether the field is non-empty.- If
fis a oneof or singular message field,has(e.f)indicates whether the field is set.- If
fis some other singular field,has(e.f)indicates whether the field's value is its default value (zero for numeric fields,falsefor booleans, empty for strings and bytes).- In all other cases,
has(e.f)evaluates to an error.
IMO, this is sufficient to determine that the current behavior is wrong because has(e.f) for proto3 fields only returns true for fields that are not the default value. That implies that e.f can be e.g. 0 while has(e.f) should return false. As has is implemented in cel-python right now, it doesn't seem like this is possible: if the expression yields a valid value, it will simply return true since no error occurs.
Note that the problem is not just with proto3, either. Proto2 field accessors will return the default value when unset. See this note:
When a message is parsed, if the encoded message bytes do not contain a particular field, accessing that field in the parsed object returns the default value for that field. The default values are type-specific:
- For strings, the default value is the empty string.
- For bytes, the default value is empty bytes.
- For bools, the default value is false.
- For numeric types, the default value is zero.
- For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for your language for details.
- For enums, the default value is the first defined enum value, which should be 0 (recommended for compatibility with open enums). See Enum Default Value.
The default value for repeated fields is empty (generally an empty list in the appropriate language).
The default value for map fields is empty (generally an empty map in the appropriate language).
The only difference is that in proto2 all fields also have explicit presence, too, so it's possible (IIRC) to have a proto2 field that is present and equal to the default value, therefore has(e.f) would return true. (In some of the protobuf APIs, there is a helper to check if a field has explicit presence, which is helpful because as of editions there is yet another way in which protobuf fields can have explicit field presence, and these fields should be treated roughly inline with proto3 optionals. I don't think python protobuf has it, though.)
I don't know if I can make a much more convincing argument here; the way I read the spec it seems obvious that you have to special case protobuf messages for has. However, I fully understand if you are not convinced. If you would like a more definitive answer I can try to get more clarity from the CEL developers.
P.S.: The reason I mention a custom MessageType is because we are (were?) using a custom MessageType for a different unrelated hack. We no longer need to do that, but the important thing is that merely modifying MessageType itself can not fix this problem; the has macro implementation needs to do something different to make the behavior inline with other CEL interpreters.
P.P.S.: Just to be clear, at the moment we have a workaround for this issue by monkey-patching, but it's pretty terrible and I'd really like to remove it ASAP. It is implemented here and here, if you are curious. Passes our own tests (inline with other runtimes.) Sorry for resorting to this, but I could not figure out a better workaround :\
it seems obvious that you have to special case protobuf messages for has.
Okay, I'm with you so far. I understand CEL-Python is different. I'm totally convinced it's different and therefore wrong. No more convincing is required.
What is the special case? What new behavior is required? What should the distinct behavior be?
Can you provide a concrete example of a CEL expression and some protobuf message binding that shows what happens now and what you need to happen to comply with the specification?
(I'm not sure I understand this well enough to write a test case.)