p4c
p4c copied to clipboard
Set type for LAnd and LOr to be Type_Boolean if unknown.
I ran into issues where I created a IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false))
and the resulting type of the IR::LAnd
was IR::Type_Unknown
. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.
Correct me if I am wrong but I believe the default type for Logical AND and OR should be Type_Boolean
. Do this only when the type is unknown.
Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple Type_Boolean
or Type_Unknown
?
Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple
Type_Boolean
orType_Unknown
?
Well it can simply happen when you do not use IR::Type_Boolean::get()
but new IR::Type_Boolean()
. We could make the allocator operators for these classes private.
To be honest, I am not 100% sure whether this is really the reason for the mismatch. The problem I ran into was two different IR::Type_Boolean
pointers which cause the inferred type to be IR::Type_Unknown. It could have also been a clone.
new IR::Type_Boolean()
. We could make the allocator operators for these classes private.
I think so. I might be missing something, but I do not see a point of having multiple instances of the same semantic type. Maybe I am wrong :) But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.
But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.
So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for Operation_Relation
(https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.
LAnd,LOr may have simply be an oversight.
Yes. For "underlying issue" I mean allowing different Type_Boolean
instances, etc.
But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.
So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for
Operation_Relation
(https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.
I agree.
LAnd,LOr may have simply be an oversight.
Yes. For "underlying issue" I mean allowing different
Type_Boolean
instances, etc.
While I agree that it does not make much sense to have multiple instances of Type_Boolean
(or Type_Unknown
) I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).
I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).
Well, it depends on what is the underlying rule of comparing Type
's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.
On the other hand, booleans, integere types, etc. definitely should be uniqued.
Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.
I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).
Well, it depends on what is the underlying rule of comparing
Type
's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.On the other hand, booleans, integere types, etc. definitely should be uniqued.
Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.
Is there a case for keeping types defined using the type keyword unique? In particular if you're using pointer comparison to determine equality.
Something like:
type bool special_bool_t;
Shouldn't use the same object for special_bool_t
as bool. (Although all instances of special_bool_t
could use a unique type instance, which is distinct from Type_Boolean
- the same class but a different object.)
In particular if you're using pointer comparison to determine equality.
That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is special_bool_t
could be converted to bool
implicitly, etc.
In particular if you're using pointer comparison to determine equality.
That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is
special_bool_t
could be converted tobool
implicitly, etc.
The language spec is quite clear about the semantics here:
While similar to typedef, the type keyword introduces a new type which is not a synonym with the original type: values of the original type and the newly introduced type cannot be mixed in expressions.
Currently the types that can be created by the type keyword are restricted to one of: bit<>, int<>, bool, or types defined using type from such types.
(https://staging.p4.org/p4-spec/docs/P4-16-v1.2.4.html#sec-newtype)
So, anything that would allow mixing of these type would violate the spec.
Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.
The language spec is quite clear about the semantics here:
Thanks for the clarification! Then yes, there might be multiple "instances" of Type_Boolean
, etc. and it is expected that they are different. I think PR is correct then, it's just the difference between new IR::Type_Boolean()
and IR::Type_Boolean()::get
is confusing and error-prone.
However, it seems that type bool special_bool_t
is represented by IR::Type_NewType
wrapper node (https://github.com/p4lang/p4c/blob/9ac6969195b41c04ddf1d0d0b123620eab996692/ir/type.def#L717C7-L717C19), so then the original question remains: is there a case for second instance of Type_Boolean
, etc.? It looks like not to me. But I might be missing something.
So, what is to be done here?
So, anything that would allow mixing of these type would violate the spec.
Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.
Internally, the compiler already uses Type_Boolean::get()
for all instances of Type_Boolean
iirc. So this condition is currently violated if the implementation were based on pointer comparison. But I believe, as @asl points out, that these declarations are wrapped in a NewType
.
@asl @vlstill Mind giving this one a review? It's been stuck for a month.
This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.
The original design of the IR was to have all IR::Type objects be pointer-comparable -- there should only be one instance of any give type. That's why there are get
functions to cache the objects. However, that probably was not followed for more complex types like tuple and list.
This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.
True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how
Operation_Relation` is initialized.
Probably the best way to enforce this is to make the new operator private for any type that supports ::get()
.
True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how
Operation_Relation
is initialized.
I'm a little hesitant to endorse changes that just serve to obscure bugs elsewhere in the compiler code. Relation needs an override to set the type to bool regardless of the type of the operands, since that is different from either operand. I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.
Probably the best way to enforce this is to make the new operator private for any type that supports
::get()
.
Yes, this seems the way to go to avoid future bugs in the code.
Yes, this seems the way to go to avoid future bugs in the code.
+1. All unique nodes should have closed operator new
.
I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.
Yes, the reason I am advocating for this is that relying on pointer comparison to ensure the right type propagates is fragile. In practice, to work around the possibility that you may get a different pointer you will end up having to specifiy IR::Type_Boolean::get() for all instances of these operators anyway.
There is also the clone
operator, which throws a wrench into all of this.
I think both things should be done, this PR and the privatization of the new operator for any IR class that uses get()
. But the second part might take a little longer because new is so pervasively used in the compiler right now.
There is also the
clone
operator, which throws a wrench into all of this.
Well if the types were really singletons (there was no reason to have two instances) that clone
should just return this
I think, but that does not work type-wise.
Another big problem in my option is the source info. Since that is saved in the node, either these singleton-aspiring types can't have source info (it would be a big breaking change to remove IHasSourceInfo
from IR::Type
), or they can't be singletons.
So in the end, while the aspiration is for these to be singletons, I don't really see how they can be without rewriting large parts of the compiler (and downstream projects). I suggest we compare them semantically and make all IR::Type_Boolean
equal (I believe we already ignore src info in comparison).
Also note that by relying more on IR::...::get()
deduplication, we are relying more on static variables, which is not nice :-(.
Well if the types were really singletons (there was no reason to have two instances) that
clone
should just returnthis
I think, but that does not work type-wise.
The problem is that clone
returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.
Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.
On the other hand since we manage the allocation through get
we can easily disable or enable caching now.
Well if the types were really singletons (there was no reason to have two instances) that
clone
should just returnthis
I think, but that does not work type-wise.The problem is that
clone
returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.
Yes, that is what I meant by "it does not work type-wise". I should be explicit if talking about P4 types or C++ types in implementation of the compiler :-).
Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.
On the other hand since we manage the allocation through
get
we can easily disable or enable caching now.
True. This one would be reasonably easy to disable. All the more reason to not count on values being pointer comparable though.
I ran into issues where I created a
IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false))
and the resulting type of theIR::LAnd
wasIR::Type_Unknown
. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.
I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.
Personally I found it quite tedious that some things rely on types in the IR and some rely on TypeMap
so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.
I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.
Personally I found it quite tedious that some things rely on types in the IR and some rely on
TypeMap
so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.
The original design was that the first run of the TypeChecker would insert casts as needed to ensure that everything was correctly typed. Passes that run after should be type correct, which is why subsequent runs of the typechecker run with readOnly = true
-- any place then found without correct types should be a BUG.
The type map should not be needed, except to temporarily hold types in the middle of typechecking (which is simplest as a two-step process, first inferring the types of everything in an Inspector, followed by a Transform that rewrites things to have the correct type in the nodes -- this second pass should get rid of Type_Name nodes.) The problem is that this is diffuclt to dump as valid P4 after this transform (due to things being used before they're defined in the resulting dump).
All the relations (Equ, Neq, Lss, ...) should default to bool, same as LOr/LAnd.
The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if
from this change making the ctor just { type = IR::Boolean::get(); }
The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the
if
from this change making the ctor just{ type = IR::Boolean::get(); }
Oh right, it is in Operation_Relation
, sorry I did not notice it there. Then I agree this should be unconditional.
The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the
if
from this change making the ctor just{ type = IR::Boolean::get(); }
Oh right, it is in Operation_Relation, sorry I did not notice it there. Then I agree this should be unconditional.
I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.
If we want to fix these nodes, we could just remove the constructor at this point.
I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.
It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor. In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like new IR::Equ(explicit_type, left, right)
will create a node with type = explicit_type, not bool.)
I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.
It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor.
I think there are a couple workarounds, either writing a pass that performs type substitution or just modifying the type after, but they seem cumbersome. We can have a restricted interface first and then extend it when required?
In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like
new IR::Equ(explicit_type, left, right)
will create a node with type = explicit_type, not bool.)
When I was looking at the Operation_Relation
constructor after it was generated it looked like this:
IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
validate();
}
So the type is always overridden.
When I was looking at the
Operation_Relation
constructor after it was generated it looked like this:IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right) {{ type = Type::Boolean::get(); } validate(); }
So the type is always overridden.
In this case the parameter type
shadows the field type
, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is no type
parameter to shadow it.
This is more of a "happy accident" than anything deliberate in the design of the ir-generator, so might be considered and obfuscation by some, but to the extent that it just "does the right thing" does not require detailed understanding of the details of the ir-generator.
In this case the parameter
type
shadows the fieldtype
, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is notype
parameter to shadow it.
I see, that is something I didn't consider.
OK, so in the end the conditional (as written now by Fabian) and unconditional setting work basically the same, unless someone wants to explicitly set type explicitly to unknown (which would be very weird). We should probably at least add a comment to the ctor that it sets the value only for the constructor that does not take it.