p4c Set type for LAnd and LOr to be Type

I ran into issues where I created a IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false)) and the resulting type of the IR::LAnd was IR::Type_Unknown. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.

Correct me if I am wrong but I believe the default type for Logical AND and OR should be Type_Boolean. Do this only when the type is unknown.

Apr 12 '24 16:04 fruffy

Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple Type_Boolean or Type_Unknown?

Apr 13 '24 21:04 asl

Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple Type_Boolean or Type_Unknown?

Well it can simply happen when you do not use IR::Type_Boolean::get() but new IR::Type_Boolean(). We could make the allocator operators for these classes private.

To be honest, I am not 100% sure whether this is really the reason for the mismatch. The problem I ran into was two different IR::Type_Boolean pointers which cause the inferred type to be IR::Type_Unknown. It could have also been a clone.

Apr 13 '24 21:04 fruffy

new IR::Type_Boolean(). We could make the allocator operators for these classes private.

I think so. I might be missing something, but I do not see a point of having multiple instances of the same semantic type. Maybe I am wrong :) But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

Apr 13 '24 22:04 asl

But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for Operation_Relation (https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.

Apr 13 '24 22:04 fruffy

LAnd,LOr may have simply be an oversight.

Yes. For "underlying issue" I mean allowing different Type_Boolean instances, etc.

Apr 13 '24 22:04 asl

But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for Operation_Relation (https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.

I agree.

LAnd,LOr may have simply be an oversight.

Yes. For "underlying issue" I mean allowing different Type_Boolean instances, etc.

While I agree that it does not make much sense to have multiple instances of Type_Boolean (or Type_Unknown) I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

Apr 15 '24 07:04 vlstill

I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

Well, it depends on what is the underlying rule of comparing Type's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.

On the other hand, booleans, integere types, etc. definitely should be uniqued.

Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.

Apr 15 '24 07:04 asl

I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

Well, it depends on what is the underlying rule of comparing Type's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.

On the other hand, booleans, integere types, etc. definitely should be uniqued.

Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.

Is there a case for keeping types defined using the type keyword unique? In particular if you're using pointer comparison to determine equality.

Something like:

type bool special_bool_t;

Shouldn't use the same object for special_bool_t as bool. (Although all instances of special_bool_t could use a unique type instance, which is distinct from Type_Boolean - the same class but a different object.)

Apr 15 '24 08:04 ajwalton

In particular if you're using pointer comparison to determine equality.

That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is special_bool_t could be converted to bool implicitly, etc.

Apr 15 '24 17:04 asl

In particular if you're using pointer comparison to determine equality.

That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is special_bool_t could be converted to bool implicitly, etc.

The language spec is quite clear about the semantics here:

While similar to typedef, the type keyword introduces a new type which is not a synonym with the original type: values of the original type and the newly introduced type cannot be mixed in expressions.

Currently the types that can be created by the type keyword are restricted to one of: bit<>, int<>, bool, or types defined using type from such types.

(https://staging.p4.org/p4-spec/docs/P4-16-v1.2.4.html#sec-newtype)

So, anything that would allow mixing of these type would violate the spec.

Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.

Apr 15 '24 19:04 ajwalton

The language spec is quite clear about the semantics here:

Thanks for the clarification! Then yes, there might be multiple "instances" of Type_Boolean, etc. and it is expected that they are different. I think PR is correct then, it's just the difference between new IR::Type_Boolean() and IR::Type_Boolean()::get is confusing and error-prone.

However, it seems that type bool special_bool_t is represented by IR::Type_NewType wrapper node (https://github.com/p4lang/p4c/blob/9ac6969195b41c04ddf1d0d0b123620eab996692/ir/type.def#L717C7-L717C19), so then the original question remains: is there a case for second instance of Type_Boolean, etc.? It looks like not to me. But I might be missing something.

Apr 15 '24 19:04 asl

So, what is to be done here?

So, anything that would allow mixing of these type would violate the spec.

Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.

Internally, the compiler already uses Type_Boolean::get() for all instances of Type_Boolean iirc. So this condition is currently violated if the implementation were based on pointer comparison. But I believe, as @asl points out, that these declarations are wrapped in a NewType.

Apr 18 '24 16:04 fruffy

@asl @vlstill Mind giving this one a review? It's been stuck for a month.

May 15 '24 16:05 fruffy

This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.

The original design of the IR was to have all IR::Type objects be pointer-comparable -- there should only be one instance of any give type. That's why there are get functions to cache the objects. However, that probably was not followed for more complex types like tuple and list.

May 15 '24 23:05 ChrisDodd

This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.

True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how Operation_Relation` is initialized.

Probably the best way to enforce this is to make the new operator private for any type that supports ::get().

May 16 '24 02:05 fruffy

True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how Operation_Relation is initialized.

I'm a little hesitant to endorse changes that just serve to obscure bugs elsewhere in the compiler code. Relation needs an override to set the type to bool regardless of the type of the operands, since that is different from either operand. I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.

Probably the best way to enforce this is to make the new operator private for any type that supports ::get().

Yes, this seems the way to go to avoid future bugs in the code.

May 16 '24 02:05 ChrisDodd

Yes, this seems the way to go to avoid future bugs in the code.

+1. All unique nodes should have closed operator new.

May 16 '24 04:05 asl

I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.

Yes, the reason I am advocating for this is that relying on pointer comparison to ensure the right type propagates is fragile. In practice, to work around the possibility that you may get a different pointer you will end up having to specifiy IR::Type_Boolean::get() for all instances of these operators anyway.

There is also the clone operator, which throws a wrench into all of this.

I think both things should be done, this PR and the privatization of the new operator for any IR class that uses get(). But the second part might take a little longer because new is so pervasively used in the compiler right now.

May 17 '24 21:05 fruffy

There is also the clone operator, which throws a wrench into all of this.

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

Another big problem in my option is the source info. Since that is saved in the node, either these singleton-aspiring types can't have source info (it would be a big breaking change to remove IHasSourceInfo from IR::Type), or they can't be singletons.

So in the end, while the aspiration is for these to be singletons, I don't really see how they can be without rewriting large parts of the compiler (and downstream projects). I suggest we compare them semantically and make all IR::Type_Boolean equal (I believe we already ignore src info in comparison).

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

May 20 '24 14:05 vlstill

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

The problem is that clone returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

On the other hand since we manage the allocation through get we can easily disable or enable caching now.

May 20 '24 15:05 fruffy

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

The problem is that clone returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.

Yes, that is what I meant by "it does not work type-wise". I should be explicit if talking about P4 types or C++ types in implementation of the compiler :-).

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

On the other hand since we manage the allocation through get we can easily disable or enable caching now.

True. This one would be reasonably easy to disable. All the more reason to not count on values being pointer comparable though.

May 21 '24 06:05 vlstill

I ran into issues where I created a IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false)) and the resulting type of the IR::LAnd was IR::Type_Unknown. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.

I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.

Personally I found it quite tedious that some things rely on types in the IR and some rely on TypeMap so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.

May 21 '24 06:05 vlstill

I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.

Personally I found it quite tedious that some things rely on types in the IR and some rely on TypeMap so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.

The original design was that the first run of the TypeChecker would insert casts as needed to ensure that everything was correctly typed. Passes that run after should be type correct, which is why subsequent runs of the typechecker run with readOnly = true -- any place then found without correct types should be a BUG.

The type map should not be needed, except to temporarily hold types in the middle of typechecking (which is simplest as a two-step process, first inferring the types of everything in an Inspector, followed by a Transform that rewrites things to have the correct type in the nodes -- this second pass should get rid of Type_Name nodes.) The problem is that this is diffuclt to dump as valid P4 after this transform (due to things being used before they're defined in the resulting dump).

All the relations (Equ, Neq, Lss, ...) should default to bool, same as LOr/LAnd.

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

May 21 '24 06:05 ChrisDodd

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

Oh right, it is in Operation_Relation, sorry I did not notice it there. Then I agree this should be unconditional.

May 21 '24 07:05 vlstill

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

Oh right, it is in Operation_Relation, sorry I did not notice it there. Then I agree this should be unconditional.

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

If we want to fix these nodes, we could just remove the constructor at this point.

May 21 '24 13:05 fruffy

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor. In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like new IR::Equ(explicit_type, left, right) will create a node with type = explicit_type, not bool.)

May 21 '24 21:05 ChrisDodd

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor.

I think there are a couple workarounds, either writing a pass that performs type substitution or just modifying the type after, but they seem cumbersome. We can have a restricted interface first and then extend it when required?

In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like new IR::Equ(explicit_type, left, right) will create a node with type = explicit_type, not bool.)

When I was looking at the Operation_Relation constructor after it was generated it looked like this:

IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
    validate(); 
}

So the type is always overridden.

May 21 '24 21:05 fruffy

When I was looking at the Operation_Relation constructor after it was generated it looked like this:
IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
    validate(); 
}
So the type is always overridden.

In this case the parameter type shadows the field type, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is no type parameter to shadow it.

This is more of a "happy accident" than anything deliberate in the design of the ir-generator, so might be considered and obfuscation by some, but to the extent that it just "does the right thing" does not require detailed understanding of the details of the ir-generator.

May 21 '24 22:05 ChrisDodd

In this case the parameter type shadows the field type, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is no type parameter to shadow it.

I see, that is something I didn't consider.

May 21 '24 22:05 fruffy

OK, so in the end the conditional (as written now by Fabian) and unconditional setting work basically the same, unless someone wants to explicitly set type explicitly to unknown (which would be very weird). We should probably at least add a comment to the ctor that it sets the value only for the constructor that does not take it.

May 22 '24 06:05 vlstill

p4c
p4c copied to clipboard

Set type for LAnd and LOr to be Type_Boolean if unknown.

p4c p4c copied to clipboard

Set type for LAnd and LOr to be Type_Boolean if unknown.

p4c
p4c copied to clipboard