clangir
clangir copied to clipboard
Assertion failure on switch statement with case label in nested block
ClangIR hits an assertion failure when a switch statement contains a case label that is within a nested block statement.
int g(int x) {
switch (x) {
case 1:
return -1;
{
case 2:
return -20;
}
}
return x;
}
clang++: .../clang/lib/CIR/CodeGen/CIRGenStmt.cpp:300:
mlir::LogicalResult cir::CIRGenFunction::buildSimpleStmt(const clang::Stmt*, bool):
Assertion `0 && "Should not get here, currently handled directly from SwitchStmt"' failed.
While code like this is unlikely to appear in production and will usually be found in test suites that try to break the compiler, it is legal code in both C and C++ and should not trigger an internal compiler error.
While code like this is unlikely to appear in production
Famous "last words" haha, I just hit this in prod! Some "temporary" workaround is using this.
Now that @gitoleg work has landed we probably have the machinery to solve this. Any chance (or any plans) you might work on this soon @gitoleg? (cc. @wenpen)
I will take a look in the next few days, sure
Now that @gitoleg work has landed we probably have the machinery to solve this.
This is all about codegen, so I would say the previous work has nothing to do with it.
Speaking about the current issue, it's still not obvious for me how to handle it properly, though I think I can suggest an approach.
First of all, I'm not sure we do the proper thing when we attach random statements to the previous case
(or I miss something important here):
for (auto *c : compoundStmt->body()) {
if (auto *switchCase = dyn_cast<SwitchCase>(c)) {
res = buildSwitchCase(*switchCase, condType, caseAttrs);
} else if (lastCaseBlock) { // !!! This clause
// This means it's a random stmt following up a case, just
// emit it as part of previous known case.
mlir::OpBuilder::InsertionGuard guardCase(builder);
builder.setInsertionPointToEnd(lastCaseBlock);
res = buildStmt(c, /*useCurrentScope=*/!isa<CompoundStmt>(c));
} else {
llvm_unreachable("statement doesn't belong to any case region, NYI");
}
lastCaseBlock = builder.getBlock();
if (res.failed())
break;
}
The original codegen even doesn't emit any random code as far I can see, and I think we need to do the same. We could extend the current issue example (will refer it as example 1) to make it a little more interesting (example 2):
int g(int x) {
int y = 1;
switch (x) {
case 1:
return -1;
{
y = 42;
case 2:
return y;
}
}
return x;
}
The y = 42
statement is not taken in account and g(2)
returns exactly 1
. So I would say, we need to ignore this statement as well or at least place it in some unreachable scope.
Now, what I would do. I would create a small class for the switch stmt processing with several fields like case attributes, maybe switch variable type - and maintain a pointer to it in the CIRGenFunction.h
. Once we enter buildSwitchStmt
in the CIR codegen, we would save the current pointer in the temp var, create a new instance of this class, store the pointer into the member field. Once we leave the buildSwitchStmt
we restore the pointer. Almost the same is done in the original codegen.
So we will able to handle nested switch statements and we can fix the example 1 . The second example is less trivial ( and one need to take it into account when will fix the issue, even if it will be me :) ) . But I believe it's doable as well.
@bcardosolopes what do you think? Is it a good/workable solution? Or I miss some details here?
This is all about codegen, so I would say the previous work has nothing to do with it.
Right, I wasn't implying that, sorry for the misunderstanding. My initial thinking is that we could probably map these weird inner cases with synthetic gotos/labels, since they violate some scope control-flow.
Speaking about the current issue, it's still not obvious for me how to handle it properly, though I think I can suggest an approach.
Thanks for looking into this.
First of all, I'm not sure we do the proper thing when we attach random statements to the previous case (or I miss something important here): ... The original codegen even doesn't emit any random code as far I can see, and I think we need to do the same. We could extend the current issue example (will refer it as example 1) to make it a little more interesting (example 2): .. The y = 42 statement is not taken in account and g(2) returns exactly 1. So I would say, we need to ignore this statement as well or at least place it in some unreachable scope.
If the original codegen is doing that (I haven't checked), I agree we should do the same - It'd probably be nice to maintain the statement for the sake of providing unrecheable diagnostics later on, e.g. even though we ignore the emission, we could emit an empy scope with source loc on a basic block that is unrecheable. But just initially supporting lowering works for me. I also don't see any current warnings for these types of unrecheable statements in clang.
Now, what I would do. I would create a small class for the switch stmt processing with several fields like case attributes, maybe switch variable type - and maintain a pointer to it in the CIRGenFunction.h. Once we enter buildSwitchStmt in the CIR codegen, we would save the current pointer in the temp var, create a new instance of this class, store the pointer into the member field. Once we leave the buildSwitchStmt we restore the pointer. Almost the same is done in the original codegen.
Yep, we currently track some of it already as part of LexicalScopes
, we could increment it with more needed info.
So we will able to handle nested switch statements and we can fix the example 1. The second example is less trivial ( and one need to take it into account when will fix the issue, even if it will be me :) ) . But I believe it's doable as well. @bcardosolopes what do you think? Is it a good/workable solution? Or I miss some details here?
Sounds great to me, thanks for sharing a solution. Do you have any interest to work on this?
@bcardosolopes
Sounds great to me, thanks for sharing a solution. Do you have any interest to work on this?
I do) But I can't promise I'll do it soon. So if someone wants to do it earlier, just let me know, e.g. right here. Otherwise I'll return to this issue later and keep you informed about the progress/problems etc.
@wenpen do you want to work on this one?
@piggynl this could be a good issue to tackle as well.
I'm still working on https://github.com/llvm/clangir/pull/528, and I feel these problems should be resolved in a unified solution. So I need to do some more test, will update my proposal soon when I figure out it.
I have a proposal to fix this issue by modifying the definition of SwitchOp
What's the limitation of current SwitchOp
?
Current SwitchOp
have a list of caseAttr
, one caseAttr
is corresponding to one region.
But case
could be used very flexibly just like goto
. consider the following code
switch (x) {
while (y) {
if (z) {
case 1:
...
} else {
case 2:
...
}
}
}
The case
is contained by WhileOp
, SwitchOp
should have only one block containing WhileOp
, and we can't split it into two regions.
How does the original clang codegen handle switch
?
CodeGenFunction
class have a pointer member llvm::SwitchInst *SwitchInsn
to maintain the current switch information. SwitchInsn
will record the case value and destination block, which is similar to SwitchFlatOp
. When emit SwitchStmt, it will update the pointer SwitchInsn
, and recursively process the body statement by calling EmitStmt()
. When emit CaseStmt, it will call SwitchInsn->addCase(...)
to store the case value and destination block.
Proposal for the new definition of SwitchOp
As the behavior of switch
/case
is similar to goto
/label
, we could refer to the design of goto
.
Specifically, add a new op CaseOp
to store the case value, then SwitchOp
have exactly one region. And we won't handle CaseStmt
in buildSwitchStmt
, instead we would do the work in buildSimpleStmt
.
The above example code would generate cir
cir.scope {
cir.switch (%1) {
cir.scope {
cir.while {
cir.condition(%2)
} do {
cir.scope {
cir.if %3 {
cir.case 1
...
} else {
cir.case 2
...
}
}
}
}
}
}
In FlattenCFG
, we could replace SwitchOp
with SwitchFlatOp
simply. Just walk in the SwitchOp
, collect the case value and (destination) blocks when we find out CaseOp
, then we could remove SwitchOp
& CaseOp
and build SwitchFlatOp
.
Besides, it could resolve some other switch related issues, including unreachable code(#520 #521) and incorrect implementation of getOrCreateRetBlock
(talked about in #528).
@wenpen thanks for taking the time to propose a holistic solution, very nice writeup.
I think it overall makes sense, the part that bothers me is having a design that prioritize the corner case of the language in detriment of the common use cases, but I think this is hinting at the right direction.
Given your proposal, few questions:
- How would a "normal" switch use case be represented by this, can you paste few examples?
- Have you thought about some mixed approach? Examples:
- As the CaseAttr becomes a CaseOp, perhaps cir.case could optionally have regions to maintain the old behavior, so that cases with underlying scopes could be directly represented. Because new cir.switch will only have one region, this means a "normal" switch use case would be just a sequential number of cir.cases.
- What's your plan for handling fallthroughs? For
break
s, we'll likely need a region within acir.case
to represent it, or do you have alternative ideas? - If you look at cir.try_call, it might or might not be within a direct cir.try, doing similar for cir.case/cir.switch makes sense too.
Thanks!
How would a "normal" switch use case be represented by this, can you paste few examples?
Here is an example code to discuss the mixed (common and corner) case.
switch(x) {
statement0;
case 1:
case 2:
statement1;
{
case 3
statement2;
}
break;
case 4:
statement3;
case 5:
statement4;
break;
default:
statement5;
}
cir.scope {
cir.switch (%1) {
statement0
cir.case (anyof, [1, 2])
statement1
cir.scope {
cir.case (equal 3)
statement2
cir.yield
}
cir.break
cir.case (equal 4)
statement3
cir.case (equal 5)
statement4
cir.break
cir.case (default)
statement5
cir.yield
}
}
As the CaseAttr becomes a CaseOp, perhaps cir.case could optionally have regions to maintain the old behavior, so that cases with underlying scopes could be directly represented. Because new cir.switch will only have one region, this means a "normal" switch use case would be just a sequential number of cir.cases.
Good point! I feel it's a better structure to make cir.case have a region for "normal" use case. So, I'd like to keep the current logic to scan CaseStmt
in switch compound body. And the other (who nested in other op) CaseStmt
won't have any region.
The cir would be as below, and it will be easy to change cir.case to have exactly one region once we need.
cir.scope {
cir.switch (%1) {
statement0
cir.case (anyof, [1, 2]) {
statement1
cir.scope {
cir.case (equal 3)
statement2
cir.yield
}
cir.break
}
cir.case (4) {
statement3
cir.yield
}
cir.case (5) {
statement4
cir.break
}
cir.case (default) {
statement5
cir.yield
}
cir.yield
}
}
What's your plan for handling fallthroughs? For
break
s, we'll likely need a region within acir.case
to represent it, or do you have alternative ideas?
My thought is same with the current implement: use cir.break
and cir.yield
for exit and fallthroughs, then replace them with cir.br
in FlattenCFG.cpp
. The above code show the usage.
If you look at cir.try_call, it might or might not be within a direct cir.try, doing similar for cir.case/cir.switch makes sense too.
Yes, it looks similar. But when I tried some code to see the cir, it crashed. Have created issue 688 for it. Will have another try later.
Thanks for your suggestions~ @bcardosolopes
@gitoleg @wenpen hi, are you still working on this? I'd like to work on it if you haven't started it yet.
I'm not, sorry
@ChuanqiXu9 Sorry my work is hold on, happy to see you take it~
As the CaseAttr becomes a CaseOp, perhaps cir.case could optionally have regions to maintain the old behavior, so that cases with underlying scopes could be directly represented. Because new cir.switch will only have one region, this means a "normal" switch use case would be just a sequential number of cir.cases.
Good point! I feel it's a better structure to make cir.case have a region for "normal" use case. So, I'd like to keep the current logic to scan CaseStmt in switch compound body. And the other (who nested in other op) CaseStmt won't have any region. The cir would be as below, and it will be easy to change cir.case to have exactly one region once we need.
Out of curiosity, what's the benefit to make a region for a case? I feel it is less straight forward. Can we make it simpler to treat all cases like labels?
@bcardosolopes @wenpen