p4c
p4c copied to clipboard
Make P4Tools' reachability analysis more general and reuse existing compiler code
Emerged from the discussion in #3569.
Could you, please, specify what you mean by "reuse existing compiler code". If we don't make a copy of the object before calling the apply
function then we can't use ControlFlowVisitor
as mentioned by @mbudiu-vmw, because of the seg. faults. Do, we really need to use apply
function instead of the visit
?
@mbudiu-vmw mentioned that there are a few existing facilities that perform similar analysis in p4c. We can take a closer look at how to improve the existing pass before migrating it to the mid end. I do not quite know yet, whether we can reuse existing code. This issue is intended to continue the discussion from the merged PR and to track this issue.
In #3569 @mbudiu-vmw mentioned a CFG data structure, ControlFlowVisitor, the def-use and graphs backend. I'll start with them.
Let's start with the description of the task which was solved by the reachability feature. For each pair (s, e)
of IR::Node
's it should return true if e
is reachable from s
by control flow. Otherwise, It should return false
. Considered nodes should contain IR::Anontation name which could be used from the user's input to get the corresponding IR::Node which represents it or the parent node which contains it.
Let's consider different possibilities for the implementation of this feature.
-
graphs backend. It is placed in
backends/graphs
folder. It was implemented based on theboost::subgraph
class. It doesn't store all nodes in that subgraph. For example, Here it works withIR::Assigment
statement. It stored it here for the corresponding vertex. In this case, all assignments will be out ofboost::subgraph
searching procedure and it'll hard to find the corresponding subgraph for such assignments. Analysis result: these classes can't be used for reachability feature implementation, but in general we can consider changing the main class for implementation from CallGraph to theboost::subgraph
. However, I think that in that case, the class P4ProgramDCGCreator will be almost the same. -
ControlFlowVisitor. The P4ProgramDCGCreator doesn't use copy object before applying to the arguments of the nodes. It uses the
visit
function because the visitor crashed in the opposite case. Analysis result: The P4ProgramDCGCreator can use ControlFlowVisitor is it make a copy of an object before applying to the arguments for joins detection. -
def-use and CFG in a process.
Can you define more precisely what "reachable" means? Are you talking about statements? Expressions? There are many kinds of IR nodes. If you consider procedures or actions, do you want an inter-procedural or intra-procedural analysis? If the former, do you want it context-sensitive or context-insensitive? Our visitors are not really designed for inter-procedural analyses, but they can be used by creating new visitors when you visit a procedure, we have many instances, e.g., in def-use.
I don't understand the problem about the crash. The Visitors are 100% safe if used correctly.
The original goal of this reachability analysis is to build a map for each statement in the program which contains a list of accessible downstream nodes (for now, annotations, tables, parser states. I believe this list can be expanded). The analysis should be both inter-procedural and context-sensitive.
Concretely, the long-term goal here is to allow tools such as P4Testgen to prune generated tests based on input queries. For example, one may only want to generate tests that hit the IPv4 parser state. All other tests should be eliminated as early as possible. The reachability analysis is intended to be a building block of this feature. Its design must reflect that. It might be the case that these requirements are too specific and there is not enough overlap with other tools in p4c.
I don't understand the problem about the crash. The Visitors are 100% safe if used correctly.
If I understand this correctly, using apply
within a preorder/postorder function using the same visitor leads to a crash. The only way to solve this is to create a new visitor, copy some of the state, then apply. I am not sure whether this is the case for the ControlFlowVisitor. @VolodymyrPeschanenkoLitSoft can clarify.
There is a revisit
method if you want to revisit a node, but I have never used it myself.
The visitors check on purpose that the IR graph has no cycles.
@ChrisDodd can explain how they work.
I don't understand the problem about the crash. The Visitors are 100% safe if used correctly.
Here is a simple example:
class NewInspector : public Inspector {
public:
NewInspector() {};
bool preorder(const IR::P4Control* control) override {
control->body->apply(*this);
return false;
}
};
...
NewInspector insp;
program->apply(insp);
Here program
is a IR::P4Program
. I tested it on this example and it crashed here. The visited
was a nullptr
.
- def-use can't be used for reachability feature, because it doesn't store information about transitions.
-
CFG. It is in
backends/bmv2/common
folder. It doesn't supportIR::P4Parser, IR::ParserState, IR::Annotation
. In general, we can use it for the implementation of the reachability feature, but it should be expanded by the additionalmakeNode
functions here and by new functions to the Inspector here. It doesn't have a function which can define a place inCFG
fromIR::Node
.
As a result, I can say, that CallGraph
was implemented with help of oredered_maps
here. It is easy to get some state inside CFG by IR::Node
and the main data structure for implementing of the reachability feature was selected correctly.
The CFGBuilder and P4ProgramDCGCreator classes use visited
function for sub-path traversal which makes unusable the ControlFlowVisitor class for the reachability feature because its function filter_join_point will not be called.
@fruffy, @mbudiu-vmw What do you think?
Yes, you should not reapply the same visitor to a node that is already being visited.