p4c Make P4Tools' reachability analysis more general and reuse existing compiler code

Emerged from the discussion in #3569.

Oct 17 '22 15:10 fruffy

Could you, please, specify what you mean by "reuse existing compiler code". If we don't make a copy of the object before calling the apply function then we can't use ControlFlowVisitor as mentioned by @mbudiu-vmw, because of the seg. faults. Do, we really need to use apply function instead of the visit?

Oct 17 '22 15:10 VolodymyrPeschanenkoLitSoft

@mbudiu-vmw mentioned that there are a few existing facilities that perform similar analysis in p4c. We can take a closer look at how to improve the existing pass before migrating it to the mid end. I do not quite know yet, whether we can reuse existing code. This issue is intended to continue the discussion from the merged PR and to track this issue.

Oct 17 '22 15:10 fruffy

In #3569 @mbudiu-vmw mentioned a CFG data structure, ControlFlowVisitor, the def-use and graphs backend. I'll start with them.

Oct 17 '22 15:10 VolodymyrPeschanenkoLitSoft

Let's start with the description of the task which was solved by the reachability feature. For each pair (s, e) of IR::Node's it should return true if e is reachable from s by control flow. Otherwise, It should return false. Considered nodes should contain IR::Anontation name which could be used from the user's input to get the corresponding IR::Node which represents it or the parent node which contains it. Let's consider different possibilities for the implementation of this feature.

graphs backend. It is placed in backends/graphs folder. It was implemented based on theboost::subgraph class. It doesn't store all nodes in that subgraph. For example, Here it works with IR::Assigment statement. It stored it here for the corresponding vertex. In this case, all assignments will be out of boost::subgraph searching procedure and it'll hard to find the corresponding subgraph for such assignments. Analysis result: these classes can't be used for reachability feature implementation, but in general we can consider changing the main class for implementation from CallGraph to the boost::subgraph. However, I think that in that case, the class P4ProgramDCGCreator will be almost the same.
ControlFlowVisitor. The P4ProgramDCGCreator doesn't use copy object before applying to the arguments of the nodes. It uses the visit function because the visitor crashed in the opposite case. Analysis result: The P4ProgramDCGCreator can use ControlFlowVisitor is it make a copy of an object before applying to the arguments for joins detection.
def-use and CFG in a process.

Oct 18 '22 14:10 VolodymyrPeschanenkoLitSoft

Can you define more precisely what "reachable" means? Are you talking about statements? Expressions? There are many kinds of IR nodes. If you consider procedures or actions, do you want an inter-procedural or intra-procedural analysis? If the former, do you want it context-sensitive or context-insensitive? Our visitors are not really designed for inter-procedural analyses, but they can be used by creating new visitors when you visit a procedure, we have many instances, e.g., in def-use.

I don't understand the problem about the crash. The Visitors are 100% safe if used correctly.

Oct 18 '22 16:10 mihaibudiu

The original goal of this reachability analysis is to build a map for each statement in the program which contains a list of accessible downstream nodes (for now, annotations, tables, parser states. I believe this list can be expanded). The analysis should be both inter-procedural and context-sensitive.

Concretely, the long-term goal here is to allow tools such as P4Testgen to prune generated tests based on input queries. For example, one may only want to generate tests that hit the IPv4 parser state. All other tests should be eliminated as early as possible. The reachability analysis is intended to be a building block of this feature. Its design must reflect that. It might be the case that these requirements are too specific and there is not enough overlap with other tools in p4c.

I don't understand the problem about the crash. The Visitors are 100% safe if used correctly.

If I understand this correctly, using apply within a preorder/postorder function using the same visitor leads to a crash. The only way to solve this is to create a new visitor, copy some of the state, then apply. I am not sure whether this is the case for the ControlFlowVisitor. @VolodymyrPeschanenkoLitSoft can clarify.

Oct 19 '22 01:10 fruffy

There is a revisit method if you want to revisit a node, but I have never used it myself. The visitors check on purpose that the IR graph has no cycles. @ChrisDodd can explain how they work.

Oct 19 '22 01:10 mihaibudiu

I don't understand the problem about the crash. The Visitors are 100% safe if used correctly.

Here is a simple example:

class NewInspector : public Inspector {
  public:
    NewInspector() {};
    bool preorder(const IR::P4Control* control) override {
        control->body->apply(*this);
        return false;
    }
};
...
 NewInspector insp;
 program->apply(insp);

Here program is a IR::P4Program. I tested it on this example and it crashed here. The visited was a nullptr.

Oct 19 '22 13:10 VolodymyrPeschanenkoLitSoft

def-use can't be used for reachability feature, because it doesn't store information about transitions.
CFG. It is in backends/bmv2/common folder. It doesn't support IR::P4Parser, IR::ParserState, IR::Annotation. In general, we can use it for the implementation of the reachability feature, but it should be expanded by the additional makeNode functions here and by new functions to the Inspector here. It doesn't have a function which can define a place in CFG from IR::Node.

As a result, I can say, that CallGraph was implemented with help of oredered_maps here. It is easy to get some state inside CFG by IR::Node and the main data structure for implementing of the reachability feature was selected correctly.

The CFGBuilder and P4ProgramDCGCreator classes use visited function for sub-path traversal which makes unusable the ControlFlowVisitor class for the reachability feature because its function filter_join_point will not be called.

@fruffy, @mbudiu-vmw What do you think?

Oct 19 '22 14:10 VolodymyrPeschanenkoLitSoft

Yes, you should not reapply the same visitor to a node that is already being visited.

Oct 19 '22 17:10 mihaibudiu

p4c p4c copied to clipboard

Make P4Tools' reachability analysis more general and reuse existing compiler code

p4c
p4c copied to clipboard