cppparser icon indicating copy to clipboard operation
cppparser copied to clipboard

How to traverse the ast tree?

Open dandelion915 opened this issue 1 year ago • 11 comments

Say I have two source code file1.cpp and file2.cpp written in C++, I want to know the difference code between them and locate this difference code to the function. I kind of get that this cppparser is able to get ast of the source code, but i'm still confused how to traverse the ast tree and compare two ast trees from different source code? Could anyone help me with this?

dandelion915 avatar Jun 14 '23 07:06 dandelion915

Hi there, I believe you have to write your own code for that. Furthermore, you can use my fork's doxygen branch to build the documentation and check out the inheritance diagram to get a better understanding of the AST's data structure.

inherit_graph_0

Also, check out the examples.

salehjg avatar Jun 15 '23 12:06 salehjg

Thanks for your wonderful reply! The example is very clear how the code is structed. However, do you have any idea on how to destructualize the code? What i'm saying is for a certain line of code within a function, I want to collect the all of the components, e.g, int a = b + c, then i need to collect int,a,=,b,+,c. It confuses me how to use this parser to traverse the data structure. I'd be very appreciated if you can give me any hints, a short example code might be the best :)

dandelion915 avatar Jun 19 '23 09:06 dandelion915

@dandelion915 Sure thing mate.

As I understand it, you want to get familiar with how AST expresses a piece of code. If so, I highly recommend checking out this example file from my fork.

It goes through manually creating for-loops, arrays, functions, pointers, expressions (chained or single), ... . It should provide you with the initial insight into the AST structure needed to write your own code.

So for a basic int a = b+c;, you have a CppVar instance as in below:

auto* myVar = new CppVar(
      new CppVarType("int"),
      CppVarDecl(
             "a", 
              new CppExpr(new CppExprAtom("b"), CppOperator::kPlus, new CppExprAtom("c"));, 
              AssignType::kUsingEqual)
    );

Now, as you can see, b and c are expressed as CppExprAtom instances. All expressions whether they are singular or chained, are expressed using nested CppExpr instances. Variables can have their initial value assignments in CppVar using CppVarDecl. There are much more details about the AST structure that cannot be covered here. Basically, you need a good debugging environment like MS VS or CLion to inspect the AST of a piece of code generated by CppParser and then work your way up from there.

{ .... } is called a block and is represented by a CppCompound of type CppCompoundType::kBlock. Its children are stored in CppCompund::members_. CppCompund::add_member() could be used to add them one by one.

Also, have a look at cppast.

salehjg avatar Jun 19 '23 10:06 salehjg

Thank for your speedy reply. I have seen your wonderful example.But what i'm saying is a reverse process of your example. To be specific, let's say i have an input hello-world.cpp file like this:

#include <iostream>

int add(int a, int b)
{
  int c = a + b;
  return c;
}


int main()
{
  std::cout << "Hello World\n";
  int res = add(1, 2);
  return 0;
}

The purpose is to get the function body with in add() and main() using cppparser:

CppParser parser;
 CppCompoundPtr ast = parser.parseFile(std::string
 ("hello-world.cpp"));
 const auto& members = ast->members();
 CppIncludeEPtr hashInclude = members[0];

 CppFunctionEPtr addFunc = members[1];

 const auto& mainBodyMembers = addFunc->defn()->members();

 CppVarEPtr abPlus = mainBodyMembers[0];
 std::cout << abPlus->name() << std::endl; //c

 AssignType atype = abPlus->assignType();
 std::cout << (atype == AssignType::kUsingEqual) << std::endl; // =,kUsingEqual

 const auto& avalue = abPlus->assignValue();
 std::cout << (avalue->oper_ == CppOperator::kPlus) << std::endl; // +,kPlus
 std::cout << *(avalue->expr1_.expr->expr1_.atom) << std::endl; //a
 std::cout << *(avalue->expr2_.expr->expr1_.atom) << std::endl; //b

 CppExprEPtr returnC = mainBodyMembers[1];
 std::cout <<  addFunc->retType_->baseType() << std::endl; //return int 
 std::cout << *(returnC->expr1_.atom) << std::endl; //c

CppFunctionEPtr mainFunc = members[2];
 const auto& mainBodyMembers2 = mainFunc->defn()->members();
 CppExprEPtr coutHelloWorld = mainBodyMembers2[0];
 CppExprAtom expr1 = coutHelloWorld->expr1_;
 CppExprAtom expr2 = coutHelloWorld->expr2_;
 std::cout << *(expr1.expr->expr1_.atom) << std::endl; //std::cout
 std::cout << *(expr2.expr->expr1_.atom) << std::endl; //"Hello World\n"

 CppVarEPtr callAdd = mainBodyMembers2[1];
 std::cout << callAdd->name() << std::endl; //res
 AssignType atype2 = callAdd->assignType();
 std::cout << (atype2 == AssignType::kUsingEqual) << std::endl; // =
 const auto& avalue2 = callAdd->assignValue();
 std::cout << (avalue2->oper_ == CppOperator::kFunctionCall) << std::endl; // kFunctionCall
 std::cout << *(avalue2->expr1_.atom) << std::endl; //add
 const auto& expr3 = avalue2->expr2_.expr;
 std::cout << *(expr3->expr1_.expr->expr1_.atom) << std::endl; //1
 std::cout << (expr3->oper_ == CppOperator::kComma) << std::endl; //, kComma
 std::cout << *(expr3->expr2_.expr->expr1_.atom) << std::endl; 2

You see, I did debug's hardwork to traverse the data structure from root to the leaf node to get the function body(such as "a","b","Hello World\n"). It confuses me how to use this parser to automatically traverse the data structure of different phrases. Or is there any solution to stop parsing below the function level and just save all the function body as strings? I'd be very appreciated if you can give me any hints, a short example code might be the best :)

dandelion915 avatar Jun 21 '23 01:06 dandelion915

I have been skimming through the codes for the past few days and I believe no such functionality has been implemented yet. There are some related codes in the *-info-accessor.h headers but they are not what you want. CppAST has visitor.cpp. For CppParser we can use CppWriter as a starting point for our visitor with some call back funtions. I am planning to use CppParser in my own projects and eventually I am going to have to implement visitors and AST matchers as well, but i am not sure about the timeline though. @dandelion915

salehjg avatar Jun 21 '23 11:06 salehjg

Hi @satya-das, are you accepting PRs?

salehjg avatar Jun 21 '23 14:06 salehjg

Thanks for your reply. I have figured out the exposure of function body by a different thought. First using cppparser to get all the function name and return type name, then use string match algorithm to find the beginning of the function, then math the { }. I dont know if this can be any use of you, but I'll provide this code anyway :)

CppCompoundPtr ast = parser.parseFile(filename);
    const auto& members = ast->members();
    for (const auto& member : members) {
        if (member->objType_ == CppObjType::kFunction) {
            CppFunctionEPtr func = member;
            const string& funcName = func->name_;
            const string& retType = func->retType_->baseType();
            cout << funcName << endl;
            string functionBody = findFunctionBody(filename, funcName,retType);
string findFunctionBody(const string& filename, const string& functionName, const string& retType) {
    ifstream file(filename);
    string line;

    string functionBody;
    bool foundFunction = false;
    bool insideFunction = false;
    int openBrackets = 0;
    int count = 0;
    int found_count = -2;
    while (getline(file, line)) {
        count++;
        if (!foundFunction && line.find(functionName) != string::npos && line.find(retType) != string::npos) {
            foundFunction = true;
            insideFunction = true;
            found_count = count;
        }

        if (count == found_count + 1) {
            found_count = count;
            functionBody += line;

            for (char c : line) {
                if (c == '{') {
                    openBrackets++;
                }
                else if (c == '}') {
                    openBrackets--;
                }
            }

            if (openBrackets == 0) {
                break;
            }
        }
    }

    return functionBody;

But this temporary string-match based solution has varies problems. Like comments may also contain function name or {}, so I'm still looking for parsers to get the function body. For producers for cppparser, maybe only adding tostring function to the project can lead to my solution, but sadly there isn't one.

dandelion915 avatar Jun 23 '23 02:06 dandelion915

By the way, are you familiar with Clang? I searched and found Clang has such abilities, but there's no tutorial document to get start...

dandelion915 avatar Jun 23 '23 03:06 dandelion915

@dandelion915 Oh wonderful. Thank you. Right now I am working on the visitor pattern to implement a generic visitor with the matching capability. Clang is quite capable. Unfortunately, there is a catch to it :) It does not expose the entire AST with stable API which is a big "no" for me. Also, the data structure of the clang AST is way more complex compared to cppparser, but I guess it's not a good idea to compare these two. Anyways, in case you need the Doxygen docs, you can access them at this link now.

salehjg avatar Jun 23 '23 09:06 salehjg

Dear @dandelion915, Please have a look at my fork's visitor and AST matcher classes. Everything seems to be working.

CppVisitorMatcher matcher({CppObjType::kExpression, CppObjType::kHashInclude});
ast->accept(&matcher);

This example matches only the expressions and hashInclude nodes. Here, ast is the root node. I have not merged it to my fork's main branch (called my-master) yet, but the code is ready.

salehjg avatar Jun 25 '23 21:06 salehjg

Hi @salehjg ! I have another question that I think you must know the answer. I want to use parser.parseStream to parse string/stream based code, but somehow this doesn't work. I wonder which part could possibly go wrong. I'd be very appreciated if u could give me some hint.

int main() {
   
    std::string input = string("int add(int a, int b);");
    CppParser parser;
    CppCompoundPtr ast = parser.parseStream(&input[0], input.size());
    const auto& members = ast->members();

	return 0;
}

dandelion915 avatar Jul 06 '23 08:07 dandelion915