
How to traverse the AST

Open dandelion915 opened this issue 2 years ago • 5 comments

Let's say I have a source file, hello-world.cpp:

#include <iostream>

int add(int a, int b)
{
  int c = a + b;
  return c;
}

int main()
{
  std::cout << "Hello World\n";
  int res = add(1, 2);
  return 0;
}

I want to get every function body and save it as a string. I know I can do this with ClangSharp, but I can't find any tutorial on how to traverse the AST, for example how to use CXCursor.VisitChildren. This is as far as I have got:

var tmpfile = "hello-world.cpp";
CXIndex Index = CXIndex.Create(false, false);
CXTranslationUnit TU;
CXTranslationUnit_Flags tmpFlag = CXTranslationUnit_Flags.CXTranslationUnit_DetailedPreprocessingRecord;
var error = CXTranslationUnit.TryParse(Index, tmpfile, new ReadOnlySpan<string>(args), null, tmpFlag, out TU);

Can anyone give me a hint? Example code would be best :) I can already do this in Python:

import clang.cindex

file_name = "hello-world.cpp"
index = clang.cindex.Index.create() 
# unsaved_files=[('test.cpp', code)],
translation_unit = index.parse(path=file_name, args=['-std=c++11'])  #
for node in translation_unit.cursor.walk_preorder():
    if node.kind == clang.cindex.CursorKind.FUNCTION_DECL:
        if node.extent.start.file.name == file_name:
            print(node.extent.start.file.name)
            print('Found Function:', node.spelling)
            print('Function Body:', node.extent.end.offset - node.extent.start.offset)
            print(node.extent.start.line, node.extent.start.column)
            st = node.extent.start.line
            print(node.extent.end.line, node.extent.end.column)
            ed = node.extent.end.line
        # print(node.extent.start.file.contents[node.extent.start.offset:node.extent.end.offset])
            count = 0
            for line in open(file_name,encoding='UTF-8'):
                count += 1
                if count >= st+1 and count<=ed:
                    print(line)

dandelion915 avatar Jun 21 '23 06:06 dandelion915

You can write effectively the same code. You only need to change a couple of minor things where Python has defined its own extension points.

You have two options. You can either use the raw interop APIs or the higher level wrapper. The higher level wrapper is easier/friendlier and more closely matches the Clang C++ API surface.

You would do var translationUnit = TranslationUnit.GetOrCreate(TU) to get a managed wrapper over the raw translation unit. You then have access to the cursor via translationUnit.TranslationUnitDecl and can find all its direct children via .CursorChildren.

A basic visitor can be seen here: https://source.clangsharp.dev/#ClangSharp.PInvokeGenerator/PInvokeGenerator.cs,4f00f2f122c68eda,references

You can naturally filter or otherwise handle things using LINQ, pattern matching, or other kinds of checks.
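As a rough illustration of the wrapper approach described above, here is a minimal sketch that walks .CursorChildren recursively and collects the body text of each function defined in the main file. Only TranslationUnit.GetOrCreate, TranslationUnitDecl and CursorChildren come from this thread. FunctionDecl, CompoundStmt, Extent, Location.IsFromMainFile and GetFileLocation are used from memory of the ClangSharp API, so check them against the version you have installed. FunctionText, Collect, Visit and Slice are hypothetical names made up for this example, and the file is assumed to be UTF-8.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using ClangSharp;
using ClangSharp.Interop;

// Hypothetical helper class, not part of ClangSharp.
static class FunctionText
{
    // Returns (function name, body text) for each function defined in the main file.
    public static List<(string Name, string Text)> Collect(CXTranslationUnit handle, string path)
    {
        // Managed wrapper over the raw translation unit, as described above.
        var translationUnit = TranslationUnit.GetOrCreate(handle);

        // Extents are reported as byte offsets, so slice the raw bytes.
        byte[] source = File.ReadAllBytes(path);

        var results = new List<(string Name, string Text)>();
        Visit(translationUnit.TranslationUnitDecl, source, results);
        return results;
    }

    private static void Visit(Cursor cursor, byte[] source, List<(string Name, string Text)> results)
    {
        foreach (Cursor child in cursor.CursorChildren)
        {
            // Skip declarations pulled in through #include.
            if (!child.Location.IsFromMainFile)
            {
                continue;
            }

            if (child is FunctionDecl function)
            {
                // Assumption: a definition exposes its body as a CompoundStmt child.
                var body = function.CursorChildren.OfType<CompoundStmt>().FirstOrDefault();
                if (body != null)
                {
                    results.Add((function.Spelling, Slice(source, body.Extent)));
                }
            }
            else
            {
                // Descend into namespaces, classes, and other containers.
                Visit(child, source, results);
            }
        }
    }

    private static string Slice(byte[] source, CXSourceRange extent)
    {
        extent.Start.GetFileLocation(out _, out _, out _, out uint start);
        extent.End.GetFileLocation(out _, out _, out _, out uint end);

        // Assumes the source file is UTF-8 encoded.
        return Encoding.UTF8.GetString(source, (int)start, (int)(end - start));
    }
}
```

With the variables from the question, this would be called as var functions = FunctionText.Collect(TU, tmpfile). To capture the signature as well as the body, slice function.Extent instead of body.Extent.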

tannergooding avatar Jun 21 '23 17:06 tannergooding

I have installed the ClangSharp and libClangSharp 16.0 NuGet packages, but I get the error System.DllNotFoundException: 'Unable to load DLL 'libClangSharp'. Could you give me a hint on how to solve it? Do I need VS19 or lower?

dandelion915 avatar Jun 23 '23 05:06 dandelion915

You're probably hitting a known limitation with NuGet. You can resolve the issue by adding the following to your csproj:

  <PropertyGroup Condition="'$(RuntimeIdentifier)' == ''">
    <RuntimeIdentifier>$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
  </PropertyGroup>


tannergooding avatar Jun 23 '23 14:06 tannergooding

Thanks for your speedy reply. I have successfully run this tool, but it seems clang can only parse functions with a standard return type? For example, I have three functions:

extern bool NeedUpgradeStockInTransitAccount (CBizEnv& bizEnv, CBusinessObject* bo);



CBusinessObject	*CWarehouse::CreateObject (const TCHAR *id, CBizEnv &env)
{
	TRACE_METHOD;
	return new CWarehouse (id, env);
}




CWarehouse::CWarehouse (const TCHAR *id, CBizEnv &env) :
	CSystemBusinessObject (id, env),
	m_AlertInCheckDelete(true)
{
	TRACE_METHOD;
}

But this tool can only recognize the first function.

dandelion915 avatar Jun 28 '23 03:06 dandelion915

> I have successfully run this tool, but it seemed clang can only parse function with standard return type?

Sorry, I missed this last response. If I remember correctly, because of how the AST is structured, the traversal needs to go through the CWarehouse declaration. That then allows the definitions to be found via the actual declaration; otherwise they appear to be uninstantiable code.

tannergooding avatar Nov 18 '23 18:11 tannergooding