
How to evaluate a parse-tree-based tool on IJaDataset

Open sudhirbelagali opened this issue 4 years ago • 2 comments

To detect clones, we convert Java code to parse trees and then compute the similarity of two parse trees to decide whether they are clones. BigCloneBench gives an error; kindly help us with converting IJaDataset to parse trees. We generate the parse trees with an ANTLR grammar, and this needs a main function in the Java code, but IJaDataset contains Java files without a main function. Kindly suggest how to go ahead with evaluating our work on BigCloneBench.

[Screenshot from 2020-07-15 14-46-29]

sudhirbelagali avatar Jul 15 '20 09:07 sudhirbelagali

IJaDataset contains a collection of source files scraped from open-source online sources (original work: https://sites.google.com/site/asegsecold/projects/seclone). I am not sure if it is possible to reconstruct the original software systems and locate a main function for each of these. If this is a requirement of your tool, it may be challenging.

Is it actually necessary to start at a main function? I would think that Java files should be individually parseable, but I don't have experience with using ANTLR to create abstract syntax trees.

jeffsvajlenko avatar Jul 17 '20 01:07 jeffsvajlenko

I used ANTLR for my master's thesis and it worked just fine. You can have a look at my grammar if you're interested. I think I borrowed it from over here, but I'm not sure. I also stripped down the grammar for performance, so you won't get a complete parse tree from it.

qw3ry avatar Sep 04 '20 05:09 qw3ry
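
On the question raised above of whether a Java file needs a main function to be parsed: a minimal sketch, assuming a JDK (not just a JRE) on Java 9 or later. It does not use ANTLR; it swaps in the JDK's own compiler tree API (`com.sun.source`) only to illustrate that a single source file with no main function can be turned into a syntax tree by itself. The class name `ParseWithoutMain`, the helper `methodNames`, and the sample `Util` source are hypothetical names introduced for illustration.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ParseWithoutMain {

    /** In-memory source file, so the sketch needs no files on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String fileName, String code) {
            super(URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses one source file (syntax only) and collects the names of the methods it declares. */
    static List<String> methodNames(String fileName, String code) throws IOException {
        // getSystemJavaCompiler() returns null when no compiler is available (JRE-only install).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("A JDK is required to run this sketch");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(fileName, code)));

        List<String> names = new ArrayList<>();
        // parse() only builds syntax trees; it does not resolve types or other files.
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethod(MethodTree method, Void unused) {
                    names.add(method.getName().toString());
                    return super.visitMethod(method, unused);
                }
            }.scan(unit, null);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: a class with two methods and no main function.
        String code = "class Util { int add(int a, int b) { return a + b; } "
                    + "int twice(int x) { return add(x, x); } }";
        System.out.println(methodNames("Util.java", code));
    }
}
```

Because only the parsing step runs, references to classes defined in other files do not have to resolve, which matters for a dataset of files detached from their original projects. Whether an ANTLR-based pipeline behaves the same way depends on the grammar and on which start rule the tool invokes.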