CodRep-competition

Participant #7: Team CSV, Universidad Central "Marta Abreu" de Las Villas

chenzimin opened this issue • 10 comments

Created for Team CSV (@cesarsotovalero) from the Universidad Central "Marta Abreu" de Las Villas for discussion. Welcome!

chenzimin • May 18 '18 06:05

Excellent, welcome! What's your score on Dataset1?

monperrus • May 18 '18 09:05

My current scores, using just a very naive approach based on string comparison:

  • Dataset1: 0.1236735
  • Dataset2: 0.1096176

No machine learning yet.
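To make the naive baseline concrete, here is a minimal Python sketch. It assumes the CodRep setup, where each task gives a Java source file and one new line, and the answer is the 1-based number of the line that the new line replaced. The helper `predict_line` is hypothetical, and `difflib.SequenceMatcher.ratio()` is just one possible string similarity; this is not Team CSV's actual code.

```python
import difflib


def predict_line(new_line: str, file_lines: list[str]) -> int:
    """Return the 1-based number of the line most similar to new_line.

    Hypothetical naive baseline: compare whitespace-stripped strings with
    difflib's ratio() and keep the best-scoring line (the first one on ties).
    """
    if not file_lines:
        raise ValueError("file_lines must not be empty")
    target = new_line.strip()
    best_index, best_ratio = 0, -1.0
    for index, line in enumerate(file_lines):
        ratio = difflib.SequenceMatcher(None, target, line.strip()).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    return best_index + 1


if __name__ == "__main__":
    source = [
        "public class A {",
        "    int x = 0;",
        "    void inc() { x++; }",
        "}",
    ]
    # "int x = 1;" differs from "int x = 0;" by one character, so line 2 wins.
    print(predict_line("    int x = 1;", source))
```

Such a baseline ignores syntax entirely, which is why it only gets the "easy" part of the score.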

cesarsotovalero • May 18 '18 16:05

Yes. The first 0.8 points are easy to get, purely due to the data.

The remaining points are super hard.

Best scores seen so far (lower is better):

  • Dataset1: 0.114
  • Dataset2: 0.085

monperrus • May 21 '18 07:05

My last scores:

| Dataset | Perfect matches | Score |
|---|---|---|
| Dataset 1 | 3867 | 0.11842962430821 |
| Dataset 2 | 9833 | 0.108660931336428 |
| Dataset 3 | 17197 | 0.0753167732657934 |

My current approach: string matching + parse checking

A related paper: A comparison of code similarity analysers
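One way to read "string matching + parse checking" is: rank the lines by similarity, then keep the best candidate whose replacement still yields a syntactically valid file. The sketch below assumes the CodRep task setup (a Java file plus one new line; predict the 1-based number of the replaced line). Python has no built-in Java parser, so the hypothetical `brackets_balanced` check is only a crude stand-in; a real pipeline would call an actual Java parser (for example, JavaParser). Both functions are illustrative, not the author's implementation.

```python
import difflib


def brackets_balanced(text: str) -> bool:
    """Crude, hypothetical stand-in for a real Java parse check.

    Only verifies that (), [] and {} are properly nested. It ignores string
    literals, char literals and comments, so e.g. "}" inside a string fools it.
    """
    pairs = {")": "(", "]": "[", "}": "{"}
    stack: list[str] = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def predict_line_with_parse_check(new_line: str, file_lines: list[str]) -> int:
    """Rank lines by similarity to new_line, then return the 1-based number
    of the most similar line whose replacement passes the proxy check."""
    if not file_lines:
        raise ValueError("file_lines must not be empty")
    target = new_line.strip()
    # sorted() is stable, so equally similar lines keep their file order.
    ranked = sorted(
        range(len(file_lines)),
        key=lambda i: difflib.SequenceMatcher(None, target, file_lines[i].strip()).ratio(),
        reverse=True,
    )
    for i in ranked:
        candidate = file_lines[:i] + [new_line] + file_lines[i + 1:]
        if brackets_balanced("\n".join(candidate)):
            return i + 1
    # No candidate passes the check: fall back to the most similar line.
    return ranked[0] + 1
```

For example, with `new_line = "b }"` and the file `["a {", "b", "}"]`, lines 2 and 3 are equally similar, but replacing line 2 would leave an unmatched `}`, so the sketch answers 3. The fallback keeps the function total when no candidate passes the check.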

cesarsotovalero • May 29 '18 15:05

Thanks, I have updated the rankings.

chenzimin • May 30 '18 08:05

Good scores, getting quite close to @tdurieux :-)

monperrus • May 31 '18 10:05

Hi everyone, here is an update on my scores for the preliminary ranking:

| Dataset | Perfect matches | Score |
|---|---|---|
| Dataset1 | 3900 | 0.1111243868013270 |
| Dataset2 | 9948 | 0.0995737723246198 |
| Dataset3 | 17438 | 0.0631975953292782 |
| Dataset4 | 15773 | 0.0769219481612277 |

My current approach: string matching + parse checking + decision rules + heuristics
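The thread doesn't say which decision rules or heuristics are used. As a purely hypothetical illustration, such rules can be layered on top of string similarity as a lexicographic tie-breaking key: similarity first, then whether the first token matches, then whether the indentation matches. The names `rank_key` and `predict_line_with_rules`, and the rules themselves, are assumptions, and the task is assumed to be CodRep's (predict the 1-based number of the line that a new line replaced).

```python
import difflib


def first_token(line: str) -> str:
    """First whitespace-separated token of a line, or "" for blank lines."""
    parts = line.split()
    return parts[0] if parts else ""


def indentation(line: str) -> int:
    """Number of leading whitespace characters."""
    return len(line) - len(line.lstrip())


def rank_key(new_line: str, candidate: str) -> tuple[float, bool, bool]:
    """Hypothetical decision rules, compared in priority order:
    1. higher string similarity wins;
    2. on ties, prefer a candidate starting with the same token;
    3. then prefer a candidate with the same indentation.
    """
    similarity = difflib.SequenceMatcher(None, new_line.strip(), candidate.strip()).ratio()
    same_token = first_token(new_line) == first_token(candidate)
    same_indent = indentation(new_line) == indentation(candidate)
    # Rounding keeps tiny float differences from hiding a real tie.
    return (round(similarity, 6), same_token, same_indent)


def predict_line_with_rules(new_line: str, file_lines: list[str]) -> int:
    """Return the 1-based number of the line with the highest rank_key."""
    if not file_lines:
        raise ValueError("file_lines must not be empty")
    best = max(range(len(file_lines)), key=lambda i: rank_key(new_line, file_lines[i]))
    return best + 1
```

For instance, `"    x = 1;"` is equally similar to `"x = 2;"` and `"    x = 2;"` once stripped, and only the indentation rule breaks the tie in favour of the second line.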

cesarsotovalero • Aug 21 '18 22:08

It seems that you beat @tdurieux!! Congrats.

It's too late to be considered in the intermediate ranking, but it's really remarkable.

monperrus • Aug 22 '18 08:08

Thanks @monperrus!! However, my approach has some performance issues. For instance, it takes almost 2 h to process Dataset1, which is far slower than @tdurieux's approach. Also, I think the accuracy (in terms of the loss function) still needs to improve a lot to really win the competition. I'll keep working on that.

cesarsotovalero • Aug 22 '18 08:08

Strangely, my technique is still better on Dataset2 but worse on the others.

I still have some room for improvement, but I am very happy with the performance of my technique: it takes less than 10 min to produce results for all datasets, which helps a lot when trying out new improvements.

tdurieux • Aug 22 '18 09:08