CodRep-competition
Participant #7: Team CSV, Universidad Central "Marta Abreu" de Las Villas
Created for Team CSV (@cesarsotovalero) from the Universidad Central "Marta Abreu" de Las Villas for discussions. Welcome!
Excellent, welcome! What's your score on Dataset1?
My current scores, using just a very naive string-comparison-based approach:
- Dataset1: 0.1236735
- Dataset2: 0.1096176
No machine learning yet.
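For readers unfamiliar with this kind of baseline, a naive string-comparison approach could look roughly like the sketch below. It assumes the task is to pick, among the lines of a source file, the one most similar to a given replacement line. The function name `predict_line` and the use of `difflib` are illustrative choices, not the code actually used by the team.

```python
from difflib import SequenceMatcher


def predict_line(new_line: str, file_lines: list[str]) -> int:
    """Return the 1-based number of the file line most similar to new_line.

    Hypothetical helper: assumes file_lines is non-empty.
    """
    target = new_line.strip()
    best_index, best_ratio = 0, -1.0
    for index, line in enumerate(file_lines):
        # ratio() lies in [0, 1]; higher means more similar.
        ratio = SequenceMatcher(None, target, line.strip()).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    return best_index + 1


source = [
    "int total = 0;",
    "for (int i = 0; i < n; i++) {",
    "    total += values[i];",
    "}",
]
print(predict_line("for (int i = 0; i <= n; i++) {", source))  # prints 2
```

No training is involved: the prediction is simply the line with the highest similarity ratio, and ties go to the earliest line.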
Yes. The first 0.8 are easy to get (purely due to the data).
The remaining points are super hard.
Best score seen so far:
- Dataset1: 0.114
- Dataset2: 0.085
My last scores:
Dataset | Perfect Match | Score |
---|---|---|
Dataset 1 | 3867 | 0.11842962430821 |
Dataset 2 | 9833 | 0.108660931336428 |
Dataset 3 | 17197 | 0.0753167732657934 |
My current approach: string matching + parse checking
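As a rough illustration of how string matching and parse checking might be combined, here is a minimal sketch. It assumes candidates are ranked by similarity and that a candidate is rejected when substituting the new line breaks the file's syntax. The bracket-balance check is only a crude stand-in for a real parser (it ignores brackets inside strings and comments), and both function names are hypothetical, not taken from the actual implementation.

```python
from difflib import SequenceMatcher

CLOSERS = {")": "(", "]": "[", "}": "{"}


def looks_parseable(lines):
    """Crude stand-in for a parser: True if all brackets are balanced."""
    stack = []
    for ch in "\n".join(lines):
        if ch in "([{":
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    return not stack


def predict_with_parse_check(new_line, file_lines):
    """Return the 1-based line whose replacement is most similar and still 'parses'."""
    target = new_line.strip()
    # Rank line indices from most to least similar to the new line.
    ranked = sorted(
        range(len(file_lines)),
        key=lambda i: SequenceMatcher(None, target, file_lines[i].strip()).ratio(),
        reverse=True,
    )
    for i in ranked:
        candidate = file_lines[:i] + [new_line] + file_lines[i + 1:]
        if looks_parseable(candidate):
            return i + 1
    return ranked[0] + 1  # no candidate passes: fall back to the most similar line


example = ["while (a) {", "    run(a);", "}", "while (b)", "    run(b);"]
print(predict_with_parse_check("while (a)", example))
```

In this example the most similar line is line 1, but replacing it would leave an unmatched `}`, so the check moves on to the next candidate.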
A related paper: A comparison of code similarity analysers
Thanks, I have updated the rankings
Good scores, getting quite close to @tdurieux :-)
Hi everyone, I want to give an update of my scores for the preliminary ranking:
Dataset | Perfect Match | Score |
---|---|---|
Dataset1 | 3900 | 0.1111243868013270 |
Dataset2 | 9948 | 0.0995737723246198 |
Dataset3 | 17438 | 0.0631975953292782 |
Dataset4 | 15773 | 0.0769219481612277 |
My current approach is: string matching + parse checking + decision rules + heuristics
It seems that you beat @tdurieux!! Congrats.
It's too late to be considered in the intermediate ranking, but it's really remarkable.
Thanks @monperrus!! However, my approach has some performance issues. For instance, it takes almost 2h for Dataset1, which is far from the performance results of @tdurieux. Also, I think the accuracy (in terms of the loss function) should be improved much more to really win the competition. I'll continue working on that.
Strangely, my technique is still better on Dataset 2 but worse on the others.
I still have some room for improvement, but I am very happy with the performance of my technique. It takes less than 10 min to get the results on all datasets, which helps a lot when trying out new improvements.