CodRep-competition Participant %7: Team CSV, Universidad Central "Marta Abreu" de Las Villas

Open chenzimin opened this issue 6 years ago • 10 comments

Created for Team CSV(@cesarsotovalero) from the Universidad Central "Marta Abreu" de Las Villas for discussions. Welcome!

May 18 '18 06:05 chenzimin

Excellent, welcome! What's your score on Dataset1?

May 18 '18 09:05 monperrus

My current scores using just a very naive string comparison based approach:

Score on dataset1: 0.1236735
Score on dataset2: 0.1096176

No machine learning yet.
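
A naive string-comparison baseline of this kind could look roughly like the sketch below. It is only an illustration, not the code used for the scores above. It assumes each task supplies a replacement line plus the lines of the source file and asks for the number of the line to be replaced. The name `predict_line` and the use of `difflib` similarity are hypothetical choices.

```python
from difflib import SequenceMatcher
from typing import Sequence


def predict_line(replacement: str, source_lines: Sequence[str]) -> int:
    """Return the 1-based number of the source line most similar to `replacement`.

    Hypothetical baseline: similarity is difflib's ratio on whitespace-stripped
    text; ties go to the earliest line. Returns 1 if `source_lines` is empty.
    """
    target = replacement.strip()
    best_line, best_ratio = 1, -1.0
    for number, line in enumerate(source_lines, start=1):
        ratio = SequenceMatcher(None, target, line.strip()).ratio()
        if ratio > best_ratio:  # strict '>' keeps the first of equal matches
            best_line, best_ratio = number, ratio
    return best_line


if __name__ == "__main__":
    lines = ["int a = 0;", "int b = a + 1;", "return b;"]
    print(predict_line("int b = a + 2;", lines))
```

Comparing every line against the replacement is quadratic in line length per comparison, so a sketch like this would likely be slow on large files.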

May 18 '18 16:05 cesarsotovalero

Yes. The first 0.8 are easy to get (purely due to the data).

The remaining points are super hard.

Best score seen so far:

Dataset1: 0.114
Dataset2: 0.085

May 21 '18 07:05 monperrus

My last scores:

Dataset	Perfect Match	Score
Dataset 1	3867	0.11842962430821
Dataset 2	9833	0.108660931336428
Dataset 3	17197	0.0753167732657934

My current approach: string matching + parse checking

May 29 '18 15:05 cesarsotovalero

Thanks, I have updated the rankings

May 30 '18 08:05 chenzimin

good scores, getting quite close to @tdurieux :-)

May 31 '18 10:05 monperrus

Hi everyone, I want to give an update on my scores for the preliminary ranking:

Dataset	Perfect Match	Score
Dataset1	3900	0.1111243868013270
Dataset2	9948	0.0995737723246198
Dataset3	17438	0.0631975953292782
Dataset4	15773	0.0769219481612277

My current approach is: string matching + parse checking + decision rules + heuristics

Aug 21 '18 22:08 cesarsotovalero

It seems that you beat @tdurieux!! Congrats.

It's too late to be considered in the intermediate ranking, but it's really remarkable.

Aug 22 '18 08:08 monperrus

Thanks @monperrus!! However, my approach has some performance issues. For instance, it takes almost 2h for Dataset1, which is far slower than @tdurieux's results. Also, I think the accuracy (in terms of the loss function) needs to improve much more to really win the competition. I'll continue working on that.

Aug 22 '18 08:08 cesarsotovalero

Strangely, my technique is still better on Dataset 2 but worse on the others.

I still have some room for improvement, but I am very happy with the performance of my technique. It takes less than 10 min to get the results on all datasets. That helps a lot when trying new improvements.

Aug 22 '18 09:08 tdurieux
