Debug Dolphin's algorithm correctness

Open yunseong opened this issue 9 years ago • 3 comments

This is part of #265.

As described in the issue linked above, the LR algorithm does not seem to reach sufficient accuracy. We need to verify that the accuracy improves over iterations to a reasonable level (e.g., comparable to Spark MLlib).

yunseong avatar Nov 18 '15 06:11 yunseong

@yunseong Does Vortex give LR accuracy similar to Dolphin's? (~66% on the URL reputation data)

gyeongin avatar Nov 18 '15 07:11 gyeongin

@gyeongin Yes, both results were very similar.

yunseong avatar Nov 18 '15 08:11 yunseong

I found that the number of tasks affects the correctness of Dolphin's algorithm. I used the first 10000 lines of the URL reputation dataset and ran the LR job with the following two configurations:

-dim 3231961 -maxIter 5 -stepSize 0.00001 -lambda 0.1 -split 1

-dim 3231961 -maxIter 5 -stepSize 0.00001 -lambda 0.1 -split 4

The first one gives 64.89%, and the second one gives 58.55%. We should improve our algorithm so that a different arrangement of the data does not harm the algorithm's original behavior.

gyeongin avatar Nov 25 '15 02:11 gyeongin
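
For anyone trying to reproduce the split sensitivity outside the cluster, here is a minimal sketch of one way the effect can arise. It is not Dolphin's code, and it rests on an assumption this thread does not confirm: that each partition runs a local SGD pass starting from the shared model and the local models are then averaged once per iteration. All names (`make_data`, `local_sgd`, `train`) and the synthetic dataset are hypothetical, and the parameters only loosely mirror `-maxIter`, `-stepSize`, `-lambda` and `-split`.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, dim, seed):
    # Synthetic, linearly separable data; a stand-in for the real dataset.
    rng = random.Random(seed)
    true_w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = 1 if dot(true_w, x) > 0 else 0
        data.append((x, y))
    return data


def local_sgd(w, part, step, lam):
    # One sequential SGD pass over a single partition, with L2 regularization.
    w = list(w)
    for x, y in part:
        err = sigmoid(dot(w, x)) - y
        w = [wi - step * (err * xi + lam * wi) for wi, xi in zip(w, x)]
    return w


def accuracy(w, data):
    correct = sum(1 for x, y in data if (dot(w, x) > 0) == (y == 1))
    return correct / len(data)


def train(data, dim, max_iter, step, lam, split):
    # ASSUMPTION: partitions update independently, then models are averaged.
    parts = [data[i::split] for i in range(split)]
    w = [0.0] * dim
    history = []
    for _ in range(max_iter):
        local_models = [local_sgd(w, p, step, lam) for p in parts]
        w = [sum(col) / split for col in zip(*local_models)]
        history.append(accuracy(w, data))
    return w, history


if __name__ == "__main__":
    data = make_data(200, 5, seed=0)
    for split in (1, 4):
        _, history = train(data, 5, max_iter=5, step=0.1, lam=0.01, split=split)
        print("split", split, "accuracy per iteration", history)
```

Under this assumption, averaging several local models moves the shared model roughly as far as one partition's worth of updates, so with a small step size and few iterations a larger split makes less progress per iteration. Printing the per-iteration accuracy also gives a quick check that accuracy actually improves over iterations. If Dolphin aggregates gradients differently, the sketch does not apply as written.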