Debug Dolphin's algorithm correctness
This is part of #265.
As described in the issue linked above, the LR algorithm does not seem to be accurate enough. We need to verify that its accuracy increases to a reasonable level (e.g., comparable to Spark MLlib) over iterations.
@yunseong Does Vortex give LR accuracy similar to Dolphin's? (~66% on the URL reputation data)
@gyeongin Yes, both results were very similar.
I found that the number of tasks affects the correctness of Dolphin's algorithm. I used the first 10,000 lines of the URL reputation dataset and ran the LR job with the following two configurations:
-dim 3231961 -maxIter 5 -stepSize 0.00001 -lambda 0.1 -split 1
-dim 3231961 -maxIter 5 -stepSize 0.00001 -lambda 0.1 -split 4
The first one (-split 1) gives 64.89% accuracy, and the second one (-split 4) gives 58.55%.
We should fix our algorithm so that the way the data is partitioned does not degrade its results.
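A minimal sketch of one way to check this, not Dolphin's actual implementation. It assumes full-batch gradient descent in which each task returns an unnormalized gradient sum over its partition and the sums are combined before the update; under that assumption the trained weights should match for any split count, up to floating-point rounding. All names here (`partial_gradient_sum`, `train`, `make_data`) and the synthetic data are hypothetical, and the step size is chosen for the toy data rather than taken from the configurations above.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partial_gradient_sum(w, rows):
    """Unnormalized sum of logistic-loss gradients over one partition."""
    g = [0.0] * len(w)
    for x, y in rows:  # y is 0 or 1
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g


def split_rows(rows, num_splits):
    # Round-robin partitioning; stands in for whatever the data loader does.
    return [rows[i::num_splits] for i in range(num_splits)]


def aggregated_gradient(w, rows, num_splits, lam):
    # Sum per-partition gradient sums, then normalize by the TOTAL row count
    # and add the L2 term once (not once per partition).
    total = [0.0] * len(w)
    for part in split_rows(rows, num_splits):
        for j, gj in enumerate(partial_gradient_sum(w, part)):
            total[j] += gj
    n = len(rows)
    return [tj / n + lam * wj for tj, wj in zip(total, w)]


def train(rows, dim, max_iter, step_size, lam, num_splits):
    w = [0.0] * dim
    for _ in range(max_iter):
        g = aggregated_gradient(w, rows, num_splits, lam)
        w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w


def make_data(n, dim, seed=0):
    # Synthetic, linearly separable data (hypothetical, for the check only).
    rng = random.Random(seed)
    true_w = [rng.uniform(-1, 1) for _ in range(dim)]
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
        rows.append((x, y))
    return rows


rows = make_data(200, 5)
w1 = train(rows, dim=5, max_iter=5, step_size=0.1, lam=0.1, num_splits=1)
w4 = train(rows, dim=5, max_iter=5, step_size=0.1, lam=0.1, num_splits=4)
max_diff = max(abs(a - b) for a, b in zip(w1, w4))
print("max weight difference between split=1 and split=4:", max_diff)
```

If an equivalent check against our aggregation step shows a large difference, how per-task results are combined (normalization, where the regularization term is applied) would be a reasonable first place to look, though that is only a guess at this point.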