mars
Improve the performance of `glm.LogisticRegression`
The current implementation can be found in #2466. Iteratively calling stochastic gradient descent is inefficient on a distributed framework like Mars, since each of the many small update steps pays scheduling and communication overhead.
Potential solutions include:
- Zhuang, Yong, et al. "Distributed newton methods for regularized logistic regression." Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham, 2015.
- Gopal, Siddharth, and Yiming Yang. "Distributed training of large-scale logistic models." International Conference on Machine Learning. PMLR, 2013.
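To illustrate why the second-order methods cited above help in a distributed setting: Newton's method typically converges in tens of iterations, so an implementation needs only that many passes (and reductions) over the data, versus the many small steps of SGD. A minimal single-machine sketch for L2-regularized logistic regression follows; the function name and signature are illustrative, not part of Mars or the papers' exact algorithms.

```python
import numpy as np

def newton_logreg(X, y, lam=1.0, max_iter=20, tol=1e-8):
    """Newton's method for L2-regularized logistic regression.

    Minimizes  sum_i log(1 + exp(-y_i * x_i @ w)) + lam/2 * ||w||^2
    with labels y in {-1, +1}.  Hypothetical helper, not the Mars API.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        z = y * (X @ w)
        sigma = 1.0 / (1.0 + np.exp(z))          # sigmoid(-z)
        grad = -X.T @ (y * sigma) + lam * w
        # Hessian: X^T D X + lam*I, with D diagonal, D_ii = sigma_i*(1-sigma_i)
        D = sigma * (1.0 - sigma)
        H = X.T @ (D[:, None] * X) + lam * np.eye(d)
        step = np.linalg.solve(H, grad)
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w
```

Both the gradient and the Hessian here are sums over rows, so in a chunked framework each iteration is one map over partitions plus one reduction, which is the structure the cited distributed Newton methods exploit.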
Existing implementations of optimization algorithms that could be referred to:
- https://github.com/dask/dask-glm/blob/main/dask_glm/algorithms.py
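The dask-glm algorithms linked above rely on the fact that the batch logistic-loss gradient decomposes as a sum of per-chunk contributions, so one tree-reduction per iteration suffices regardless of how the data is partitioned. A minimal sketch of that decomposition, with hypothetical helper names (not dask-glm or Mars API):

```python
import numpy as np

def partial_grad(X_part, y_part, w):
    """Per-chunk contribution to the logistic-loss gradient (labels in {-1, +1})."""
    z = y_part * (X_part @ w)
    sigma = 1.0 / (1.0 + np.exp(z))       # sigmoid(-z)
    return -X_part.T @ (y_part * sigma)

def full_grad(chunks, w, lam):
    """Sum the per-chunk gradients (a single reduction in a distributed
    framework), then add the regularizer term exactly once."""
    return sum(partial_grad(Xp, yp, w) for Xp, yp in chunks) + lam * w
```

In Mars, `partial_grad` would run on each chunk of the input tensor and `full_grad` would be a tree-reduce, giving one round of communication per optimizer iteration.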