online learning
Hey,
What does it take to implement partial_fit in lightning? Is there a reason it is not implemented?
Non-contributor here:
What algorithms do you need? Do you have too much data to fit in memory, even with sparse matrices?
I think the following algorithms would need only minimal changes:
- SGDClassifier, SGDRegressor (already available in scikit-learn with partial_fit, and only slightly different; see the usage sketch after these lists)
- AdaGradClassifier, AdaGradRegressor (slightly more work depending on internals)
- SAGClassifier, SAGRegressor (slightly more work depending on internals)
Impossible algorithm-wise (these are batch methods that need the full gradient):
- FistaClassifier, FistaRegressor
- SVRGClassifier, SVRGRegressor
These could maybe work, but I'm unsure about the theory (there may be constraints on when partial_fit can be called and with which data):
- CDClassifier, CDRegressor
- SDCAClassifier, SDCARegressor
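For reference, the streaming pattern those scikit-learn estimators already support looks roughly like this (a minimal sketch using scikit-learn's public SGDRegressor API; the toy data is made up):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Feed the data in chunks; each partial_fit call updates the same
# model in place instead of retraining from scratch.
model = SGDRegressor()
rng = np.random.RandomState(0)
for _ in range(100):                 # e.g. 100 mini-batches from a stream
    X = rng.randn(32, 10)
    y = X @ np.arange(10.0)          # toy linear targets
    model.partial_fit(X, y)
```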
Thanks for the detailed overview!
I'm in a reinforcement learning setup where the whole dataset is never available up front, and I want a regression model that uses the data seen so far without retraining from scratch. I want to try an optimisation algorithm with an adaptive learning rate or momentum, and lightning has a good AdaGradRegressor implementation.
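For context, the AdaGrad update itself is simple; here is a minimal dense NumPy sketch (an illustration of the algorithm, not lightning's actual Cython implementation). The per-coordinate step size shrinks with the accumulated squared gradients, which is what reduces the learning-rate tuning burden:

```python
import numpy as np

def adagrad_step(w, grad, accum, eta=0.1, eps=1e-8):
    # Accumulate squared gradients, then scale each coordinate's
    # step by the inverse root of its own history (in-place updates).
    accum += grad ** 2
    w -= eta * grad / (np.sqrt(accum) + eps)
```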
Let's see what the developers think.
Just two random remarks:
- Did you try (carefully tuned) vanilla SGD (the version in sklearn with partial_fit) for your use case? I'm sceptical that AdaGrad is so much better, but that may depend on your data, and I'm not an expert.
- There is a warm_start option in CDClassifier and SDCAClassifier... maybe there is a clever way to incorporate it into your setup; see the sketch below.
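To illustrate that second remark, a minimal sketch of the warm_start pattern (assuming CDClassifier keeps its coefficients between fit calls when warm_start=True; this is not a true partial_fit, since each call still optimizes only over the batch it sees):

```python
import numpy as np
from lightning.classification import CDClassifier

clf = CDClassifier(warm_start=True)      # reuse coefficients across fits
rng = np.random.RandomState(0)
for _ in range(10):                      # successive chunks of data
    X = rng.randn(200, 50)
    y = (X[:, 0] > 0).astype(int)        # toy labels
    clf.fit(X, y)                        # starts from the previous solution
```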
Yeah, I'm using vanilla SGD now; it works OK. The problem is that the component should work across many tasks, and it'd be nice to have fewer parameters to tune.
I was just about to open an issue on this. I'm training models on a really big file, so the data won't fit in memory at once; streaming and parallelization are the only way to use it. Vanilla SGD from scikit-learn takes tuning and doesn't improve with multiple iterations. The FTRL from Kaggler.py works better, but can't be pickled.
I had a look at modifying scikit-lightning for this. The outputs_2d_ initialization in fit() should be moved to __init__(), and the Cython part should also be modified so that it doesn't reset the model parameters when partial_fit is called. Would it be possible to get these changes?
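For what it's worth, the usual scikit-learn-style pattern for that is to allocate the coefficients only on the first call, so repeated calls keep updating the same state; a generic sketch (not lightning's actual internals):

```python
import numpy as np

class IncrementalEstimatorSketch:
    """Hypothetical skeleton: partial_fit initializes state once and
    never resets it; fit clears the state and delegates."""

    def partial_fit(self, X, y):
        if not hasattr(self, "coef_"):    # first call only
            self.coef_ = np.zeros(X.shape[1])
        # ... run one pass of updates on this chunk, starting
        # from the existing self.coef_ ...
        return self

    def fit(self, X, y):
        if hasattr(self, "coef_"):        # fit always restarts
            del self.coef_
        return self.partial_fit(X, y)
```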
Hi,
A patch that implements partial_fit would definitely be a nice addition!
Please submit a patch with the modifications that you propose. I'll allocate time to review them.
I didn't get a patch written; I hacked the code first to see how easily this could be done. I think I got it working for the AdaGradRegressor case, but the results were not good, so I think I missed something. The results from AdaGrad without my hack weren't much better than SGD on my data, and FTRL from Kaggler was vastly better; this is a general result for SGD vs. FTRL on high-dimensional data. Anyway, I got a partial_fit FTRL working by adding model pickling to Kaggler instead. I could look at contributing to lightning later.
Attached is the hack I wrote, in case someone wants to continue from that. adagrad.py.txt
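For readers unfamiliar with it, here is a minimal per-coordinate sketch of the FTRL-Proximal update from McMahan et al. (2013) (an illustration of the published algorithm, not Kaggler's or my exact code). The L1 threshold keeps most weights exactly zero, which is why it does well on high-dimensional sparse data:

```python
import numpy as np

def ftrl_weight(z_i, n_i, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    # Lazy weight: exactly zero until |z_i| exceeds the L1 threshold.
    if abs(z_i) <= l1:
        return 0.0
    return -(z_i - np.sign(z_i) * l1) / ((beta + np.sqrt(n_i)) / alpha + l2)

def ftrl_update(z, n, i, g, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    # Per-coordinate accumulator update for gradient g at feature i.
    sigma = (np.sqrt(n[i] + g * g) - np.sqrt(n[i])) / alpha
    z[i] += g - sigma * ftrl_weight(z[i], n[i], alpha, beta, l1, l2)
    n[i] += g * g
```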
partial_fit is already supported in scikit-learn's SGD, so I think we should focus on AdaGrad first.
@anttttti If you start a PR, we can help you track down the problem. Also make sure to write a unit test that checks that calling partial_fit multiple times is equivalent to fit.
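Such a test could look roughly like this (a sketch; partial_fit is the hypothetical method under test, and the constructor arguments assume lightning's current AdaGradRegressor signature):

```python
import numpy as np
from numpy.testing import assert_array_almost_equal
from lightning.regression import AdaGradRegressor

def test_partial_fit_matches_fit():
    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = X @ np.arange(5.0)

    # Reference: one fit with 10 passes over the data.
    ref = AdaGradRegressor(n_iter=10, random_state=0).fit(X, y)

    # Candidate: 10 single-pass partial_fit calls on the same data.
    inc = AdaGradRegressor(n_iter=1, random_state=0)
    for _ in range(10):
        inc.partial_fit(X, y)          # hypothetical new method

    assert_array_almost_equal(ref.coef_, inc.coef_)
```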
I made a version of FTRL available as part of my Wordbatch package: https://github.com/anttttti/Wordbatch/blob/master/wordbatch/models/ftrl.pyx
It supports partial_fit and online learning, weighted features, a link function for classification/regression, and instance-level parallelization with OpenMP prange.
This probably won't fit the scope of the current sklearn-contrib-lightning, so I've released it independently for now.
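A rough usage sketch (the import path follows the linked file; the constructor arguments here are assumptions based on common FTRL hyperparameters, so check ftrl.pyx for the actual signature):

```python
import numpy as np
from scipy.sparse import csr_matrix
from wordbatch.models import FTRL   # assumed import path for ftrl.pyx

# alpha/beta/L1/L2 are assumed parameter names for the usual
# FTRL-Proximal hyperparameters.
clf = FTRL(alpha=0.1, beta=1.0, L1=1.0, L2=1.0, iters=1)
X = csr_matrix(np.eye(4))           # toy sparse features
y = np.array([1.0, 0.0, 1.0, 0.0])
clf.partial_fit(X, y)               # incremental update, no reset
preds = clf.predict(X)
```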