
online learning

Open kmike opened this issue 8 years ago • 9 comments

Hey,

What does it take to implement partial_fit in lightning? Is there a reason it is not implemented?

kmike avatar Jun 02 '16 20:06 kmike

Non-contributor here:

Which algorithms do you need? Is your data too large to fit in memory at once, even as a sparse matrix?

I think the following algorithms would need only minimal changes:
  • SGDClassifier, SGDRegressor (already available in scikit-learn with partial_fit, only slightly different; see the sketch after this list)
  • AdaGradClassifier, AdaGradRegressor (slightly more work depending on internals)
  • SAGClassifier, SAGRegressor (slightly more work depending on internals)
Impossible algorithm-wise (these are batch methods that require the full gradient):
  • FistaClassifier, FistaRegressor
  • SVRGClassifier, SVRGRegressor
These could maybe work, but I'm unsure about the theory (there may be constraints on how partial_fit can be called and with which data):
  • CDClassifier, CDRegressor
  • SDCAClassifier, SDCARegressor
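
For reference, the interface in question as scikit-learn already exposes it, e.g. with SGDRegressor (the batch sizes and learning-rate settings here are arbitrary):

```python
# partial_fit as scikit-learn's SGDRegressor already exposes it: each
# call does one pass over the mini-batch and keeps the coefficients
# learned so far instead of resetting them.
import numpy as np
from sklearn.linear_model import SGDRegressor

est = SGDRegressor(learning_rate="constant", eta0=0.01)
rng = np.random.RandomState(0)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
for _ in range(10):                        # stream of mini-batches
    X_batch = rng.randn(32, 5)
    y_batch = X_batch @ true_w
    est.partial_fit(X_batch, y_batch)      # updates est.coef_ in place
```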

sschnug avatar Jun 02 '16 21:06 sschnug

Thanks for the detailed overview!

I'm in a reinforcement learning setup where the whole dataset is never available up front, and I want a regression model that can be updated with the data seen so far without retraining from scratch. I want to try an optimisation algorithm with an adaptive learning rate or momentum, and lightning has a good AdaGradRegressor implementation.
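
Roughly, the usage I'm after would look like this (partial_fit is hypothetical here; lightning currently only has fit()):

```python
# Hypothetical: what the requested API would look like. partial_fit is
# NOT implemented in lightning; this only illustrates the desired usage.
import numpy as np
from lightning.regression import AdaGradRegressor

model = AdaGradRegressor()
rng = np.random.RandomState(0)
for step in range(100):                    # data arrives incrementally
    X_batch = rng.randn(16, 4)
    y_batch = X_batch.sum(axis=1) + 0.1 * rng.randn(16)
    model.partial_fit(X_batch, y_batch)    # hypothetical incremental update
```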

kmike avatar Jun 02 '16 22:06 kmike

Let's see what the developers think.

Just two random remarks:

  • Did you try carefully tuned vanilla SGD (the scikit-learn version with partial_fit) for your use case? (I'm sceptical that AdaGrad is that much better, but this may depend on your data, and I'm not an expert.)
  • There is a warm_start option in CDClassifier and SDCAClassifier... Maybe there is a clever way to incorporate that into your setup (rough sketch below).
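
A rough sketch of that warm_start idea, assuming CDClassifier resumes from the previous coef_ when warm_start=True (stream_of_batches is a placeholder generator); note this refits on all data seen so far rather than doing a true online update:

```python
# warm_start workaround: refit on the accumulated data, starting from
# the previous solution instead of from zero. Not true online learning,
# but each refit should converge faster.
import numpy as np
from lightning.classification import CDClassifier

clf = CDClassifier(warm_start=True)
X_seen, y_seen = [], []
for X_batch, y_batch in stream_of_batches():   # placeholder generator
    X_seen.append(X_batch)
    y_seen.append(y_batch)
    clf.fit(np.vstack(X_seen), np.concatenate(y_seen))  # resumes from coef_
```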

sschnug avatar Jun 02 '16 22:06 sschnug

Yeah, I'm using vanilla SGD now; it works OK. The problem is that the component should work across many tasks, and it'd be nice to have fewer parameters to tune.

kmike avatar Jun 02 '16 22:06 kmike

I was just about to open an issue on this. I'm training models on a really big file, so the data won't fit in memory at once; streaming and parallelization are the only way to use it. Vanilla SGD from scikit-learn requires tuning and doesn't improve with multiple iterations. The FTRL from Kaggler.py works better, but can't be pickled.

I had a look at modifying scikit-lightning for this. The outputs_2d_ initialization in fit() should be moved to __init__(), and the Cython part should also be modified so that it doesn't reset the model parameters when partial_fit is called. Would it be possible to get these changes?
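
Roughly, the Python-level shape of the change I have in mind (not lightning's actual code; _initialize and _solve are hypothetical stand-ins for the one-time setup and the Cython solver call):

```python
# Hypothetical outline of the proposed partial_fit: initialize state on
# the first call only, and tell the solver not to zero out coef_.
class PartialFitMixin:
    def partial_fit(self, X, y):
        if not hasattr(self, "coef_"):
            self._initialize(X, y)       # hypothetical: one-time setup,
                                         # e.g. the outputs_2d_ bookkeeping
                                         # currently done in fit()
        self._solve(X, y, reset=False)   # hypothetical: Cython solver call
                                         # that keeps existing parameters
        return self
```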

anttttti avatar Jun 03 '16 01:06 anttttti

Hi,

A patch that implements partial_fit would definitely be a nice addition!

Please submit a patch with the modifications that you propose. I'll allocate time to review them.

fabianp avatar Jun 03 '16 06:06 fabianp

I didn't get a patch written; I hacked the code first to see how easily this could be done. I think I got it working for the AdaGradRegressor case, but the results were not good, so I think I missed something. The results from AdaGrad without my hack weren't much better than SGD on my data, and FTRL from Kaggler was vastly better; this is a general result for SGD vs. FTRL on high-dimensional data. Anyway, I got a partial_fit FTRL working by adding model pickling to Kaggler instead. I could look at contributing to lightning later.
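
The pickling pattern itself is just this (model, X_new, and y_new are placeholders for any picklable online estimator and new data):

```python
# Persist an online model between training sessions with pickle, then
# resume incremental updates where the previous session left off.
import pickle

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)            # save after a training session

with open("model.pkl", "rb") as f:
    model = pickle.load(f)           # restore in a later session
model.partial_fit(X_new, y_new)      # continue training on new data
```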

Attached is the hack I wrote, in case someone wants to continue from that. adagrad.py.txt

anttttti avatar Jun 03 '16 09:06 anttttti

partial_fit is already supported in scikit-learn's SGD, so I think we should focus on AdaGrad first.

@anttttti If you start a PR, we can help you track down the problem. Also make sure to write a unit test that checks that calling partial_fit multiple times is equivalent to fit.
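
Something along these lines, assuming each partial_fit call corresponds to one pass over the given data (n_iter semantics and the partial_fit method itself are hypothetical for AdaGradRegressor):

```python
# Hypothetical unit test: three partial_fit passes over the full data
# should match a single fit with three iterations.
import numpy as np
from numpy.testing import assert_array_almost_equal
from lightning.regression import AdaGradRegressor

def test_partial_fit_matches_fit():
    rng = np.random.RandomState(42)
    X = rng.randn(100, 10)
    y = X @ rng.randn(10)

    full = AdaGradRegressor(n_iter=3).fit(X, y)

    inc = AdaGradRegressor(n_iter=1)
    for _ in range(3):
        inc.partial_fit(X, y)        # hypothetical method under test

    assert_array_almost_equal(full.coef_, inc.coef_)
```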

mblondel avatar Jun 07 '16 01:06 mblondel

I made a version of FTRL available as part of a package I released: https://github.com/anttttti/Wordbatch/blob/master/wordbatch/models/ftrl.pyx

It supports partial_fit and online learning, weighted features, a link function for classification/regression, and instance-level parallelization with OpenMP prange.
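
For context, the per-feature FTRL-proximal update (McMahan et al., 2013) that such models implement looks roughly like this in plain Python; the Wordbatch version is Cython with OpenMP prange, so this is only a readable sketch of the math, with the hash dimension and hyperparameters chosen arbitrarily:

```python
# Plain-Python illustration of the FTRL-proximal update for logistic loss.
import math

class FTRLProximal:
    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0, dim=2**20):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # per-feature adjusted gradient sums
        self.n = [0.0] * dim   # per-feature sums of squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:                      # L1 keeps the weight at 0
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def partial_fit_one(self, x, y):
        """One online update; x is {feature_index: value}, y in {0, 1}."""
        wx = sum(self._weight(i) * v for i, v in x.items())
        p = 1.0 / (1.0 + math.exp(-max(min(wx, 35.0), -35.0)))  # sigmoid
        for i, v in x.items():
            g = (p - y) * v                        # logistic-loss gradient
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p
```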

This script probably won't fit the scope of current sklearn-contrib-lightning, so I've released it independently for now.

anttttti avatar Sep 24 '16 08:09 anttttti