Porting this to River

Open MaxHalford opened this issue 2 years ago • 13 comments

Hello there! I hope you're doing well.

I recently saw this repo pop up and I like it very much. There are some things that we don't yet have in River. In particular, I'm thinking of the OnlineEmpiricalCovariance class.

Would it be ok if we ported some of this stuff into River? I'd rather ask like a gentleman than savagely copy the code.

Kind regards.

MaxHalford avatar Feb 03 '22 09:02 MaxHalford

Hi @MaxHalford

Savagely copying is fine - though we may want to cross-fertilize our unit tests! Aside: the only reason I'm avoiding classes is due to quirks of my deployment - a desire to avoid object serialization issues.

With a little help, this old dog could probably be taught how to make useful PR's to river. I was thinking the same thing with some of the online timeseries stuff I'm doing, although I couldn't quite grok how to do k-step ahead things. If I understand correctly, k=1 step ahead is pretty straightforward in river and that's the focus of "precise", so it seems like a good idea to join forces there.

Remarks:

  • Some of the portfolio stuff might help with river online ensembling/stacking/mixtures of experts, though first I'm just establishing the baselines with traditional setup.
  • Some methods like online Ledoit-Wolf here are speculative. Up to you if you want to see how the Elo ratings pan out.

Peter

microprediction avatar Feb 03 '22 14:02 microprediction

ps: don't forget to enter M6. I'd like to see some open-source devs win prizes!

microprediction avatar Feb 03 '22 14:02 microprediction

> With a little help, this old dog could probably be taught how to make useful PR's to river. I was thinking the same thing with some of the online timeseries stuff I'm doing, although I couldn't quite grok how to do k-step ahead things. If I understand correctly, k=1 step ahead is pretty straightforward in river and that's the focus of "precise", so it seems like a good idea to join forces there.

It would be great if we could work something out. You definitely seem like you have strong coding abilities. The only thing is that River operates on dicts, not numpy arrays. We do use numpy arrays, but only for mini-batch updates. For instance see StandardScaler.

I would say that including these methods in River, and participating in the project, would maybe help reach a wider audience. For instance, I know a few teams that would enjoy having an online covariance matrix for anomaly detection purposes.

> Some methods like online Ledoit-Wolf here are speculative. Up to you if you want to see how the Elo ratings pan out.

It's good you point that out. We do try to focus on established methods, a bit like scikit-learn. We also have a river-extra repository for more "experimental" stuff.

> ps: don't forget to enter M6. I'd like to see some open-source devs win prizes!

Yep it's on my list ;)

MaxHalford avatar Feb 03 '22 14:02 MaxHalford

Makes sense. Perhaps if you create the basic running empirical online cov calculation, then it will be simple for me to PR a few others as they stabilize following your pattern.
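For readers following along, here is a minimal sketch of what a running empirical covariance over dict-valued observations could look like. It is not River's implementation: the class name `RunningCovariance` and its methods are hypothetical, and it assumes every observation carries the same keys. It uses the Welford-style co-moment update, where each pair is advanced by (deviation from the old mean of one feature) times (deviation from the new mean of the other).

```python
from collections import defaultdict
from itertools import combinations_with_replacement


class RunningCovariance:
    """Hypothetical helper: running sample covariance over dict observations.

    Assumes every observation has the same set of keys.
    """

    def __init__(self):
        self.n = 0
        self.mean = defaultdict(float)
        # Co-moments, keyed by (i, j) with i <= j.
        self.comoment = defaultdict(float)

    def update(self, x):
        self.n += 1
        # Deviations from the means before this observation is folded in.
        delta_old = {k: v - self.mean[k] for k, v in x.items()}
        for k, d in delta_old.items():
            self.mean[k] += d / self.n
        # Deviations from the means after the update.
        delta_new = {k: v - self.mean[k] for k, v in x.items()}
        for i, j in combinations_with_replacement(sorted(x), 2):
            self.comoment[i, j] += delta_old[i] * delta_new[j]
        return self

    def cov(self, i, j, ddof=1):
        # Needs n > ddof, otherwise the division below fails.
        key = (i, j) if i <= j else (j, i)
        return self.comoment[key] / (self.n - ddof)


rc = RunningCovariance()
for xi, yi in zip([1, 2, 3, 4], [2, 4, 6, 9]):
    rc.update({"x": xi, "y": yi})
print(rc.cov("x", "y"))
```

Only the upper triangle is stored, so memory grows with the number of feature pairs rather than with the number of observations.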

microprediction avatar Feb 03 '22 16:02 microprediction

Will do 👌

MaxHalford avatar Feb 03 '22 16:02 MaxHalford

Ok I'm done, here it is. Let me know if you have any questions!

Nota bene: looking into microprediction.com has been on my todo list for far too long. It will get done at some point :)

Keep up the great work 🤝

MaxHalford avatar Feb 04 '22 01:02 MaxHalford

Question for you because I'm blind and can't find it: do you have online formulas for the online precision matrix? That would enable many other algorithms, in particular Bayesian methods.

MaxHalford avatar Feb 04 '22 15:02 MaxHalford

Hi @MaxHalford I somehow missed this thread, probably under 50000 system alerts killing my inbox.

microprediction avatar Jun 10 '22 23:06 microprediction

Massively delayed answer to your question about precision - I don't yet have precision skaters but I think the method used by Lee and Zhong might be of interest: https://github.com/microprediction/precise/blob/main/precise/skaters/covariance/ewalzfactory.py

microprediction avatar Jun 10 '22 23:06 microprediction

Re river. I would think my first PR would be something like expon weighted sample cov. Does that make sense?

microprediction avatar Jun 10 '22 23:06 microprediction

> Hi @MaxHalford I somehow missed this thread, probably under 50000 system alerts killing my inbox.

Don't apologize!

> Massively delayed answer to your question about precision - I don't yet have precision skaters but I think the method used by Lee and Zhong might be of interest: https://github.com/microprediction/precise/blob/main/precise/skaters/covariance/ewalzfactory.py

Thanks, I'll take a look. My current thinking is that the Sherman-Morrison formula can be used.
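As a rough illustration of that idea (a sketch only, not what either library does): the Sherman-Morrison identity gives the inverse of a rank-one update, (A + u vᵀ)⁻¹ = A⁻¹ − (A⁻¹ u vᵀ A⁻¹) / (1 + vᵀ A⁻¹ u), so an inverse can be refreshed in O(n²) per observation instead of being recomputed from scratch. The function name `sherman_morrison_update` and the nested-list representation are assumptions made for the example.

```python
def sherman_morrison_update(A_inv, u, v):
    """Return the inverse of (A + u v^T), given A_inv = A^{-1}.

    Hypothetical helper; matrices are nested lists, vectors are lists.
    """
    n = len(u)
    # A^{-1} u (column vector)
    Au = [sum(A_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    # v^T A^{-1} (row vector)
    vA = [sum(v[k] * A_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[k] * Au[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("the rank-one update makes the matrix singular")
    return [
        [A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
        for i in range(n)
    ]


# Start from the identity and fold in one observation x as x x^T.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]
print(sherman_morrison_update(identity, x, x))
```

Because a new observation changes a scatter matrix by a rank-one term, the same update shape can in principle track its inverse, provided the starting matrix is invertible.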

> Re river. I would think my first PR would be something like expon weighted sample cov. Does that make sense?

Sure, that would be most appreciated! We have added an online covariance matrix, which you can see here. Under the hood it simply orchestrates a bunch of Covs. My instinct would be to do the same with exponentially weighted covariances. But we don't have those yet! We only have expo weighted variances, see here.
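To make the missing piece concrete, here is a minimal sketch of an exponentially weighted covariance between two scalar streams. It mirrors the usual exponentially weighted variance recursion and is only an assumption about how such a statistic could be written; the class name `EWCovSketch` and the choice to seed the means with the first observation are hypothetical, not River's API.

```python
class EWCovSketch:
    """Hypothetical exponentially weighted covariance of two scalar streams."""

    def __init__(self, alpha):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.mean_x = None
        self.mean_y = None
        self.cov = 0.0

    def update(self, x, y):
        if self.mean_x is None:
            # Assumption: the first observation seeds the means.
            self.mean_x, self.mean_y = x, y
            return self
        # Deviations from the means before this observation.
        dx = x - self.mean_x
        dy = y - self.mean_y
        self.mean_x += self.alpha * dx
        self.mean_y += self.alpha * dy
        # Decay the old covariance and blend in the new cross term.
        self.cov = (1 - self.alpha) * (self.cov + self.alpha * dx * dy)
        return self


ew = EWCovSketch(alpha=0.5)
for xi, yi in [(1.0, 2.0), (3.0, 6.0), (2.0, 5.0)]:
    ew.update(xi, yi)
print(ew.cov)
```

A matrix version could then hold one such object per feature pair, in the same spirit as orchestrating a bunch of pairwise covariances.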

MaxHalford avatar Jun 11 '22 01:06 MaxHalford

Bubbling this up. Note to self.

microprediction avatar Aug 29 '22 18:08 microprediction

I've implemented the precision matrix, see here :)

MaxHalford avatar Aug 29 '22 20:08 MaxHalford