sktime/sklearn integration?

Open fkiraly opened this issue 2 years ago • 8 comments

@anniegbryant, @benfulcher, I would like to congratulate you on this nice package. I really like the concept, and it is quite nicely designed! There are also a lot of useful methods collected. Nice.

Now imo the next "big" question is integrability with the wider modelling ecosystem, e.g., can I use the pairwise time series metrics as components in sktime or sklearn? By "I", of course, I mean the wider user ecosystem.

Currently, I think there are a few blockers, but would you be interested in resolving them together?

Two main points imo from the codebase review:

  • sklearn-interoperable interfaces expect a few things, such as a specific __init__ signature and the availability of get_params and set_params. You can get this for free by inheriting from scikit-base base classes, though of course that's not the only way to satisfy the interface requirements.
  • sktime has related classes which you could adopt or adapt, e.g., the BasePairwiseTransformerPanel. Options include writing an adapter in sktime, or using the class in pyspi; the latter would give you testing for free via check_estimator. Alternatively, you could write your own base class template based on scikit-base that marries the current interface definition with sklearn and sktime expectations.
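
To make the first point concrete, here is a minimal, dependency-free sketch of the parameter contract that sklearn-style tooling relies on: __init__ only stores its arguments, and get_params/set_params are derived from the __init__ signature. The names ParamsMixin and PairwiseCorrelation are hypothetical and are not part of pyspi, sklearn, or scikit-base; in practice one would inherit from an existing base class rather than hand-roll this.

```python
import inspect


class ParamsMixin:
    """Hypothetical mixin sketching the sklearn-style parameter contract."""

    @classmethod
    def _param_names(cls):
        # Parameters are read off the __init__ signature, so __init__ must
        # take explicit keyword arguments (no *args / **kwargs).
        sig = inspect.signature(cls.__init__)
        return sorted(name for name in sig.parameters if name != "self")

    def get_params(self):
        # Each __init__ argument is expected to be stored under the same name.
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = self._param_names()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(
                    f"Invalid parameter {name!r} for {type(self).__name__}"
                )
            setattr(self, name, value)
        return self

    def clone(self):
        # A fresh copy is rebuilt purely from the stored parameters.
        return type(self)(**self.get_params())


class PairwiseCorrelation(ParamsMixin):
    """Hypothetical pairwise statistic; not an actual pyspi class."""

    def __init__(self, squared=False, lag=0):
        # No validation or computation here: just store the arguments.
        self.squared = squared
        self.lag = lag


spi = PairwiseCorrelation(squared=True)
params = spi.get_params()
copy = spi.clone().set_params(lag=2)
```

The design choice worth noting is that cloning works only because the object can be fully reconstructed from its constructor arguments, which is why side effects in __init__ tend to break this kind of interface.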

Side points, but synergistic ones:

  • testing could - and should - be more systematic for reliable use, e.g., CI on operating system and python version combinations. Happy to help set this up if we set aside some time. Of course, the "sktime interface" option would take care of this as part of sktime, although bugfixing could become more clunky as we would have to push bug reports upstream (like in pycatch22).
  • a good object/estimator search utility might be nice for the user, as there are a lot of implemented objects! We could lift some components from sktime or skbase here.
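
As a rough illustration of the search-utility idea, the sketch below filters classes by declared tags using only the standard library. All class names, the tags attribute, and find_statistics are hypothetical and do not reflect pyspi's actual layout; sktime (all_estimators) and scikit-base (all_objects) offer more complete utilities along these lines.

```python
class Statistic:
    """Hypothetical base class; subclasses declare searchable tags."""

    tags = {}


class Covariance(Statistic):
    tags = {"directed": False, "kind": "basic"}


class TransferEntropy(Statistic):
    tags = {"directed": True, "kind": "infotheory"}


class GrangerCausality(Statistic):
    tags = {"directed": True, "kind": "causal"}


def _all_subclasses(cls):
    # Walk the class hierarchy recursively to find every descendant.
    for sub in cls.__subclasses__():
        yield sub
        yield from _all_subclasses(sub)


def find_statistics(base=Statistic, **filter_tags):
    """Return sorted names of subclasses whose tags match every filter."""
    matches = [
        sub.__name__
        for sub in _all_subclasses(base)
        if all(sub.tags.get(key) == value for key, value in filter_tags.items())
    ]
    return sorted(matches)


directed = find_statistics(directed=True)
everything = find_statistics()
```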

fkiraly avatar Oct 17 '23 15:10 fkiraly

Thanks @fkiraly for the kind words and enthusiasm! The compliments are best directed at @olivercliff who did the software dev for this project.

I personally don't have the time or python expertise to contribute much to software expansion efforts, but @olivercliff may be able to weigh in on this point. It's possible @anniegbryant may be able to help somewhat, but I will leave that to her…

Ultimately would be great to have a student or keen software dev join the team—e.g., could be a good Google Summer of Code project. Will keep you posted…

benfulcher avatar Oct 18 '23 01:10 benfulcher

Hi @fkiraly, glad to hear you like it! In fact, I designed the code with future integration of the sktime/sklearn framework in mind, which is probably why certain parts of it feel familiar (and hopefully the integration would not be too much of a hassle).

Your two main points, imo, would not only allow integration with sklearn/sktime, but also significantly improve the readability and usability of the standalone package. My thoughts after having a quick look at the code you referenced:

  • The sklearn-base classes might be the more difficult aspect to implement, as it looks like it requires pyspi to handle data differently - is that correct? Many methods store certain results directly in the data object in order to extract statistics from these results later on; otherwise the computation time blows out significantly. I imagine there is a simpler way to achieve this using the sklearn framework but I have not come across it yet.
  • Adopting the BasePairwiseTransformerPanel sounds achievable in a shorter period of time. Moreover, the arguments cover all cases that the methods in pyspi require (e.g., multivariate or bivariate) and extend in useful directions (e.g., handles NaN or not).
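
One possible way to reconcile the caching concern in the first point with stateless, sklearn-style estimators is to keep the shared intermediate results in the data container rather than in the estimator. The sketch below is an assumption-laden illustration of that idea, not a description of how pyspi implements it; CachedData, intermediate, and the two statistic functions are hypothetical names.

```python
class CachedData:
    """Hypothetical data container that memoises shared intermediate results."""

    def __init__(self, series):
        self.series = series
        self._cache = {}
        self.n_computed = 0  # counts how often an intermediate was computed

    def intermediate(self, key, compute):
        # Compute the intermediate once, then serve it from the cache.
        if key not in self._cache:
            self._cache[key] = compute(self.series)
            self.n_computed += 1
        return self._cache[key]


def covariance_matrix(series):
    # Sample covariance between every pair of equal-length series.
    n = len(series[0])
    means = [sum(s) / n for s in series]
    return [
        [
            sum((a - ma) * (b - mb) for a, b in zip(x, y)) / (n - 1)
            for y, mb in zip(series, means)
        ]
        for x, ma in zip(series, means)
    ]


def covariance(data):
    return data.intermediate("cov", covariance_matrix)


def correlation(data):
    # Reuses the cached covariance instead of recomputing it.
    cov = data.intermediate("cov", covariance_matrix)
    p = len(cov)
    return [
        [cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5 for j in range(p)]
        for i in range(p)
    ]


data = CachedData([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.5]])
cov = covariance(data)
corr = correlation(data)
```

Here both statistics are computed from a single covariance evaluation, while the statistic functions themselves hold no state.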

I am unfortunately quite short on time these days and don't work directly on the codebase anymore, so I think the idea of a GSoC project, as @benfulcher suggests, is a great way forward.

olivercliff avatar Oct 18 '23 10:10 olivercliff

Hey @fkiraly, @benfulcher, @olivercliff!

Has there been any progress on the Google Summer of Code? I might be interested in doing the sklearn integration, but I didn't find the project in the sktime projects list.

bruAristimunha avatar Feb 25 '24 16:02 bruAristimunha

@bruAristimunha, apologies, I did not see this post!

Yes, we have been selected for GSoC 2024, and this would have been an excellent topic!

Unfortunately, the application deadline was April 2.

We could still work on this though? We have a great (unpaid) mentoring programme! https://github.com/sktime/mentoring/tree/main

Or perhaps @benfulcher has an academic internship available?

fkiraly avatar Apr 14 '24 19:04 fkiraly

@benfulcher, @olivercliff, apologies, I missed the more recent discussion in my inbox.

Let us know if further collaboration here is of interest; we are going to kick off our summer workstreams in May.

fkiraly avatar Apr 14 '24 19:04 fkiraly

Hi @fkiraly,

Unfortunately, doing unpaid work this way is not very interesting for me, but I appreciate the answer. It would be a "hard" project, with a lot of code, and a lot of time commitment.

Maybe next year if sktime is selected.

bruAristimunha avatar Apr 14 '24 19:04 bruAristimunha

@bruAristimunha, we did get selected for 2024, but getting paid would have required an application by April 2. Sorry that I did not see this.

How about an alternative idea then, @benfulcher: you (or someone from your team) could present pyspi in one of the sktime meet-ups, which are on Fridays at 4pm UTC at the moment. There is one free slot on April 26, and most of June is also available.

The aim would be to present pyspi and a potential integration project. I'm sure many members of the community and adjacent listeners would find this interesting, and someone might take it up.

fkiraly avatar Apr 16 '24 07:04 fkiraly

Ok sounds good thanks for the invite—would be happy to present pyspi. @jmoo2880 has done a bunch of work on it recently, getting it into a nice format (e.g., now pip installable). Trouble is that 4pm UTC seems to be 2am Sydney time, so it's not going to work at that timing.

benfulcher avatar Apr 19 '24 04:04 benfulcher