py-earth
Support for online learning?
How feasible would it be to implement online learning, e.g. a partial_fit() method, to allow an existing model to be modified with new data? This would also allow for out-of-core learning and streaming applications.
@DoctorRad Unfortunately, the nature of MARS makes online learning basically impossible. The reason for this is that the forward pass is a greedy step-wise search for new terms. If you add new data, you have no way of knowing that the earliest terms in your model would be unaffected.
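To make the point about the greedy forward pass concrete, here is a minimal sketch. It is not py-earth's actual forward pass: it assumes a single feature, a single hinge term `max(0, x - knot)`, and an exhaustive search over the observed x values. The helper names (`rss_for_knot`, `first_greedy_knot`, `target`) and the toy data are made up for illustration. It shows that the very first knot chosen can change once more data arrives, so every term selected after it would be built on a different foundation.

```python
def hinge(x, knot):
    """MARS-style hinge basis function max(0, x - knot)."""
    return max(0.0, x - knot)


def rss_for_knot(xs, ys, knot):
    """Residual sum of squares of the least-squares fit y ~ b0 + b1 * hinge(x, knot)."""
    hs = [hinge(x, knot) for x in xs]
    n = len(xs)
    h_mean = sum(hs) / n
    y_mean = sum(ys) / n
    sxx = sum((h - h_mean) ** 2 for h in hs)
    if sxx == 0.0:
        # The hinge is constant on this data, so it cannot explain anything.
        return float("inf")
    sxy = sum((h - h_mean) * (y - y_mean) for h, y in zip(hs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * h_mean
    return sum((y - (intercept + slope * h)) ** 2 for h, y in zip(hs, ys))


def first_greedy_knot(xs, ys):
    """First greedy step: pick the candidate knot with the lowest RSS."""
    return min(sorted(set(xs)), key=lambda k: rss_for_knot(xs, ys, k))


def target(x):
    """Toy ground truth with kinks at x = 3 and x = 10 (an assumption for the demo)."""
    return max(0.0, x - 3) + 4.0 * max(0.0, x - 10)


# Initial batch only covers x in [0, 9]; the later batch extends it to [0, 19].
xs_old = [float(x) for x in range(10)]
ys_old = [target(x) for x in xs_old]
xs_all = [float(x) for x in range(20)]
ys_all = [target(x) for x in xs_all]

knot_before = first_greedy_knot(xs_old, ys_old)
knot_after = first_greedy_knot(xs_all, ys_all)
print("first knot on initial batch:", knot_before)
print("first knot after new data:  ", knot_after)
```

On the initial batch a hinge at 3 fits the data exactly, so it wins the first step. Once the steeper region beyond x = 10 is included, a different knot gives a lower error for the first term, which is why the earlier terms cannot simply be kept and updated in place.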
However, it would be possible, in theory, to allow for model fitting to be resumed, and new terms added with new data, after the initial model fit. That might be worth doing in some cases, although eventually you would probably want to fit a new model on your entire data set.
Question: what problem are you trying to solve with online learning? Perhaps I can suggest a workaround, although you might also be better off just using a method that allows for online learning.
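For contrast, this is roughly what "a method that allows for online learning" looks like. The sketch below is a plain linear model updated by stochastic gradient descent, written in pure Python. The class `OnlineLinearRegressor` is hypothetical and not part of py-earth; it only mimics the `partial_fit()` convention that some scikit-learn estimators (for example `SGDRegressor`) expose. Each update depends only on the current weights and the incoming batch, which is exactly the property the MARS forward pass lacks.

```python
class OnlineLinearRegressor:
    """Hypothetical linear model trained incrementally with SGD on squared error."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, row):
        return self.bias + sum(w * v for w, v in zip(self.weights, row))

    def partial_fit(self, X, y):
        """Update the model in place from one batch; earlier data is not needed."""
        for row, target in zip(X, y):
            error = self.predict_one(row) - target
            self.weights = [
                w - self.learning_rate * error * v
                for w, v in zip(self.weights, row)
            ]
            self.bias -= self.learning_rate * error
        return self


# Stream noiseless toy data y = 2x + 1 in mini-batches of up to 4 rows.
xs = [i / 10 for i in range(11)]
model = OnlineLinearRegressor(n_features=1)
for _ in range(500):  # repeated passes stand in for a long stream
    for start in range(0, len(xs), 4):
        batch = xs[start:start + 4]
        model.partial_fit([[x] for x in batch], [2 * x + 1 for x in batch])

print(model.weights, model.bias)
```

A model like this cannot capture the kinks and interactions that MARS finds, so it is a trade-off: the incremental update comes from having a fixed set of terms, whereas MARS chooses its terms from the data.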
@DoctorRad Regarding out-of-core learning, it is theoretically possible to build a MARS implementation that operates across a cluster, but it would be a substantial undertaking and there would still need to be some central coordination node doing a good amount of work. Shared memory parallelism is much more feasible, but not implemented in py-earth (except for perhaps some of the BLAS operations, depending on your environment).
@jcrudy Thanks for your feedback. I suspected that it was largely not possible as I couldn't conceive of a way that it could be done, but thought you might have better ideas.
I am currently using py-earth as a tool to help me learn Python data science, so it's only really toy problems for now. However, the regression problem I am considering at the moment has a mixture of continuous and ordinal variables with a considerable amount of missing data, which is what attracted me to MARS.