py-earth
Using user defined functions with fittable parameters in MARS
Hi, I was wondering whether the following feature is included, or could be included, in MARS. Let me give an example. If I am simulating gas demand, I know it depends on the temperature. However, it also depends on the temperature in the past, as buildings have a heat island effect.
If I have hourly temperature data, I could pass every hour for, say, the last week into MARS, but this would be slow. I also know that more recent temperatures should matter more. It would be nice if I could supply a function that weights the temperature with an exponential decay, but have MARS fit the rate of this decay. I can think of other examples where the user might be able to provide insight into the model rather than forcing MARS to derive everything. Is something like the above possible?
@Fish-Soup It would be possible to do something like that in some limited form, although it isn't currently implemented. It would depend a lot on the type of function and how you want it to interact with the other terms in the model. In the case of fitting a MARS model for the decay rate, that seems more like MARS with a custom loss function. If that loss function is convex then gradient boosting might be appropriate. Can you give more mathematical details about your use case?
@jcrudy I think maybe I'm being unclear or at least I don't understand how changing the loss function would help....
Let's say I have some relationship such that
d = f1(t*, a, b, c)
where t* = f2(t1, t2, t3, t4, ..., tn).
Now let's say I am certain of the functional form of f2: it's an exponentially decayed average of all of t1 to tn.
so we could write

$$t^* = \frac{\sum_{n=1}^{N} t_n \, e^{-n a}}{\sum_{n=1}^{N} e^{-n a}}$$
I know t1 to tn but I don't know a.
However, I also don't know what t* is; I only know d, and that d is some function f1.
I would hope to fit d such that MARS can modify the value of a (the exponential decay rate) and modify how the new variable t* interacts with the other variables.
I could of course throw all the values of t1 to tn into MARS, but I don't think I have enough RAM, and in this case I believe I know part of the total function for d better than MARS could find it, as I have knowledge of how the process works.
Apologies for the awfully written equation. I'm doing this on my phone...
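For concreteness, here is a minimal NumPy sketch of the exponentially decayed average described above. The function name `exp_weighted_average` and the array layout (one row per sample, column n-1 holding the lag-n temperature t_n) are assumptions made for illustration; nothing here is part of py-earth.

```python
import numpy as np

def exp_weighted_average(lagged, a):
    """Collapse lagged temperatures t_1..t_N into a single value t*.

    lagged : array of shape (n_samples, N); column n-1 holds t_n (assumed layout).
    a      : decay rate; larger values put more weight on recent lags.
    """
    lagged = np.asarray(lagged, dtype=float)
    n = np.arange(1, lagged.shape[1] + 1)   # lag indices 1..N
    w = np.exp(-a * n)                      # unnormalised weights exp(-n*a)
    return lagged @ w / w.sum()             # weighted average per row

lagged = np.array([[10.0, 12.0, 14.0],
                   [5.0, 5.0, 5.0]])
print(exp_weighted_average(lagged, a=0.0))  # a = 0 reduces to the plain mean
print(exp_weighted_average(lagged, a=5.0))  # large a is dominated by t_1
```

With a fixed, this turns N lagged columns into one column, which is what would then be passed to the f1 fit.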
@Fish-Soup I just realized when responding to your other post that I forgot to respond to this one. It seems like you can fit f2 separately from MARS (using some optimization package), then pass the predictions from f2, as well as the fitted value of a, into MARS to fit f1. This would mean MARS can't modify a, but in this scenario I'm not sure you'd want it to. This is also assuming the correct loss function for the f1 fit is the MSE. Otherwise, you'd need gradient boosting or some other method to change the loss function.
I could do what you suggest, but I'd rather the function was fitted in MARS. For example, I know that d has a nonlinear relationship with t*, so I wonder how well the relationship with t* could be probed. I suppose with enough data this shouldn't be a problem, but often the amount of data I have is limited. Still, I will give it a go and see how it works.
Cheers
I'm thinking about a custom loss function. I suppose I have 2 goals.
- make sure time lagged data is weighted less the longer the lag.
- I hope that by using the exponential I reduce the degrees of freedom and thus speed up the fitting process. I can imagine that with a custom loss function I could achieve 1 but not 2.
How would I go about it? Somehow I would have to tell the model that some data is more lagged than other data.
cheers
For 1, you can just use sample weights based on lag time. However, I don't know by what principle you would calculate the exact weights. Perhaps you could figure out something reasonable, though. Just know that py-earth does support weighted samples. Not sure about 2. If you're confident the exponential form is correct for the t* part of your model, but don't know the relationship between t* and d, perhaps you could try some expectation-maximization style approach, with MARS used for f1. This would definitely not speed things up, but might give you better results. Would be some work to get that right, though, if it could work at all.
Regarding custom loss functions, I don't know the nature of the problem that well. If you want to minimize (possibly weighted) mean squared error of f1, then MARS does that by default. If you want any other kind of loss function, you have to resort to gradient boosting to get it with MARS.
P.S. In my previous comment I didn't understand that t* is unobserved, so part of my comment didn't make that much sense.
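One simple way to estimate the decay rate when t* is unobserved is to profile over it: for each candidate a, compute t*, fit f1, and keep the a with the lowest error. This is a plain grid search, not the expectation-maximization style approach mentioned above, and it is only a sketch. The cubic polynomial in `fit_inner` is a stand-in for the MARS fit so that the example runs with NumPy alone; in practice one would presumably swap in py-earth's `Earth` estimator there. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def exp_weighted_average(lagged, a):
    """t* for decay rate a; column n-1 of `lagged` holds t_n (assumed layout)."""
    n = np.arange(1, lagged.shape[1] + 1)
    w = np.exp(-a * n)
    return lagged @ w / w.sum()

def fit_inner(t_star, d):
    """Stand-in for the f1 fit (NOT MARS): cubic polynomial, returns in-sample MSE."""
    coeffs = np.polyfit(t_star, d, deg=3)
    resid = d - np.polyval(coeffs, t_star)
    return float(np.mean(resid ** 2))

def profile_decay_rate(lagged, d, candidates):
    """Refit f1 for every candidate decay rate and return the best one."""
    errors = [fit_inner(exp_weighted_average(lagged, a), d) for a in candidates]
    return candidates[int(np.argmin(errors))], errors

# Synthetic data with a known decay rate, purely for illustration.
rng = np.random.default_rng(0)
lagged = rng.normal(10.0, 5.0, size=(2000, 48))
true_a = 0.3
t_true = exp_weighted_average(lagged, true_a)
d = 100.0 - 3.0 * t_true + 0.1 * t_true ** 2 + rng.normal(0.0, 0.05, size=2000)

candidates = np.linspace(0.05, 1.0, 20)
best_a, errors = profile_decay_rate(lagged, d, candidates)
print(best_a)
```

Each candidate requires a full refit of f1, so this is slower than a single fit, and with a flexible inner model the error should be measured on held-out data rather than in-sample.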
Hi, with 1 I hadn't realized you could have a different weight for each variable within your xdata. I could try that...
With regards to 2, I am trying to work out a way I can send, say, 200 lagged temperatures into the model. That would be 200 extra columns... I thought that if those 200 columns could essentially be condensed into a one-parameter fit (an exponentially weighted point) and then inserted into MARS, I would reduce the degrees of freedom and thus make it easier to fit, and more likely to get a sensible result.
I suppose this is quite a custom request, but I imagine being able to, say, call the curve fit module in statsmodels to reduce components of the x variables down would be useful.
I suppose my difficulties come from the fact that I'm trying to model a process that in some ways works like a regression model and in other ways looks more like time series data. I did read about TSMARS, which is supposed to be a time series implementation of MARS, but I haven't read much into it.
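On the memory concern with 200 lagged columns: for a fixed decay rate, the exponentially weighted average over a regularly sampled series is a convolution, so the lag matrix never needs to be materialised. The sketch below assumes an hourly series with no gaps and that lag 1 means the previous hour; `exp_weighted_series` is a hypothetical helper, not a py-earth function.

```python
import numpy as np

def exp_weighted_series(temps, a, n_lags):
    """t* at every hour i >= n_lags, computed from temps[i-1] .. temps[i-n_lags].

    Returns an array of length len(temps) - n_lags; entry k belongs to hour
    i = k + n_lags.
    """
    temps = np.asarray(temps, dtype=float)
    w = np.exp(-a * np.arange(1, n_lags + 1))  # weights for lags 1..n_lags
    w /= w.sum()
    # 'valid' convolution: out[k] = sum_n w[n-1] * temps[k + n_lags - n]
    out = np.convolve(temps, w, mode="valid")
    return out[:-1]  # the last entry would belong to hour len(temps), so drop it

temps = np.arange(10.0)  # toy hourly series
t_star = exp_weighted_series(temps, a=0.5, n_lags=3)
print(t_star.shape)
```

The result is a single extra column aligned with `temps[n_lags:]`, regardless of how many lags are used.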