Adding an example of how to perform timeseries forecasting with lagged features using expressions

Open rcap107 opened this issue 7 months ago • 13 comments

First version; it needs editing, cleanup, and a double check of the section on lagged features.

rcap107 avatar May 02 '25 15:05 rcap107

After discussing this example IRL, we decided that it should be a stand-alone example about handling timeseries with skrub, and that we should work on a different (simpler) example to showcase the datetime encoder.

rcap107 avatar May 05 '25 13:05 rcap107

it should be a stand-alone example about handling timeseries with skrub

"timeseries forecasting", to be precise

GaelVaroquaux avatar May 05 '25 13:05 GaelVaroquaux

I am working on this example again and I was wondering if I should cut most of the content that's relevant specifically to the DatetimeEncoder and focus only on the forecasting part.

The idea was having a simpler example that focuses only on the datetime encoder, so all the detail about it would be in the first example, while this would be more on how to combine the expressions with the encoder, and the lagged features.
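To make "lagged features" concrete, here is a minimal sketch using plain pandas rather than skrub expressions. The column name `demand`, the helper `add_lags`, and the values are all hypothetical and only illustrate the idea of shifting a target series to build predictors from its own past.

```python
import pandas as pd

# Hypothetical daily series; the values are made up for illustration.
df = pd.DataFrame(
    {"demand": [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]},
    index=pd.date_range("2025-01-01", periods=6, freq="D"),
)


def add_lags(frame, column, lags):
    """Return a copy of `frame` with one shifted column per requested lag."""
    out = frame.copy()
    for lag in lags:
        # shift(lag) moves values down by `lag` rows, so each row sees
        # the value observed `lag` steps earlier.
        out[f"{column}_lag_{lag}"] = out[column].shift(lag)
    return out


lagged = add_lags(df, "demand", lags=[1, 2])

# The first rows have no history for the largest lag, so they contain NaN
# and are dropped before fitting a model.
train = lagged.dropna()
```

Only past values end up in the lag columns, which is what keeps the features usable at prediction time.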

rcap107 avatar Jun 12 '25 14:06 rcap107

The idea was having a simpler example that focuses only on the datetime encoder, so all the detail about it would be in the first example, while this would be more on how to combine the expressions with the encoder, and the lagged features.

Looks like a good approach!

Vincent-Maladiere avatar Jun 12 '25 16:06 Vincent-Maladiere

For the record, I added the KBin discretizer to have "another method" to play with the expressions, but I don't have strong feelings about keeping it or removing it

rcap107 avatar Jun 12 '25 16:06 rcap107

I am working on this example again and I was wondering if I should cut most of the content that's relevant specifically to the DatetimeEncoder and focus only on the forecasting part.

The idea was having a simpler example that focuses only on the datetime encoder, so all the detail about it would be in the first example, while this would be more on how to combine the expressions with the encoder, and the lagged features.

Sounds reasonable to me!

GaelVaroquaux avatar Jun 12 '25 19:06 GaelVaroquaux

I can't figure out where the configuration that looks for the examples is, so now that I moved the new example to the expression directory it doesn't get rendered 🤔

Did you find it? I believe that it is a regexp in the conf.py, in the sphinx-gallery configuration
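For reference, a rough sketch of what such a configuration can look like. sphinx-gallery reads a `sphinx_gallery_conf` dictionary from `conf.py`, and its `filename_pattern` entry is a regular expression that selects which example files are executed. The directory names, the pattern, and the file paths below are assumptions for illustration, not skrub's actual settings.

```python
import re

# Illustrative settings as they might appear in conf.py.
# All values here are assumed, not copied from the skrub repository.
sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # where the example scripts live
    "gallery_dirs": "auto_examples",  # where the generated pages are written
    "filename_pattern": r"/plot_",    # regex matched against each file path
}


def is_executed(path, conf=sphinx_gallery_conf):
    """Mimic the regex check: True if the path matches filename_pattern."""
    return re.search(conf["filename_pattern"], path) is not None


# Hypothetical file names, used only to show the matching behaviour.
matched = is_executed("../examples/expressions/plot_forecasting.py")
skipped = is_executed("../examples/expressions/forecasting.py")
```

A file that is moved to a new subdirectory still matches as long as its name fits the pattern, which is consistent with the example rendering correctly in the end.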

GaelVaroquaux avatar Jun 12 '25 19:06 GaelVaroquaux

Did you find it? I believe that it is a regexp in the conf.py, in the sphinx-gallery configuration

In the end, I think I was looking at the wrong generated docs; now it's rendering the example properly.

That's why I removed the comment; next time I'll edit it instead.

rcap107 avatar Jun 12 '25 19:06 rcap107

Thanks a lot @glemaitre for all the comments, I have addressed most of them. I decided to remove the KBins discretizer because it was not bringing a lot of insight and instead it was making the code longer and more annoying to deal with.

I'm trying to simplify the code for the plot.

rcap107 avatar Jun 13 '25 12:06 rcap107

The legend wasn't fixed: I am placing it outside the bounding box, and sphinx only saves whatever ends up inside the bounding box of the figure, so the legend and title are cut.

Edit: I ended up placing the legend inside the axes, which avoids the issue, even though I don't like the result.
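A small sketch of the workaround described above, with made-up data and labels. It keeps the legend inside the axes and then checks, after drawing, that the legend box really falls within the axes box.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
# Illustrative values only; they do not come from the example under review.
ax.plot([0, 1, 2, 3], [1.0, 2.0, 1.5, 3.0], label="ground truth")
ax.plot([0, 1, 2, 3], [1.1, 1.8, 1.7, 2.8], label="prediction")
ax.set_title("Illustrative forecast")

# Anchoring the legend outside the axes would look like this, and can be
# clipped when only the figure's bounding box is saved:
#   ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))
# Keeping it inside the axes avoids the clipping:
legend = ax.legend(loc="upper left")

# Render once so that the artists have concrete pixel extents.
fig.canvas.draw()
legend_box = legend.get_window_extent()
axes_box = ax.get_window_extent()
legend_inside_axes = (
    axes_box.x0 <= legend_box.x0
    and legend_box.x1 <= axes_box.x1
    and axes_box.y0 <= legend_box.y0
    and legend_box.y1 <= axes_box.y1
)
```

When saving a figure by hand, `fig.savefig(..., bbox_inches="tight")` is another option, since it enlarges the saved area to include artists outside the axes; the thread does not say whether the gallery build can be made to use it.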

rcap107 avatar Jun 13 '25 13:06 rcap107

ready for review

rcap107 avatar Jun 13 '25 14:06 rcap107

Hey Riccardo, I feel like this example is quite long and could be trimmed a little bit.

In particular, I think the datetime encoder could be made optional using choices, so that the focus is more "creating lags and optionally using the datetime encoder". We could nest some hyperparameters in the datetime encoder, but comparing the default to the tuned version isn't necessary IMO.

Also, using boosting trees might provide more compelling results if we're interested in having a good fit with the ground truth

The example was originally meant to show that periodic features are bringing a noticeable benefit, then we decided to add the lagged features to show that they are also useful. We decided to use ridge because it would make the difference more noticeable, while hgb would just work regardless of the periodic features (for the most part).
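To illustrate why periodic features can matter for a linear model such as ridge, here is a generic sine/cosine encoding written with the standard library only. It is a sketch of the general technique, not necessarily what the DatetimeEncoder computes, and the helper name `circular_encode` is hypothetical.

```python
import math


def circular_encode(value, period):
    """Map a cyclic value (e.g. hour of day) to a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


hour_0 = circular_encode(0, 24)
hour_12 = circular_encode(12, 24)
hour_23 = circular_encode(23, 24)

# As raw integers, 23 and 0 are as far apart as the scale allows, even though
# they are one hour apart. On the circle, 23h sits right next to 0h, while
# 12h is on the opposite side.
dist_23_to_0 = math.dist(hour_23, hour_0)
dist_12_to_0 = math.dist(hour_12, hour_0)
```

A linear model can only fit a straight line through the raw hour, whereas the two encoded columns let it represent a smooth daily cycle. Tree-based models can split on the raw hour directly, which matches the remark that hgb works for the most part regardless of the periodic features.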

rcap107 avatar Jun 18 '25 13:06 rcap107

The example was originally meant to show that periodic features are bringing a noticeable benefit, then we decided to add the lagged features to show that they are also useful. We decided to use ridge because it would make the difference more noticeable, while hgb would just work regardless of the periodic features (for the most part).

I understand, but I'm still concerned about the length of the example. Having a noticeable effect for boosting trees, which are what most people use, would be nice, although I get the argument that our datasets might not give us a performance bump.

Vincent-Maladiere avatar Jun 18 '25 14:06 Vincent-Maladiere

After discussion with @Vincent-Maladiere, we were thinking of removing this example and replacing it with a much simpler example that shows how to work on dates using the DatetimeEncoder. Then, we could link to the masterclass by @ogrisel and @glemaitre for a far more advanced and nuanced example than what we can easily maintain in the documentation (of course, if the authors are fine with it).

I think this example is at the same time too long to be an example, and not in-depth enough to be particularly useful.

Thoughts?

rcap107 avatar Jul 11 '25 12:07 rcap107

I think this example is at the same time too long to be an example, and not in-depth enough to be particularly useful.

Fine with that.

I would think that we should also consider long term merging in stuff from the master-class into the skrub documentation. I think so for multiple reasons, but one of them is that it will give us good examples to look at in terms of improving the API (ie creating functionality in skrub that make these examples easier to follow)

GaelVaroquaux avatar Jul 11 '25 12:07 GaelVaroquaux

I would think that we should also consider long term merging in stuff from the master-class into the skrub documentation. I think so for multiple reasons, but one of them is that it will give us good examples to look at in terms of improving the API (ie creating functionality in skrub that make these examples easier to follow)

Yes, we should definitely do that

rcap107 avatar Jul 11 '25 12:07 rcap107

Closing this

rcap107 avatar Jul 11 '25 12:07 rcap107

I would think that we should also consider long term merging in stuff from the master-class into the skrub documentation

Agreed, and we should point to the masterclass content somewhere :)

Vincent-Maladiere avatar Jul 11 '25 14:07 Vincent-Maladiere