Adding an example of how to perform timeseries forecasting with lagged features using expressions
First version; it needs editing cleanup and double-checking of the section on lagged features.
After discussing this example IRL, we decided that it should be a stand-alone example about handling timeseries with skrub, and that we should work on a different (simpler) example to showcase the DatetimeEncoder.
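For reference while double-checking the lagged-features section, here is a minimal sketch of what a lagged feature is, using plain pandas `shift` rather than skrub expressions. The helper `add_lagged_features` and the `load` column are hypothetical, chosen only for illustration:

```python
import pandas as pd


def add_lagged_features(df, column, lags):
    """Return a copy of ``df`` with one ``{column}_lag_{k}`` column per lag k.

    Row t of the lag-k column holds the value observed k steps earlier,
    so each row only sees past values. The first max(lags) rows have no
    full history and are dropped.
    """
    out = df.copy()
    for k in lags:
        out[f"{column}_lag_{k}"] = out[column].shift(k)
    return out.iloc[max(lags):]


# Hypothetical hourly series with 10 observations.
index = pd.date_range("2024-01-01", periods=10, freq=pd.Timedelta(hours=1))
series = pd.DataFrame({"load": range(10)}, index=index)

lagged = add_lagged_features(series, "load", lags=[1, 2])
print(lagged.head(3))
```

When forecasting h steps ahead, only lags k >= h are available at prediction time, so smaller lags would leak future information into the features.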
"timeseries forecasting", to be precise
I am working on this example again and I was wondering if I should cut most of the content that's relevant specifically to the DatetimeEncoder and focus only on the forecasting part.
The idea was to have a simpler example that focuses only on the DatetimeEncoder, so all the details about it would be in that first example, while this one would be more about how to combine the expressions with the encoder and the lagged features.
Looks like a good approach!
For the record, I added the KBinsDiscretizer to have "another method" to play with the expressions, but I don't have strong feelings about keeping or removing it.
Sounds reasonable to me!
I can't figure out where the configuration that looks for the examples is, so now that I've moved the new example to the expression directory, it doesn't get rendered 🤔
Did you find it? I believe that it is a regexp in the conf.py, in the sphinx-gallery configuration
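For context, a sketch of the sphinx-gallery keys that usually matter here. The paths below are hypothetical and need adjusting to the repository layout. As far as we know, `filename_pattern` only decides which examples are *executed*; non-matching files are still rendered. Whether a sub-directory shows up at all depends on it having a gallery header file (README.txt, or GALLERY_HEADER.rst in recent sphinx-gallery versions), so that is worth checking too:

```python
import os
import re

# Hypothetical values: adjust to the actual docs/examples layout.
sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # example sources, relative to conf.py
    "gallery_dirs": "auto_examples",  # where the generated gallery is written
    # Regexp searched in each example's path: matching files are executed,
    # non-matching ones are still rendered but their code is not run.
    "filename_pattern": re.escape(os.sep) + "plot_",
}


def is_executed(path):
    """Mimic the filename_pattern check for a single example path."""
    return re.search(sphinx_gallery_conf["filename_pattern"], path) is not None


print(is_executed(os.path.join("examples", "expressions", "plot_forecasting.py")))
```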
In the end, I think I was looking at the wrong generated docs; now it's rendering the example properly.
That's why I removed the comment; next time I'll edit it instead.
Thanks a lot @glemaitre for all the comments; I have addressed most of them. I decided to remove the KBinsDiscretizer because it was not bringing much insight, and instead it was making the code longer and more annoying to deal with.
I'm trying to simplify the code for the plot.
The legend wasn't fixed: I was placing it outside the bounding box, and sphinx only saves whatever ends up inside the bounding box of the figure, so the legend and title are cut.
Edit: I ended up placing the legend inside the axes, which avoids the issue, even though I don't like the result.
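A small matplotlib sketch of why the legend gets cut: a legend anchored outside the axes extends past the figure's own size, and only a "tight" save includes it. We have not checked whether the sphinx-gallery scraper saves with `bbox_inches="tight"`, so treat this as an illustration of the cropping, not as the fix used in the example. The plotted curves are placeholders:

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, as when building docs
import matplotlib.pyplot as plt
import numpy as np

hours = np.arange(48)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(hours, np.cos(2 * np.pi * hours / 24), label="ground truth")
ax.plot(hours, 0.9 * np.cos(2 * np.pi * hours / 24), label="Ridge with lagged features")
# Anchor the legend just to the right of the axes, i.e. outside them.
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

# Bounding box of everything drawn, in inches, vs the nominal figure width.
tight = fig.get_tightbbox(fig.canvas.get_renderer())
fig_width = fig.get_size_inches()[0]
legend_is_clipped = tight.x1 > fig_width  # True: a plain save would crop it

# A tight save grows the image so that the legend is kept.
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
saved_width_px = int.from_bytes(buf.getvalue()[16:20], "big")  # PNG IHDR width
plt.close(fig)
print(legend_is_clipped, saved_width_px)
```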
ready for review
Hey Riccardo, I feel like this example is quite long and could be trimmed a little bit.
In particular, I think the datetime encoder could be made optional using choices, so that the focus is more "creating lags and optionally using the datetime encoder". We could nest some hyperparameters in the datetime encoder, but comparing the default to the tuned version isn't necessary IMO.
Also, using boosting trees might provide more compelling results if we're interested in having a good fit with the ground truth
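On making the encoder optional: skrub's choices would express this inside the expressions. As a stand-in, here is the plain scikit-learn equivalent, where a pipeline step is swapped for `"passthrough"` in the grid. It uses synthetic hourly data and a hypothetical sine/cosine `periodic_hour_features` transformer in place of the DatetimeEncoder:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def periodic_hour_features(X):
    """Map an hour-of-day column (0-23) to its sine and cosine on the 24h circle."""
    angle = 2 * np.pi * np.asarray(X, dtype=float) / 24
    return np.hstack([np.sin(angle), np.cos(angle)])


# Synthetic data: 60 days of hourly observations with a daily cycle.
rng = np.random.default_rng(0)
hours = np.arange(24 * 60) % 24
X = hours.reshape(-1, 1)
y = np.cos(2 * np.pi * hours / 24) + rng.normal(scale=0.1, size=hours.size)

pipe = Pipeline([
    ("calendar", FunctionTransformer(periodic_hour_features)),
    ("ridge", Ridge()),
])

# "passthrough" disables the step, so the search compares with/without it.
search = GridSearchCV(
    pipe,
    param_grid={"calendar": [FunctionTransformer(periodic_hour_features), "passthrough"]},
    cv=TimeSeriesSplit(n_splits=5),  # folds respect time order
)
search.fit(X, y)
print(search.best_params_["calendar"], round(search.best_score_, 3))
```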
The example was originally meant to show that periodic features bring a noticeable benefit; then we decided to add the lagged features to show that they are also useful. We chose Ridge because it makes the difference more noticeable, while HGB would just work regardless of the periodic features (for the most part).
I understand, but I'm still concerned about the length of the example. Having a noticeable effect for boosting trees, which are what most people use, would be nice, although I get the argument that our datasets might not give us a performance bump.
After discussion with @Vincent-Maladiere, we were thinking of removing this example and replacing it with a much simpler example that shows how to work on dates using the DatetimeEncoder. Then, we could link the masterclass by @ogrisel and @glemaitre for a far more advanced and nuanced example than what we can easily maintain in the documentation (of course, if the authors are fine with it)
I think this example is at the same time too long to be an example, and not in-depth enough to be particularly useful.
Thoughts?
Fine with that.
I think we should also consider, in the long term, merging material from the masterclass into the skrub documentation. There are multiple reasons, but one of them is that it would give us good examples to look at when improving the API (i.e., creating functionality in skrub that makes these examples easier to follow).
Yes, we should definitely do that
Closing this
Agreed, and we should point to the masterclass content somewhere :)