evadb
Improve the 16-homesale-forecasting.ipynb notebook to give details on how to tune every configuration.
Update the notebook with neuralforecast and predictions for every postcode after the math domain error gets fixed in #1283.
Hi @americast, please review the updated notebook. Below are several issues I am still facing now:
- It is not clear how to choose the frequency.
- It is not clear how to decide which model / parameters are better. Do we have any measurable / quantitative metrics we can offer after training?
- Is the NeuralForecast training time and accuracy tunable? 28 minutes with neuralforecast vs 21 seconds with statsforecast is a huge gap.
- Even though we fixed the math domain error when there is only one data point, there are many 0 outputs, which does not make sense. I am using WHERE price > 0 to filter them out now.
- The dates predicted under different unique_id values differ a lot: some are 2017, while others are 2011. I think this is because the next 3 steps are based on the latest date in the training dataset, which can differ per unique_id. I feel that in reality, users want to predict at the same point in time.
@americast While you are fixing some of these issues, we could also discuss the plan for fixing here.
Sure @jarulraj. Thanks @xzdandy for the review!
Hi @americast, please review the updated notebook. Below are several issues I am still facing now:
- It is not clear how to choose the frequency.
Yes, it can get a little confusing. I will send a separate PR for the frequency-related discussion.
- It is not clear how to decide which model / parameters are better. Do we have any measurable / quantitative metrics we can offer after training?
We should add a metric for normalized RMSE or Interval Score. I shall take care of that in #1258
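For reference, here is a minimal sketch of what the two metrics mentioned above could compute. It is not the implementation planned for #1258: the function names are hypothetical, and normalizing RMSE by the range of the actual values is an assumption (normalizing by the mean is another common choice). The interval score follows the standard definition for a central (1 - alpha) prediction interval: the interval width plus a 2/alpha penalty for each unit by which the actual value falls outside it.

```python
import math


def normalized_rmse(actual, predicted):
    """RMSE divided by the range of the actual values (assumed normalization)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    spread = max(actual) - min(actual)
    if spread == 0:
        raise ValueError("actual values are constant; range normalization is undefined")
    return math.sqrt(mse) / spread


def interval_score(actual, lower, upper, alpha=0.1):
    """Mean interval score for central (1 - alpha) intervals; lower is better."""
    if not (len(actual) == len(lower) == len(upper)) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, lo, hi in zip(actual, lower, upper):
        total += hi - lo  # width of the interval
        if y < lo:
            total += (2.0 / alpha) * (lo - y)  # penalty: actual below the interval
        elif y > hi:
            total += (2.0 / alpha) * (y - hi)  # penalty: actual above the interval
    return total / len(actual)
```

Both metrics would be computed on a held-out tail of each series, so that two models or two parameter settings can be compared on the same data.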
- Is the NeuralForecast training time and accuracy tunable? 28 minutes with neuralforecast vs 21 seconds with statsforecast is a huge gap.
The relationship is not linear. With larger datasets, statsforecast might well take a lot more time than neuralforecast. The time taken by neuralforecast should grow roughly linearly with the number of unique IDs. For statsforecast, it might grow non-linearly with more data.
- Even though, we fix the math domain error when there is only one data point, there are many 0 outputs, which does not make sense. I am using WHERE price > 0 to filter them out now.
That's weird. Will check that. Anyway, forecasting with just one data point doesn't really make much sense. Perhaps we should also return some suggestion or warning?
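One way such a warning could look, as a rough sketch only: the helper name `warn_on_short_series` and the `min_points` threshold are hypothetical and not part of EvaDB. It counts the rows per unique ID and warns about any series that is too short to forecast meaningfully.

```python
import warnings
from collections import Counter


def warn_on_short_series(unique_ids, min_points=2):
    """Warn about series with fewer than min_points rows and return their IDs.

    unique_ids holds one entry per training row (the unique_id column).
    """
    counts = Counter(unique_ids)
    short = sorted(uid for uid, n in counts.items() if n < min_points)
    if short:
        warnings.warn(
            f"{len(short)} series have fewer than {min_points} data points; "
            f"forecasts for them are unreliable (first few: {short[:5]})",
            UserWarning,
        )
    return short
```

Returning the offending IDs lets the caller decide whether to drop those series before training or only flag their forecasts.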
- The date predicted under different unique_id differ a lot, some are 2017, while others are 2011. I think this is due to the fact that the next 3 step is based on the latest date from the training dataset, which can differ. I feel in reality, users want to predict at the same point in time.
This is an interesting problem. Perhaps we can ask for the time step range where the user wants a forecast and predict at those steps.
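To make the idea concrete, here is a small sketch of how a per-series horizon could be derived from a common target date. Everything in it is an assumption for illustration: the helper names are hypothetical, and the data is assumed to have a monthly frequency. Each series gets as many steps as there are months between its last observed date and the target.

```python
from datetime import date


def months_between(start, end):
    """Number of whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def horizons_to_target(last_dates, target):
    """Map each unique_id to the number of monthly steps needed to reach target.

    last_dates maps unique_id -> last observed date for that series.
    Series whose data already reaches the target are left out.
    """
    horizons = {}
    for uid, last in last_dates.items():
        steps = months_between(last, target)
        if steps > 0:
            horizons[uid] = steps
    return horizons
```

For example, with a target of September 2017, a series ending in June 2017 needs 3 steps, while one ending in March 2011 needs 78, which also shows how far such a forecast would have to extrapolate.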
As of now, I am trying to come up with a confidence interval in forecasting, as well as a metric, that would better help analyze which method works the best. The entire setup could be a part of the feedback system. I'll be adding my commits in #1258. I'll update this doc with the metrics once that's merged.