Improve the 16-homesale-forecasting.ipynb to give details on how to tune every configuration.

Open xzdandy opened this issue 1 year ago • 4 comments

Update the notebook with neuralforecast and predictions for every postcode once the math domain error is fixed in #1283.

xzdandy avatar Oct 12 '23 06:10 xzdandy

Hi @americast, please review the updated notebook. Below are several issues I am still facing now:

  1. It is not clear how to choose the frequency.
  2. It is not clear how to decide which model / parameters are better. Do we have any measurable / quantitative metrics we can offer after training?
  3. Are the NeuralForecast training time and accuracy tunable? 28 minutes with neuralforecast vs 21 seconds with statsforecast is a huge gap.
  4. Even though we fixed the math domain error that occurs when there is only one data point, there are many 0 outputs, which do not make sense. I am using WHERE price > 0 to filter them out for now.
  5. The dates predicted for different unique_id values differ a lot: some are in 2017, while others are in 2011. I think this is because the next 3 steps are based on the latest date in the training dataset, which can differ. I feel that in reality, users want to predict at the same point in time.

xzdandy avatar Oct 23 '23 05:10 xzdandy

@americast While you are fixing some of these issues, we could also discuss the plan for fixing them here.

jarulraj avatar Oct 24 '23 02:10 jarulraj

> @americast While you are fixing some of these issues, we could also discuss the plan for fixing them here.

Sure @jarulraj. Thanks @xzdandy for the review!

> Hi @americast, please review the updated notebook. Below are several issues I am still facing now:
>
> 1. It is not clear how to choose the frequency.

Yes, it can get a little confusing. I will send a separate PR for the frequency-related discussion.
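In the meantime, one possible starting point is to let pandas guess the frequency from each series' timestamps. This is only a minimal sketch and not how EvaDB picks the frequency internally; the column names `unique_id` and `ds` and the helper `infer_series_freq` are assumptions made for illustration.

```python
import pandas as pd

def infer_series_freq(df, id_col="unique_id", time_col="ds"):
    """Return {series id: inferred pandas frequency alias, or None if unknown}."""
    freqs = {}
    for uid, group in df.groupby(id_col):
        times = pd.DatetimeIndex(group[time_col].sort_values().unique())
        # pd.infer_freq raises ValueError with fewer than 3 timestamps,
        # so short series are reported as None instead.
        freqs[uid] = pd.infer_freq(times) if len(times) >= 3 else None
    return freqs

# Hypothetical toy data: series "a" is daily, series "b" has a single point.
df = pd.DataFrame({
    "unique_id": ["a", "a", "a", "a", "b"],
    "ds": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-01"]
    ),
})
freqs = infer_series_freq(df)
```

If different series come back with different aliases, or many come back as `None`, that is a hint the frequency needs to be chosen by hand or the data resampled first.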

> 2. It is not clear how to decide which model / parameters are better. Do we have any measurable / quantitative metrics we can offer after training?

We should add a metric such as normalized RMSE or interval score. I will take care of that in #1258.
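To make the two candidate metrics concrete, here is a rough sketch of how they could be computed with NumPy. It is not the implementation planned for #1258; the function names are hypothetical, normalizing RMSE by the range of the actual values is just one common choice, and the interval score follows the usual definition for a central (1 - alpha) prediction interval.

```python
import numpy as np

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the actual values (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    value_range = y_true.max() - y_true.min()
    if value_range == 0:
        raise ValueError("actual values are constant; range normalization is undefined")
    return float(rmse / value_range)

def interval_score(y_true, lower, upper, alpha=0.1):
    """Mean interval score for central (1 - alpha) intervals; lower is better."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    width = upper - lower
    # Penalize actual values that fall outside the interval, scaled by 2 / alpha.
    below = (2.0 / alpha) * np.clip(lower - y_true, 0, None)
    above = (2.0 / alpha) * np.clip(y_true - upper, 0, None)
    return float(np.mean(width + below + above))
```

Both scores need held-out actual values, so part of each series would have to be set aside before training to compare models this way.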

> 3. Are the NeuralForecast training time and accuracy tunable? 28 minutes with neuralforecast vs 21 seconds with statsforecast is a huge gap.

The gap does not scale linearly. With larger datasets, statsforecast might well take a lot more time than neuralforecast. The time taken by neuralforecast should grow roughly linearly with the number of unique IDs, whereas for statsforecast it might grow non-linearly with more data.

> 4. Even though we fixed the math domain error that occurs when there is only one data point, there are many 0 outputs, which do not make sense. I am using WHERE price > 0 to filter them out for now.

That's weird. Will check that. Anyway, forecasting with just one data point doesn't really make much sense. Perhaps we should also return some suggestion or warning?
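Such a warning could look roughly like the sketch below. The helper `warn_short_series`, the `unique_id` column name, and the threshold of 3 points are all assumptions for illustration, not existing EvaDB behavior.

```python
import warnings
import pandas as pd

def warn_short_series(df, id_col="unique_id", min_points=3):
    """Warn about series with fewer than min_points rows and return their ids."""
    counts = df.groupby(id_col).size()
    short_ids = counts[counts < min_points].index.tolist()
    if short_ids:
        warnings.warn(
            f"{len(short_ids)} series have fewer than {min_points} data points; "
            "their forecasts may be unreliable.",
            UserWarning,
        )
    return short_ids
```

The returned ids could also be used to drop those series before training instead of filtering the zero outputs afterwards.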

> 5. The dates predicted for different unique_id values differ a lot: some are in 2017, while others are in 2011. I think this is because the next 3 steps are based on the latest date in the training dataset, which can differ. I feel that in reality, users want to predict at the same point in time.

This is an interesting problem. Perhaps we can ask the user for the time-step range they want forecasts for and predict over that range.
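One way to think about it: given a common target date, each series needs a different number of steps, depending on where its training data ends. The sketch below computes that per-series horizon with pandas periods. The helper `steps_to_target`, the column names `unique_id` and `ds`, and the monthly frequency are assumptions for illustration only.

```python
import pandas as pd

def steps_to_target(df, target, period_freq="M", id_col="unique_id", time_col="ds"):
    """Return {series id: number of periods from its last timestamp to target}."""
    target_period = pd.Period(target, freq=period_freq)
    last_dates = df.groupby(id_col)[time_col].max()
    steps = {}
    for uid, last in last_dates.items():
        # Subtracting two Periods yields an offset whose .n is the period count.
        gap = (target_period - pd.Period(last, freq=period_freq)).n
        # Series that already extend to or past the target need no extra steps.
        steps[uid] = max(gap, 0)
    return steps

# Hypothetical toy data: one series ends in 2017, the other in 2011.
df = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "ds": pd.to_datetime(["2016-12-01", "2017-01-01", "2010-12-01", "2011-01-01"]),
})
horizons = steps_to_target(df, "2017-04-01")
```

A series that ends years before the target would need a very long horizon, so its forecast at the target date is likely to be far less reliable than one for a series that ends just before it.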

For now, I am trying to come up with a confidence interval for the forecasts, as well as a metric, to help analyze which method works best. The entire setup could be part of the feedback system. I'll be adding my commits in #1258, and I'll update this doc with the metrics once that's merged.

americast avatar Oct 24 '23 04:10 americast