neuralforecast
predict_insample step_size fix
Fixes https://github.com/Nixtla/neuralforecast/issues/866
Solution
- `predict_insample` always sets `step_size=1` at the beginning of the function, but stores the value requested by the user.
- with `step_size=1`, the troubling condition is always satisfied.
- at the end of the function, filter the extra `cutoff` dates of the `fcsts_df` dataframe considering the `step_size` originally requested by the user, so if we have `h=3` and `step_size=2` we'll want indices 0, 1, 2, 6, 7, 8, 12, 13, 14 (first cutoff, third cutoff, fifth cutoff, etc.), as sketched below.
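A minimal, self-contained sketch of the filtering idea (the toy sizes and the `horizon` column are assumptions for illustration, not the library code):

```python
import numpy as np
import pandas as pd

# Toy illustration: full in-sample forecasts for one series produced with
# step_size=1, then filtered down to the user's requested step_size.
h, step_size, n_windows = 3, 2, 5  # hypothetical sizes
fcsts_df = pd.DataFrame({
    'cutoff': np.repeat(np.arange(n_windows), h),       # h rows per cutoff
    'horizon': np.tile(np.arange(1, h + 1), n_windows),
})
# keep the h rows of every step_size-th cutoff
keep_rows = (np.arange(0, n_windows, step_size)[:, None] * h + np.arange(h)).ravel()
print(keep_rows.tolist())                           # [0, 1, 2, 6, 7, 8, 12, 13, 14]
print(fcsts_df.iloc[keep_rows]['cutoff'].unique())  # [0 2 4] -> every other cutoff
```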
@AzulGarza Yes, that makes sense, thanks for the feedback. I've updated the PR description and pushed a new commit that reflects the requested changes.
Hey @isaac-chung, the series may have different starts and ends (thus different cutoffs), so the current solution won't work. We have the "advantage" that there's an issue at the moment where all the series need to have the same length for `predict_insample` to work, so we can make use of that to generate the indices of the rows that we want to keep. The result of the predict is sorted by series, so if we have `h=3` and `step_size=2` we'll want indices 0, 1, 2, 6, 7, 8, 12, 13, 14 (first cutoff, third cutoff, fifth cutoff, etc.), which we can generate with the following:
```python
n_series = len(self.uids)
n_windows = fcsts_df.shape[0] // (n_series * h)
# start row of every step_size-th window, repeated h times
offsets = np.repeat(np.arange(0, n_series * n_windows, step_size), h) * h
# positions 0..h-1 within each kept window
horizons = np.tile(np.arange(h), offsets.size // h)
keep_rows = offsets + horizons
```
and then use these to slice the `fcsts_df`. Keep in mind that this has to work for both pandas and polars, so we have a function to do this:

```python
fcsts_df = ufp.take_rows(fcsts_df, keep_rows)
```
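For reference, a standalone sanity check of the index generation above, under assumed toy sizes (one series, five in-sample windows):

```python
import numpy as np

n_series, n_windows, h, step_size = 1, 5, 3, 2  # assumed toy sizes
offsets = np.repeat(np.arange(0, n_series * n_windows, step_size), h) * h
horizons = np.tile(np.arange(h), offsets.size // h)
keep_rows = offsets + horizons
print(keep_rows.tolist())  # [0, 1, 2, 6, 7, 8, 12, 13, 14]
```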
Please do this slicing after the horizontal concat https://github.com/Nixtla/neuralforecast/blob/0c2d196c08ff592a3203bae059e422cc6e63127c/neuralforecast/core.py#L921 to reduce the size of the following join.
Also please add a test to verify that the cutoffs that we get are the expected ones, e.g. step_size = 2 should skip one period in every cutoff, step_size = 3 should skip two periods, etc.
Thanks @jmoralez. For this test, do you have a preference on where it lives? e.g. a new cell in `nbs/core.ipynb`?
There's a cell with the comment `# Test predict_insample` in `nbs/core.ipynb`; please add the test in that cell, below the current ones, with the comment `# test step_size` or similar.
@jmoralez thanks again. See if this aligns with what we had discussed above.
Are you not able to run the test because of this?
Also, a better test would be checking the cutoffs explicitly, something like this:

```python
train_start = AirPassengersPanel_train['ds'].min()
first_cutoff = train_start - pd.offsets.MonthEnd()
sizes = AirPassengersPanel_train['unique_id'].value_counts().to_numpy()
for step_size in range(1, 4):
    # forecasts is assumed to be regenerated for each step_size, e.g.
    # forecasts = nf.predict_insample(step_size=step_size)
    n_expected_cutoffs = sizes[0] // step_size
    expected_cutoffs = np.array(
        [first_cutoff + step_size * i * pd.offsets.MonthEnd() for i in range(n_expected_cutoffs)]
    )
    np.testing.assert_array_equal(forecasts['cutoff'].unique(), expected_cutoffs)
```
I wasn't able to run it before as my system has more than 1 GPU. I ran `export CUDA_VISIBLE_DEVICES=0` and it seems to have been resolved.
Running the test in the suggestion fails for all step sizes. It seems like `forecasts['cutoff'].unique()` was missing a year (the last year, 1959 in the test). Does that sound expected? I am trying to understand whether `expected_cutoffs` got an extra year or `forecasts` missed a year.
Probably the expected cutoffs got an extra year. The last cutoff should be one year before the last date in the train dataset (because `h=12` and monthly freq), which makes me think we may also need to generate the indices starting from the back, because we may truncate in some situations and leave that one out.
Makes sense. I updated the test to not include the last h periods. What do you mean by "generate the indices starting from the back"?
We always want to have the last cutoff, so if `h=2`, `step_size=3` and the length is 7, we currently compute `n_windows = length // h = 7 // 2 = 3` and then we generate `offsets=[0, 0]`, `horizons=[0, 1]`, `keep_rows=[0, 1]`, so we're only keeping the first offset when we would like to keep the last instead. So we should modify the indices generation to account for this.
I'm confused as to 1) whether the last cutoff is needed in addition to the first one, and 2) what we want to keep in that example, where `n_windows = 3`, `h=2`, `step_size=3` and `length = 7`:

- would the forecasts span [0, 1], [3, 4], [6, 7]?

Also, just to make sure I follow what the method is supposed to be doing, I'm reading the diagram in `nbs/examples/PredictInsample.ipynb`, where `h=4` and `step_size=2`. It seems to contradict your earlier example:

> so if we have h=3 and step_size=2 we'll want indices 0, 1, 2, 6, 7, 8, 12, 13, 14 (first cutoff, third cutoff, fifth cutoff, etc)

Wouldn't we want [0, 1, 2], [2, 3, 4], [4, 5, 6], etc.?
> whether the last cutoff is needed in addition to the first one

If there's only one possible cutoff (because there are 3, for example, and the step size is 3), we want the last one.

> what we want to keep in that example, where n_windows = 3, h=2, step_size=3 and length = 7: would the forecasts span [0,1], [3,4], [6,7] (?)
In this case I believe the full in-sample predictions (with `step_size=1`) would look like this:

cutoff | ds
---|---
0 | 1
0 | 2
1 | 2
1 | 3
2 | 3
2 | 4
3 | 4
3 | 5
4 | 5
4 | 6
5 | 6
5 | 7
In this case, with the current index generation, I think we'd take rows [0, 1, 6, 7], corresponding to cutoffs 0 and 3, and we'd lose the last cutoff (5). I think we should instead take rows [4, 5, 10, 11], which correspond to cutoffs 2 and 5. @cchallu do you agree with this?
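To make the desired behavior concrete, here is a hedged sketch of generating the window indices from the back so the last cutoff always survives (toy sizes taken from the example above; not the final implementation):

```python
import numpy as np

# Example above: series length 7, h=2 -> 6 in-sample windows (cutoffs 0..5).
h, step_size, n_windows = 2, 3, 6
# count window starts backwards from the last window, then sort them
window_idx = np.sort(np.arange(n_windows - 1, -1, -step_size))  # [2, 5]
keep_rows = (window_idx[:, None] * h + np.arange(h)).ravel()
print(keep_rows.tolist())  # [4, 5, 10, 11] -> cutoffs 2 and 5, last one kept
```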
Yes, we always want to keep the last cutoff. This is `predict_insample`, so users do not specify the `n_windows` parameter. We should always return the last cutoff and consider the `step_size` backwards.
@isaac-chung can you please fix the failing test?
@jmoralez will do first thing when I'm back online in 2 weeks! 👍
@jmoralez I found why the test is failing. The actual cutoffs take `nf.uids` into account (`n_series = 2`). The fix is in the newest commit.
Please also add the following check to the tests (which currently fails):

```python
cutoffs_by_serie = forecasts.groupby(['unique_id', 'cutoff']).size().unstack('unique_id')
pd.testing.assert_series_equal(
    cutoffs_by_serie['Airline1'],
    cutoffs_by_serie['Airline2'],
    check_names=False,  # the two columns keep their own series names
)
```
Since the series lengths will be the same, you should be able to generate the offsets once, tile them, and add `n_windows * series_idx` or similar.
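One possible reading of that suggestion, as a hedged sketch (all sizes and names here are assumptions for illustration):

```python
import numpy as np

# Assumed toy sizes: 2 series of equal length, 6 in-sample windows each, h=2.
n_series, n_windows, h, step_size = 2, 6, 2, 3
# per-series row indices generated once, from the back (see the sketch above)
window_idx = np.sort(np.arange(n_windows - 1, -1, -step_size))  # [2, 5]
base_rows = (window_idx[:, None] * h + np.arange(h)).ravel()    # [4, 5, 10, 11]
# shift by each series' block of n_windows * h rows
keep_rows = np.concatenate([base_rows + s * n_windows * h for s in range(n_series)])
print(keep_rows.tolist())  # [4, 5, 10, 11, 16, 17, 22, 23]
```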
Apologies - I no longer have time in the next 3-4 weeks to work on this. Feel free to reassign this to someone else. Thanks for the support thus far.