
[core] uids do not update after transfer learning

Open anthonygiorgio97 opened this issue 1 year ago • 2 comments

What happened + What you expected to happen

Following the documentation guide "Transfer Learning with MLForecast", I encountered an issue with the fcst.ts object when performing transfer learning.

Issue: While the local target scaler is correctly updated for the unique_id values in the new dataframe (as seen in fcst.ts.target_transforms[0].scaler_.stats_), fcst.ts.uids is not updated.

In the provided example, after fitting the model on the M3 dataset and then applying it to the M4 dataset, fcst.ts.uids still contains the unique_id values from M3, instead of the updated values from M4.

import lightgbm as lgb
import numpy as np
import pandas as pd
from datasetsforecast.m3 import M3
from datasetsforecast.m4 import M4

from mlforecast import MLForecast
from mlforecast.target_transforms import LocalStandardScaler

Y_df_M3, _, _ = M3.load(directory='./', group='Monthly')
Y_df_M4, _, _ = M4.load(directory='./', group='Monthly')
Y_df_M4['ds'] = pd.to_datetime(Y_df_M4['ds'])

models = [lgb.LGBMRegressor(verbosity=-1)]
fcst = MLForecast(
    models=models, 
    lags=range(1, 13),
    freq='MS',
    target_transforms=[LocalStandardScaler()],
)
fcst.fit(Y_df_M3)

print('total M3 unique_id: ', len(Y_df_M3['unique_id'].unique()))
print('total uids before transfer learning: ', len(fcst.ts.uids))
print('scaler len before transfer learning: ', len(fcst.ts.target_transforms[0].scaler_.stats_))

Y_hat_df = fcst.predict(h=12, new_df=Y_df_M4)

print('total M4 unique_id: ', len(Y_df_M4['unique_id'].unique()))
print('total uids after transfer learning: ', len(fcst.ts.uids))
print('scaler len after transfer learning: ', len(fcst.ts.target_transforms[0].scaler_.stats_))

Output:

total M3 unique_id:  1428
total uids before transfer learning:  1428
scaler len before transfer learning:  1428
total M4 unique_id:  48000
total uids after transfer learning:  1428
scaler len after transfer learning:  48000

It would be useful to have fcst.ts.uids updated to reflect the new unique_id values. This is particularly important for correctly retrieving scaler values when performing an inverse transform on SHAP values for the new predictions, as shown below:

# Create dictionary for stdscaler
scaler_dict = {
    unique_id: [
        scaler_stats[0],  # Mean
        scaler_stats[1],  # Std deviation
    ]
    for unique_id, scaler_stats in zip(
        fcst.ts.uids,  # Still contains M3 unique_ids
        fcst.ts.target_transforms[0].scaler_.stats_,
    )
}
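
As a possible interim workaround, the ids could be taken from the new dataframe itself instead of fcst.ts.uids. The sketch below is self-contained and uses toy data rather than mlforecast. The helper name build_scaler_dict is hypothetical, and it assumes that the rows of scaler_.stats_ follow the sorted order of the unique_id values in new_df, which I have not confirmed and which should be checked against the installed mlforecast version.

```python
import pandas as pd


def build_scaler_dict(new_df, stats, id_col="unique_id"):
    """Map each series id in new_df to its [mean, std] row in stats.

    ASSUMPTION (not confirmed by the library docs): the rows of `stats`
    are ordered like the sorted unique ids of `new_df`.
    """
    ids = sorted(new_df[id_col].unique())
    # A length mismatch is the symptom reported in this issue
    # (1428 ids against 48000 scaler rows), so fail loudly.
    if len(ids) != len(stats):
        raise ValueError(
            f"{len(ids)} ids but {len(stats)} scaler rows; cannot align them"
        )
    return {uid: [row[0], row[1]] for uid, row in zip(ids, stats)}


# Toy stand-ins for Y_df_M4 and fcst.ts.target_transforms[0].scaler_.stats_
toy_df = pd.DataFrame(
    {
        "unique_id": ["b", "b", "a", "a"],
        "ds": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01"]
        ),
        "y": [10.0, 14.0, 1.0, 3.0],
    }
)
# Per-series mean and std, with rows in sorted id order ("a", then "b")
toy_stats = (
    toy_df.groupby("unique_id", sort=True)["y"].agg(["mean", "std"]).to_numpy()
)

scaler_dict = build_scaler_dict(toy_df, toy_stats)
print(scaler_dict)
```

With the real objects, the call would be build_scaler_dict(Y_df_M4, fcst.ts.target_transforms[0].scaler_.stats_), subject to the ordering assumption above.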

Versions / Dependencies

mlforecast==1.0.2

Reproduction script

import lightgbm as lgb
import numpy as np
import pandas as pd
from datasetsforecast.m3 import M3
from datasetsforecast.m4 import M4

from mlforecast import MLForecast
from mlforecast.target_transforms import LocalStandardScaler

Y_df_M3, _, _ = M3.load(directory='./', group='Monthly')
Y_df_M4, _, _ = M4.load(directory='./', group='Monthly')
Y_df_M4['ds'] = pd.to_datetime(Y_df_M4['ds'])

models = [lgb.LGBMRegressor(verbosity=-1)]
fcst = MLForecast(
    models=models, 
    lags=range(1, 13),
    freq='MS',
    target_transforms=[LocalStandardScaler()],
)
fcst.fit(Y_df_M3)

print('total M3 unique_id: ', len(Y_df_M3['unique_id'].unique()))
print('total uids before transfer learning: ', len(fcst.ts.uids))
print('scaler len before transfer learning: ', len(fcst.ts.target_transforms[0].scaler_.stats_))

Y_hat_df = fcst.predict(h=12, new_df=Y_df_M4)

print('total M4 unique_id: ', len(Y_df_M4['unique_id'].unique()))
print('total uids after transfer learning: ', len(fcst.ts.uids))
print('scaler len after transfer learning: ', len(fcst.ts.target_transforms[0].scaler_.stats_))

Issue Severity

High: It blocks me from completing my task.

anthonygiorgio97, Feb 25 '25 11:02