VECM

Open michellebaugraczyk opened this issue 3 years ago • 3 comments

Hi Michael,

VECM raises a frequency error when there are gaps in the data. Would it be possible to fix that for all frequency types, even when the data has gaps? Other models in scalecast work well in these cases.

Best regards,

Michelle

michellebaugraczyk avatar Sep 21 '22 14:09 michellebaugraczyk

If you have missing values in the data, it is most likely a statsmodels native issue: https://github.com/statsmodels/statsmodels/issues/3534

Just in case, I will change how the freq argument is specified in the VECM model to see if that fixes the issue. The change will be included in 0.14.4, planned for release on 9/23/22.

mikekeith52 avatar Sep 21 '22 16:09 mikekeith52

Please test the model from 0.14.4 to see if you have the same issue. Thanks, as always, for raising the issue!

mikekeith52 avatar Sep 23 '22 14:09 mikekeith52

I recently ran an example where I was able to reproduce this error. It most likely comes from using business-day data: the business days in a given data source don't always line up with the business-day definition used by pandas. To fix that, you can use df = df.asfreq('B', method='ffill'). Replace 'ffill' with the fill method of your choice, since this step can introduce nulls on dates the source does not cover. Make sure the dataframe's index is the datetime column. Here is an example where this would work:

import pandas_datareader as pdr
from scalecast.Forecaster import Forecaster
from scalecast.MVForecaster import MVForecaster

FANG = [
    'META',
    'AMZN',
    'NFLX',
    'GOOG',
]

fs = []
for sym in FANG:
    df = pdr.get_data_yahoo(sym)
    df = df.asfreq('B', method='ffill') # fixes the issue
    f = Forecaster(
        y=df['Close'],
        current_dates = df.index,
        future_dates = 65,
        end = '2022-09-30',
    )
    fs.append(f)
    
mvf = MVForecaster(*fs,names=FANG)

I think this is something users will have to do in pandas before loading the data into a scalecast object, as I don't know how it could be implemented in the package.
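
For anyone who wants to see the pandas step in isolation, here is a minimal sketch on synthetic data, so it runs without downloading anything. The prices and the two dropped dates are made up for illustration and are not from the example above; the only assumption is that the source skips days that pandas still counts as business days.

```python
import numpy as np
import pandas as pd

# Synthetic closes on the pandas business-day calendar (hypothetical data).
idx = pd.bdate_range("2022-01-03", "2022-03-31")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Close": 100 + rng.normal(size=len(idx)).cumsum()}, index=idx)

# Drop two Mondays to mimic a source that skips market holidays,
# which pandas' default 'B' frequency still treats as business days.
skipped = pd.to_datetime(["2022-01-17", "2022-02-21"])
gappy = df.drop(skipped)

# With the gaps, the index no longer carries a frequency.
print("freq before:", gappy.index.freq)

# Reindex onto the 'B' calendar and forward-fill the rows this creates.
fixed = gappy.asfreq("B", method="ffill")
print("freq after:", fixed.index.freqstr)
print("rows added:", len(fixed) - len(gappy))
print("nulls left:", int(fixed["Close"].isna().sum()))
```

Note that the method argument of asfreq only fills the rows created by the reindexing. Nulls that were already in the data before the call are left as they are and need a separate fillna or similar.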

mikekeith52 avatar Oct 06 '22 14:10 mikekeith52