deepdow

Increase in Columns

Open isaactyj opened this issue 2 years ago • 2 comments

After running raw_to_Xy on multi-indexed data, how do I amend the code in the getting started Python notebook so that it takes the change in matrix size into account? Sorry, I am still trying to figure it out and am quite new to this.

isaactyj, Sep 19 '23 16:09

Hey @isaactyj

Do you think you could share a minimal reproducible example of your issue, without leaking any private information? Feel free to create a minimal dataset and upload it if necessary.

jankrepl, Sep 20 '23 17:09

The data I use is just from Yahoo Finance, so I can just upload the code that builds the multi-index data:

import pandas as pd
import yfinance as yf
from deepdow.utils import raw_to_Xy

stocks = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA']
start_date = '2013-01-01'
end_date = '2019-01-01'

# Download one ticker first to get the trading dates for the row index
data = yf.download("AAPL", start=start_date, end=end_date)
data = data.reset_index()
dates = data['Date']  # assumed: `dates` was not defined in the original snippet

# Empty frame with (Stock, Metric) MultiIndex columns
columns_multi_index = pd.MultiIndex.from_product([stocks, ['Close', 'Volume']], names=['Stock', 'Metric'])
all_data = pd.DataFrame(columns=columns_multi_index, index=dates)

# Fill in the close price and volume of every stock
for i in stocks:
    print(i)
    data = yf.download(i, start=start_date, end=end_date)
    close = data['Close']
    volume = data['Volume']
    all_data[(i, 'Close')] = close.tolist()
    all_data[(i, 'Volume')] = volume.tolist()

n_timesteps = len(all_data)  # number of rows
n_channels = len(all_data.columns.levels[0])  # size of column level 0 ('Stock')
n_assets = len(all_data.columns.levels[1])  # size of column level 1 ('Metric')
lookback, gap, horizon = 5, 2, 4
n_samples = n_timesteps - lookback - horizon - gap + 1
X, timestamps, y, asset_names, indicators = raw_to_Xy(all_data,
                                                      lookback=lookback,
                                                      gap=gap,
                                                      freq="B",
                                                      horizon=horizon,
                                                      use_log=True)
assert X.shape == (n_samples, n_channels, lookback, n_assets)
assert timestamps[0] == all_data.index[lookback]

I instantly get an assertion error for both assertions. I am actually using 18 stocks, but for simplicity I'll just change it to 5 stocks.
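For reference, here is a minimal pandas-only sketch of how the two column levels are ordered when the MultiIndex is built with `from_product([stocks, metrics])` as in the snippet above. It does not call deepdow at all. The names `metrics`, `n_assets` and `n_channels` are illustrative, and the mapping "level 0 = assets, level 1 = indicators" is an assumption based on how the `raw_to_Xy` documentation describes its input, so it is worth checking against the installed version.

```python
import pandas as pd

stocks = ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA"]
metrics = ["Close", "Volume"]

# Same construction as in the snippet above: stocks first, metrics second
columns = pd.MultiIndex.from_product([stocks, metrics], names=["Stock", "Metric"])

# Level 0 holds the stock tickers, level 1 holds the metrics
n_assets = len(columns.levels[0])    # assumed to map to the asset dimension
n_channels = len(columns.levels[1])  # assumed to map to the channel dimension

print(list(columns.names))   # ['Stock', 'Metric']
print(n_assets, n_channels)  # 5 2
```

With this ordering, the length of `levels[0]` is the number of stocks and the length of `levels[1]` is the number of metrics, which is the opposite of how `n_channels` and `n_assets` are assigned in the snippet above.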

isaactyj, Sep 21 '23 02:09