Fixing Tensor Shape/Dimension Mismatch Errors in TimeSeries Transformer for Stock Price Prediction

Open sivavishnubramma opened this issue 1 year ago • 10 comments

System Info

I am training a time series transformer model to predict stock price changes from the previous day's price and other parameters. I have run into a series of errors related to tensor shapes, dimensions, and sizes. The current error is:

File "\transformers\models\time_series_transformer\modeling_time_series_transformer.py", line 1378, in forward
    transformer_inputs, loc, scale, static_feat = self.create_network_inputs(
File "\transformers\models\time_series_transformer\modeling_time_series_transformer.py", line 1303, in create_network_inputs
    raise ValueError: input length 11 and time feature lengths 13 does not match

My sample input dataset has 10 rows and 70 features, so it is not clear where the numbers 11 and 13 come from.
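As a rough aid for reading those numbers, the sketch below works out the time-axis lengths implied by the configuration used later in this post (context_length=10, prediction_length=1, lags_seq=[1, 2, 3]). It assumes the documented convention that past_values spans context_length + max(lags_seq) steps, and it assumes that the "input length" in the message is context_length + prediction_length. The helper name expected_lengths is hypothetical and not part of transformers. The 13 in the message happens to equal context_length + max(lags_seq), but which tensor produced it in this run cannot be determined from the trace alone.

```python
def expected_lengths(context_length, prediction_length, lags_seq):
    """Return the time-axis lengths implied by the model configuration.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    max_lag = max(lags_seq)
    return {
        # History needed so that every lag in lags_seq can be looked up.
        "past_length": context_length + max_lag,
        # Steps covered by future_values and future_time_features in training.
        "future_length": prediction_length,
        # Assumed meaning of "input length" in the error message.
        "lagged_input_length": context_length + prediction_length,
    }


lengths = expected_lengths(context_length=10, prediction_length=1, lags_seq=[1, 2, 3])
print(lengths)
```

Under these assumptions, a look_back of 10 rows is three rows short of the history the model asks for with lags up to 3.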

Before this error, I ran into several similar ones. I am pasting them here for reference.

modeling_time_series_transformer.py", line 1272, in create_network_inputs
    (torch.cat((past_values, future_values), dim=1) - loc) / scale
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list.

RuntimeError: Tensors must have same number of dimensions: got 3 and 2

import pandas as pd
import numpy as np
from transformers import TimeSeriesTransformerModel, TimeSeriesTransformerConfig, Trainer, TrainingArguments, default_data_collator
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.metrics import mean_squared_error
import torch
from torch.utils.data import Dataset

# Load the CSV file
file_path = './spy-stock-price - Spy_Ind_Signal.csv'
data = pd.read_csv(file_path)

# Exclude specified columns
exclude_columns = ['20SMA', '50SMA', '200SMA', '20EMA', '10EMA', 'MACD', 'MACD_Signal', 'Average_Volume', 'Bollinger_High', 'Bollinger_Low', 'Bollinger_Middle', 'VWAP', 'AVWAP']
data = data.drop(columns=exclude_columns)

# Preprocess the data
data['Date'] = pd.to_datetime(data['Date'])
data = data.sort_values('Date')

# Encode categorical signals
data_transformed = data

# Drop the 'Date' column
data_transformed = data_transformed.drop(columns=['Date'])

# Normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data_transformed)

# Convert the data to a supervised learning problem
def create_dataset(data, look_back=1):
    X, Y = [], []
    for i in range(len(data) - look_back - 1):
        a = data[i:(i + look_back)]
        X.append(a)
        Y.append(data[i + look_back, -2:])  # Include the last two columns as targets
        if Y[-1] is None:
            print(f"NoneType found at index {i + look_back}")
    return np.array(X), np.array(Y)

look_back = 10  # Adjusted look_back to 10
X, y = create_dataset(scaled_data, look_back)

# Split into train and test sets
train_size = int(len(X) * 0.67)
X_train, X_test = X[0:train_size], X[train_size:]
y_train, y_test = y[0:train_size], y[train_size:]


# Reshape input to be [samples, time steps, features]
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], data_transformed.shape[1]))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], data_transformed.shape[1]))

# Create observed mask for the transformer model
def create_observed_mask(data):
    mask = np.ones_like(data, dtype=np.float32)
    return mask

train_observed_mask = create_observed_mask(X_train)
test_observed_mask = create_observed_mask(X_test)

# Convert data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)
train_observed_mask = torch.tensor(train_observed_mask, dtype=torch.float32)
test_observed_mask = torch.tensor(test_observed_mask, dtype=torch.float32)

# Create a custom dataset class
class TimeSeriesDataset(Dataset):
    def __init__(self, X, y, observed_mask):
        self.X = X
        self.y = y
        self.observed_mask = observed_mask

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        time_dim = self.X.shape[1]
        feature_dim = self.X.shape[2]

        future_values = self.y[idx].unsqueeze(0).repeat(time_dim, feature_dim // 2)
        static_categorical_features = torch.tensor([]).unsqueeze(0).repeat(time_dim, feature_dim)
        static_real_features = torch.zeros((time_dim, feature_dim))
        static_feat = torch.zeros((time_dim, feature_dim, 1))

        sample = {
            'past_values': self.X[idx],
            'past_time_features': torch.zeros((time_dim, feature_dim)), 
            'past_observed_mask': self.observed_mask[idx],
            'future_values': future_values,
            'future_time_features': future_values,
        }

        return sample

train_dataset = TimeSeriesDataset(X_train, y_train, train_observed_mask)
test_dataset = TimeSeriesDataset(X_test, y_test, test_observed_mask)

# Model configuration
config = TimeSeriesTransformerConfig(
    prediction_length=1,
    context_length=look_back,
    lags_seq=[1, 2, 3],
    input_size=data_transformed.shape[1],
    output_size=2,  # Predicting both price_change and bull_bear_signal
    num_time_features=data_transformed.shape[1],  # Match the input size
    num_static_categorical_features=0,
    num_static_real_features=0,
    cardinality=[],
    embedding_dimension=[]
)

model = TimeSeriesTransformerModel(config)

# Training configuration
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    weight_decay=0.01,
    logging_dir="./logs",
)

# Training
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=default_data_collator,
)

trainer.train()

# Evaluation
predictions, labels, _ = trainer.predict(test_dataset)

# Inverse transform the predictions and labels
predictions = scaler.inverse_transform(predictions)
labels = scaler.inverse_transform(y_test.numpy())

# Separate the predictions and labels for price_change and bull_bear_signal
predictions_price_change = predictions[:, 0]
predictions_bull_bear_signal = predictions[:, 1]
labels_price_change = labels[:, 0]
labels_bull_bear_signal = labels[:, 1]

# Calculate the Mean Squared Error for price_change
mse_price_change = mean_squared_error(labels_price_change, predictions_price_change)
print(f"Mean Squared Error for Price Change: {mse_price_change}")

# For bull_bear_signal, we can use accuracy as the metric
accuracy_bull_bear_signal = np.mean(predictions_bull_bear_signal.round() == labels_bull_bear_signal.round())
print(f"Accuracy for Bull Bear Signal: {accuracy_bull_bear_signal}")

Steps taken so far:

  • Upgraded to the latest version of transformers (4.42).
  • Searched the Hugging Face and Stack Overflow forums for a resolution. Each suggestion helps with the current error, but a slight variant of the same error then pops up.
  • Fixed several errors prior to the current one.
  • Made sure all the tensors (past_values, past_time_features, future_values, etc.) have the same shape.

How can I resolve this issue?

Who can help?

@ArthurZucker, @muellerzr @gante

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

  1. Import the required libraries
  2. Create a CSV file './spy-stock-price - Spy_Ind_Signal.csv'.
  3. Fill the CSV with the following columns: Date, Price, Volume, Price_change, bull_bear_signal.
  4. Execute the python script.

Expected behavior

The model should complete training and evaluation and provide accuracy of the predictions and mean square error.

sivavishnubramma avatar Jun 23 '24 06:06 sivavishnubramma

cc @kashif

amyeroberts avatar Jun 23 '24 19:06 amyeroberts

thanks! having a look

kashif avatar Jun 23 '24 19:06 kashif

Appreciate it. Please let me know if any additional information would help with the debugging.

sivavishnubramma avatar Jun 23 '24 21:06 sivavishnubramma

If I understand correctly, the issue is in this code snippet:

sample = {
   'past_values': self.X[idx],
   'past_time_features': torch.zeros((time_dim, feature_dim)), 
   'past_observed_mask': self.observed_mask[idx],
   'future_values': future_values,
   'future_time_features': future_values,
}

Appreciate your timely support!

sivavishnubramma avatar Jun 24 '24 08:06 sivavishnubramma

@sivavishnubramma indeed the issue is in the sizes of the inputs... so what are the shapes of "past_values" and "future_values" tensors?

kashif avatar Jun 24 '24 08:06 kashif

also note that the future_time_features are the known covariates in the prediction window (for example the known date-time covariates, or any static covariates)
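To make the idea of known covariates concrete, here is a minimal sketch that derives two calendar features from dates and splits them into the past window and the prediction window. The feature choice, the scaling to [-0.5, 0.5], the start date, and the helper name time_features are all assumptions for illustration; the window lengths assume context_length=10, prediction_length=1 and a maximum lag of 3.

```python
from datetime import date, timedelta


def time_features(day):
    """Two illustrative calendar covariates, each scaled to [-0.5, 0.5]."""
    day_of_week = day.weekday() / 6.0 - 0.5       # Monday -> -0.5, Sunday -> 0.5
    month_of_year = (day.month - 1) / 11.0 - 0.5  # January -> -0.5, December -> 0.5
    return [day_of_week, month_of_year]


context_length, prediction_length, max_lag = 10, 1, 3
past_length = context_length + max_lag

# Hypothetical run of consecutive calendar days; real trading data skips weekends.
start = date(2024, 6, 3)
days = [start + timedelta(days=i) for i in range(past_length + prediction_length)]

features = [time_features(d) for d in days]
past_time_features = features[:past_length]    # 13 rows, 2 columns
future_time_features = features[past_length:]  # 1 row, 2 columns
```

Because the dates of the prediction window are known in advance, these features can be computed for it, which is not true of the target values themselves. With this layout, num_time_features in the config would be 2 rather than the number of value columns.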

kashif avatar Jun 24 '24 08:06 kashif

I added a few time-variant features to the code. Here are the shapes of the parameters:

Sample 523 - past_values shape: torch.Size([10, 74]), past_time_features shape: torch.Size([10, 74]), past_observed_mask shape: torch.Size([10, 74]), future_values shape: torch.Size([10, 74])
Sample 186 - past_values shape: torch.Size([10, 74]), past_time_features shape: torch.Size([10, 74]), past_observed_mask shape: torch.Size([10, 74]), future_values shape: torch.Size([10, 74])
Sample 441 - past_values shape: torch.Size([10, 74]), past_time_features shape: torch.Size([10, 74]), past_observed_mask shape: torch.Size([10, 74]), future_values shape: torch.Size([10, 74])

FYI: prior to adding the time variants, the shape was [10, 70].
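One way to compare such printed shapes against what the configuration implies is a small checker like the sketch below. It is a hypothetical helper, not part of transformers, and it assumes the documented per-sample layout for multivariate input: (length, input_size) for values and masks, and (length, num_time_features) for time features. It also assumes that the four added columns are the time features and the other 70 are the values, which the thread does not confirm.

```python
def shape_problems(sample_shapes, context_length, prediction_length, lags_seq,
                   input_size, num_time_features):
    """List the keys whose shape differs from what the config implies."""
    past_length = context_length + max(lags_seq)
    expected = {
        "past_values": (past_length, input_size),
        "past_observed_mask": (past_length, input_size),
        "past_time_features": (past_length, num_time_features),
        "future_values": (prediction_length, input_size),
        "future_time_features": (prediction_length, num_time_features),
    }
    problems = []
    for name, want in expected.items():
        got = sample_shapes.get(name)  # None when the key is missing
        if got != want:
            problems.append(f"{name}: expected {want}, got {got}")
    return problems


# Shapes as reported in this comment (future_time_features was not printed).
reported = {
    "past_values": (10, 74),
    "past_time_features": (10, 74),
    "past_observed_mask": (10, 74),
    "future_values": (10, 74),
}
problems = shape_problems(reported, context_length=10, prediction_length=1,
                          lags_seq=[1, 2, 3], input_size=70, num_time_features=4)
for line in problems:
    print(line)
```

Under these assumptions, every key is flagged: the past tensors are three steps short on the time axis, and future_values has ten steps where one is expected.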

sivavishnubramma avatar Jun 24 '24 09:06 sivavishnubramma

The error messages are somewhat challenging for someone new to this area to decode. I have a more specific question regarding dataset construction for training the Time Series Transformer model:

  1. I have a dataset from a CSV file with columns: {Date, Open, Close, Volume, Price_change, Bull_Bear_Signal}.
  2. I've added the following time-related features: {'day_of_week', 'month_of_year'}, increasing the total feature count to 8, including the Date.
  3. I then removed the Date column, leaving 7 features in the dataset.
  4. The look-back period is set to 10, and the prediction length is 1.
  5. The model outputs 2 predictions: {Price_change, Bull_Bear_Signal}.

Could you please guide me on how to correctly structure the following dataset to avoid the previous errors and ensure successful model training? Alternatively, if there is a builder/adapter class that takes user requirements like the list above and produces the dataset needed to train the model, please let me know.

sample = {
    'past_values': <>,
    'past_time_features': <>,
    'past_observed_mask': <>,
    'future_values': <>,
    'future_time_features': <>
}

Let me know if you need any additional details.
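For readers with the same question, one possible layout is sketched below. It is not confirmed anywhere in this thread; it follows the input conventions described in the model documentation and treats the two predicted columns (Price_change, Bull_Bear_Signal) as the values and the two calendar columns as time features. The function make_sample and the synthetic data are hypothetical, and plain lists are used so the sketch runs without torch; each entry would be converted to a float tensor in a real Dataset.

```python
def make_sample(targets, covariates, start, context_length, prediction_length, lags_seq):
    """Slice one training window out of row-aligned targets and time covariates."""
    past_length = context_length + max(lags_seq)
    split = start + past_length
    end = split + prediction_length
    if start < 0 or end > len(targets):
        raise ValueError("window does not fit inside the series")
    past_values = targets[start:split]
    return {
        "past_values": past_values,
        "past_time_features": covariates[start:split],
        # 1.0 marks an observed value; use 0.0 where a value is missing.
        "past_observed_mask": [[1.0] * len(row) for row in past_values],
        "future_values": targets[split:end],
        "future_time_features": covariates[split:end],
    }


# Hypothetical data: 30 rows, 2 target columns and 2 time covariates per row.
targets = [[0.01 * i, float(i % 2)] for i in range(30)]
covariates = [[(i % 5) / 4.0 - 0.5, 0.0] for i in range(30)]

sample = make_sample(targets, covariates, start=0, context_length=10,
                     prediction_length=1, lags_seq=[1, 2, 3])
shapes = {name: (len(rows), len(rows[0])) for name, rows in sample.items()}
print(shapes)
```

With this layout the config would use input_size=2 and num_time_features=2, and each window needs 14 consecutive rows: 13 for the past and 1 for the future.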

sivavishnubramma avatar Jun 25 '24 01:06 sivavishnubramma

@sivavishnubramma we do have a blog post explaining the inputs and what the batch samples should be here: https://huggingface.co/blog/time-series-transformers

did you manage to have a read of the blog post?

kashif avatar Jun 25 '24 10:06 kashif

Finally, I was able to execute the script and run some evals. I got a MASE of around 2.3 and an sMAPE of around 0.16. I have to dig deeper to understand and improve these numbers. I just want to share my appreciation to the Hugging Face team.
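For context on these two numbers, the sketch below implements the usual textbook definitions of both metrics in plain Python. It is an illustration under stated assumptions, not the exact code used to produce the figures above: MASE is taken as the forecast MAE divided by the in-sample MAE of a naive forecast with the given periodicity, and sMAPE is taken on the 0 to 2 scale. The sample numbers are made up.

```python
def mase(training, actual, forecast, periodicity=1):
    """Mean absolute scaled error: forecast MAE over the naive in-sample MAE."""
    naive_errors = [abs(training[i] - training[i - periodicity])
                    for i in range(periodicity, len(training))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def smape(actual, forecast):
    """Symmetric MAPE on the 0..2 scale; undefined when a pair is (0, 0)."""
    terms = [2.0 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)


# Made-up numbers for illustration only.
training = [1.0, 2.0, 4.0, 7.0]
actual = [8.0, 10.0]
forecast = [9.0, 12.0]

mase_value = mase(training, actual, forecast)
smape_value = smape(actual, forecast)
print(mase_value, smape_value)
```

Under this definition, a MASE above 1 means the forecast errors are larger on average than those of a naive "repeat the previous value" forecast on the training data.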

Later I will share some documentation that will help future users.

@kashif - if you have any blog post that talks about time series model optimization, please share it.

sivavishnubramma avatar Jun 30 '24 01:06 sivavishnubramma

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jul 24 '24 08:07 github-actions[bot]