
Training is extremely slow on Gluonts [Torch]

Open · khawar-islam opened this issue 2 years ago · 1 comment

Description

I am quite frustrated because I am training a model and training is very, very slow on an RTX 3080. I am training on 500 CSV files. Any help would be appreciated.

To Reproduce

```python
import os

import pandas as pd

from gluonts.dataset.common import ListDataset
from gluonts.torch import DeepAREstimator

# Load your dataset
# Base directory where the folders are located
base_dir = '/media/cvpr/CM_1/coremax_cpu_usage/coremax_cpu/rnd'

# List of folder names
folders = ['2013-7', '2013-8', '2013-9']

# Read every CSV once, collect the frames, and concatenate a single time
# (calling pd.concat inside the loop is quadratic in the total data size)
frames = []
for folder in folders:
    folder_path = os.path.join(base_dir, folder)
    for file in os.listdir(folder_path):
        if file.endswith('.csv'):
            file_path = os.path.join(folder_path, file)
            temp_df = pd.read_csv(file_path, delimiter=';')
            temp_df.columns = temp_df.columns.str.strip()  # strip whitespace from column names
            frames.append(temp_df)

all_data = pd.concat(frames, ignore_index=True)

print(all_data)

# Convert timestamp to datetime and set it as the index
all_data['Timestamp'] = pd.to_datetime(all_data['Timestamp [ms]'], unit='ms')
all_data.set_index('Timestamp', inplace=True)

# Prepare the dataset for GluonTS
training_data = ListDataset([{
    "start": all_data.index[0],
    "target": all_data['CPU usage [MHZ]'].values,
    "feat_dynamic_real": all_data[
        ['CPU cores', 'Memory usage [KB]', 'Disk read throughput [KB/s]', 'Disk write throughput [KB/s]',
         'Network received throughput [KB/s]', 'Network transmitted throughput [KB/s]']].values.T
}], freq="1min")  # match the actual frequency of your data

# Define the DeepAR estimator and train it
estimator = DeepAREstimator(
    prediction_length=12,  # how far ahead to predict
    context_length=24,  # should be at least as long as prediction_length
    freq="1min",  # the data's frequency
    batch_size=64,
    trainer_kwargs={"max_epochs": 1, "accelerator": "gpu"},
)
predictor = estimator.train(training_data=training_data)
```

Error message or code output

```
Epoch 0: | | 3/? [08:49<00:00, 0.01it/s, v_num=22]
```


Environment
- Operating system: Ubuntu 20.04
- Python version: 3.8.18
- GluonTS version: 0.14.3
- MXNet version: N/A (using the PyTorch backend)


khawar-islam · Dec 13 '23

@khawar-islam what is the performance when running on CPU?

I'm not sure you can expect great performance with a DeepAR model (at least with default hyperparameters) since it's based on a recurrent neural network: this makes the model operations non-parallelizable, hence the GPU utilization will be extremely low.

lostella · Dec 15 '23