ForkingPickler Error When Reading Arrow File
Describe the bug
TypeError caused by an EOFError when loading a pickle file through ForkingPickler:
2024-07-10 10:46:29,886 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - TF32 format is only available on devices with compute capability >= 8. Setting tf32 to False.
2024-07-10 10:46:29,893 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Using SEED: 1360904892
2024-07-10 10:46:29,958 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Logging dir: output\run-1
2024-07-10 10:46:29,961 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Loading and filtering 1 datasets for training: ['data.arrow']
2024-07-10 10:46:29,962 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Mixing probabilities: [1]
2024-07-10 10:46:30,642 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Initializing model
2024-07-10 10:46:30,642 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Using pretrained initialization from amazon/chronos-t5-small
The speedups for torchdynamo mostly come wih GPU Ampere or higher and which is not detected here.
max_steps is given, it will override any value given in num_train_epochs
2024-07-10 10:46:45,054 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Training
  0%|          | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py", line 692, in
(chronos) C:\Users\alvin\OneDrive\Coding\Python\chronos>Traceback (most recent call last):
File "
Occurs when attempting to fine-tune the model: `python chronos-forecasting/scripts/training/train.py --config chronos-forecasting/scripts/training/configs/chronos-t5-small.yaml --model-id amazon/chronos-t5-small --no-random-init --max-steps 1000 --learning-rate 0.001`
Steps taken:
- Spun up new conda environment
- pip install "chronos[training] @ git+https://github.com/amazon-science/chronos-forecasting.git"
- Cloned repo into working directory
- Converted pandas df into arrow with provided function: convert_to_arrow('data.arrow', df.VALUE, df.REF_DATE)
- Edited config file to point to arrow file
- set CUDA_VISIBLE_DEVICES=0
- python chronos-forecasting/scripts/training/train.py --config chronos-forecasting/scripts/training/configs/chronos-t5-small.yaml --model-id amazon/chronos-t5-small --no-random-init --max-steps 1000 --learning-rate 0.001
- Encountered RuntimeError: `fused=True` requires all the params to be floating point Tensors of supported devices: ['cuda', 'xpu', 'privateuseone'], so I did `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`
- Ran training again: `python chronos-forecasting/scripts/training/train.py --config chronos-forecasting/scripts/training/configs/chronos-t5-small.yaml --model-id amazon/chronos-t5-small --no-random-init --max-steps 1000 --learning-rate 0.001`
Environment description
- Operating system: Windows
- Python version: 3.10.14
- CUDA version: 12.2
- PyTorch version: 2.3.1+cu121
- HuggingFace transformers version: 4.42.3
- HuggingFace accelerate version: 0.32.1
Any help is appreciated; I have tried this with multiple fresh conda environments on different machines.
It looks like you're using Windows. We haven't really tested this codebase on Windows. Could you try the following?
- Set `dataloader_num_workers` to 0.
- Use another optimizer instead of `adamw_torch_fused` (try `adamw_torch`); see the sketch after this list for both settings.
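For reference, a minimal sketch of what the two settings correspond to in terms of the underlying HuggingFace `TrainingArguments` (the chronos training config is assumed to forward these fields; in practice you would set the matching keys in chronos-t5-small.yaml):

```python
from transformers import TrainingArguments

# Sketch only, not the chronos training script itself.
args = TrainingArguments(
    output_dir="output",
    dataloader_num_workers=0,  # no worker subprocesses -> no ForkingPickler on Windows
    optim="adamw_torch",       # plain AdamW instead of the fused adamw_torch_fused
)
```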
@AlvinLok any update on this?
Yes, I'm on Windows. I've made the changes, and now it's a new error:
Traceback (most recent call last):
File "C:\Users\alvinlok\xxx\03 Code\chronos-forecasting\scripts\training[train.py](http://train.py/)", line 694, in
For this call, `convert_to_arrow('data.arrow', df.VALUE, df.REF_DATE)`, can you share what your dataframe looks like?
This is what my df looks like:
```
     REF_DATE  VALUE
0  2010-01-01   84.7
1  2010-02-01   85.3
2  2010-03-01   85.4
3  2010-04-01   85.8
4  2010-05-01   86.8
```
```python
convert_to_arrow(
    path="arrow_files/p32_df_train.arrow",
    time_series=p32_df_train.VALUE,
    start_times=p32_df_train.REF_DATE,
)
```
@AlvinLok could you check if the fix proposed in #156 makes it work for you?
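For context, the fix proposed in #156 amounts to the standard Windows multiprocessing guard; a minimal sketch, with `main()` as a hypothetical stand-in for the body of train.py:

```python
from multiprocessing import freeze_support

if __name__ == "__main__":
    # On Windows, child processes (e.g. DataLoader workers) are started with
    # "spawn" and re-import the script; the __main__ guard keeps them from
    # re-running the training code on import. freeze_support() is additionally
    # required when the script is frozen into an executable.
    freeze_support()
    main()  # hypothetical entry point
```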
No, adding `freeze_support()` did not have any effect. I am getting the same error: Array 'target' has bad shape - expected 1 dimensions, got 0.
@lostella this one is unrelated. @AlvinLok you're transforming the data incorrectly. Please check the type signature of the function that you're using to transform. `convert_to_arrow` expects:

```python
...
time_series: Union[List[np.ndarray], np.ndarray],
start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
...
```

The first one is a list of 1-D numpy arrays (i.e., a list of time series). The second one is a list of `np.datetime64`, i.e., a list of start times, one for each time series in the first list. Since we're only using the `start_times`, time series are expected to be uniformly spaced.
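A minimal sketch of a correct call (series lengths, dates, and the file name here are illustrative, not from the thread):

```python
import numpy as np

# A *list* of 1-D arrays, one per series, with one start time per series.
time_series = [np.random.randn(24), np.random.randn(36)]
start_times = [np.datetime64("2010-01-01"), np.datetime64("2012-06-01")]

convert_to_arrow("example.arrow", time_series=time_series, start_times=start_times)
```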
@AvisP: can you also check that you're transforming the data correctly?
@abdulfatir That is highly unlikely, as I am not using any custom data, only data generated with the provided script and example code. Are there any likely issues that may arise when generating data with `python kernel-synth.py --num-series 20 --max-kernels 5` and the following script? Here are the data files that I am using, for download and verification.
```python
from pathlib import Path
from typing import List, Optional, Union

import numpy as np
from gluonts.dataset.arrow import ArrowWriter


def convert_to_arrow(
    path: Union[str, Path],
    time_series: Union[List[np.ndarray], np.ndarray],
    start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
    compression: str = "lz4",
):
    if start_times is None:
        # Set an arbitrary start time
        start_times = [np.datetime64("2000-01-01 00:00", "s")] * len(time_series)

    assert len(time_series) == len(start_times)

    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )


if __name__ == "__main__":
    # Generate 20 random time series of length 1024
    time_series = [np.random.randn(1024) for i in range(20)]

    # Convert to GluonTS arrow format
    convert_to_arrow("./noise-data.arrow", time_series=time_series)
```
> @AlvinLok you're transforming the data incorrectly. Please check the type signature of the function that you're using to transform. [...]
Alright, well I converted it to a numpy array and removed the start times argument, but received the same error:
```python
time_series_data = p32_df_train.VALUE.to_numpy()
path = "arrow_files/p32_df_train.arrow"

convert_to_arrow(path=path, time_series=time_series_data)
```
Error:
File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\dataset\common.py", line 345, in call raise GluonTSDataError( gluonts.exceptions.GluonTSDataError: Array 'target' has bad shape - expected 1 dimensions, got 0. 0%| | 0/1000 [00:00<?, ?it/s]
@AlvinLok It looks like you're passing a single series to the function. You need to pass a list of time series. If you only have a single series, pass it as `[time_series_data]`.
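Concretely, a sketch of the corrected call (the `start_times` line is an illustrative addition, assuming `REF_DATE` parses as a date):

```python
import numpy as np

time_series_data = p32_df_train.VALUE.to_numpy()

convert_to_arrow(
    path="arrow_files/p32_df_train.arrow",
    time_series=[time_series_data],  # wrap the single series in a list
    start_times=[np.datetime64(p32_df_train.REF_DATE.iloc[0])],  # one start per series
)
```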
Closing due to inactivity. Please feel free to re-open if you have further questions.