[BUG] KeyError: 'start'?
The dataset name is "train-00000-of-00001.parquet".
I am confused by this error. When I run "scripts/training/train.py", I get the following output:
/home/miniconda3/envs/chronos/lib/python3.10/site-packages/gluonts/json.py:102: UserWarning: Using `json`-module for json-handling. Consider installing one of `orjson`, `ujson` to speed up serialization and deserialization.
warnings.warn(
2024-09-08 17:04:31,949 - /mnt/e/python/chronos-forecasting/scripts/training/train.py - INFO - Using SEED: 1837345163
2024-09-08 17:04:31,956 - /mnt/e/python/chronos-forecasting/scripts/training/train.py - INFO - Logging dir: output/run-9
2024-09-08 17:04:31,956 - /mnt/e/python/chronos-forecasting/scripts/training/train.py - INFO - Loading and filtering 1 datasets for training: ['/mnt/e/python/chronos-forecasting/data/train-00000-of-00001.parquet']
2024-09-08 17:04:31,956 - /mnt/e/python/chronos-forecasting/scripts/training/train.py - INFO - Mixing probabilities: [1.0]
2024-09-08 17:04:31,964 - /mnt/e/python/chronos-forecasting/scripts/training/train.py - INFO - Initializing model
2024-09-08 17:04:31,964 - /mnt/e/python/chronos-forecasting/scripts/training/train.py - INFO - Using pretrained initialization from google/t5-efficient-tiny
/home/miniconda3/envs/chronos/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
max_steps is given, it will override any value given in num_train_epochs
2024-09-08 17:04:33,966 - /mnt/e/python/chronos-forecasting/scripts/training/train.py - INFO - Training
0%| | 0/200000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/mnt/e/python/chronos-forecasting/scripts/training/train.py", line 702, in <module>
app()
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/typer/main.py", line 338, in __call__
raise e
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/typer/main.py", line 321, in __call__
return get_command(self)(*args, **kwargs)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/typer/core.py", line 665, in main
return _main(
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/typer/core.py", line 197, in _main
rv = self.invoke(ctx)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/typer/main.py", line 703, in wrapper
return callback(**use_params)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/typer_config/decorators.py", line 92, in wrapped
return cmd(*args, **kwargs)
File "/mnt/e/python/chronos-forecasting/scripts/training/train.py", line 689, in main
trainer.train()
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
return inner_training_loop(
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/transformers/trainer.py", line 2236, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/accelerate/data_loader.py", line 677, in __iter__
next_batch, next_batch_info = self._fetch_batches(main_iterator)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/accelerate/data_loader.py", line 631, in _fetch_batches
batches.append(next(iterator))
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
data.append(next(self.dataset_iter))
File "/mnt/e/python/chronos-forecasting/scripts/training/train.py", line 241, in __iter__
for element in self.base_dataset:
File "/mnt/e/python/chronos-forecasting/scripts/training/train.py", line 491, in __iter__
yield self.to_hf_format(next(iterators[idx]))
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/gluonts/transform/_base.py", line 111, in __iter__
yield from self.transformation(
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/gluonts/transform/_base.py", line 186, in __call__
for data_entry in data_it:
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/gluonts/transform/_base.py", line 186, in __call__
for data_entry in data_it:
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/gluonts/itertools.py", line 85, in __iter__
for el in self.iterable:
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/gluonts/dataset/common.py", line 424, in __call__
data = t(data)
File "/home/miniconda3/envs/chronos/lib/python3.10/site-packages/gluonts/dataset/common.py", line 291, in __call__
data[self.name] = _as_period(data[self.name], self.freq)
KeyError: 'start'
0%| | 0/200000 [00:00<?, ?it/s]
The key 'start' is not present in the data.
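To narrow this down before launching training, a quick check over the records can help. The traceback shows GluonTS reading `data["start"]`, so every record needs that key. Below is a minimal sketch, assuming each record is a plain dict and that "start" and "target" are the fields to look for; `missing_fields` and `check_dataset` are hypothetical helper names, not part of Chronos or GluonTS.

```python
from typing import Any, Iterable, List, Mapping, Sequence

# Assumption: these are the fields the training pipeline looks up by name.
REQUIRED_FIELDS = ("start", "target")


def missing_fields(
    entry: Mapping[str, Any],
    required: Sequence[str] = REQUIRED_FIELDS,
) -> List[str]:
    """Return the required field names that are absent from one record."""
    return [name for name in required if name not in entry]


def check_dataset(
    entries: Iterable[Mapping[str, Any]],
    required: Sequence[str] = REQUIRED_FIELDS,
) -> None:
    """Raise a KeyError naming the first record that lacks a required field."""
    for index, entry in enumerate(entries):
        missing = missing_fields(entry, required)
        if missing:
            raise KeyError(f"entry {index} is missing field(s): {missing}")
```

Running `check_dataset` on a few records loaded from the file should show whether the data itself lacks the key, rather than the training script.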
@KeepFaithMe are you following the instructions in https://github.com/amazon-science/chronos-forecasting/tree/main/scripts#pretraining-and-fine-tuning-chronos-models?
I am facing the same error. I first converted the M4 hourly dataset to Arrow, and when creating the kernel-synth dataset I set 'LENGTH = 748', since each series in M4 has length 748, not 1024.
I converted the M4-Hourly dataset to Arrow with:
import pyarrow as pa
import pyarrow.parquet as pq
# Convert the DataFrame to an Apache Arrow table
arrow_table = pa.Table.from_pandas(hourly_df)
# Save the table to a file with an .arrow extension (written in Parquet format)
pq.write_table(arrow_table, 'm4_hourly_dataset.arrow')
I passed these two Arrow files into the config and used the chronos-tiny model.
How can I fix the error? More generally, could you please add tutorials for fine-tuning Chronos on common datasets? (I know you used the M4 dataset when training Chronos.)
I solved the problem by storing the series as a list of NumPy arrays, one array per series.
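For anyone landing here later, this is roughly what that fix can look like. It is a sketch, not the exact code used above: it assumes the series are available as plain sequences of numbers, uses an arbitrary start timestamp (2000-01-01), and writes the file with `ArrowWriter` from `gluonts.dataset.arrow`, the approach described in the README linked earlier in this thread. `build_entries` and `write_arrow` are hypothetical helper names.

```python
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Sequence, Union


def build_entries(
    series_list: Sequence[Sequence[float]],
    start: datetime = datetime(2000, 1, 1),  # arbitrary start time (assumption)
) -> List[Dict[str, Any]]:
    """Return one record per series, each with 'start' and 'target' keys."""
    return [
        {"start": start, "target": [float(value) for value in series]}
        for series in series_list
    ]


def write_arrow(entries: List[Dict[str, Any]], path: Union[str, Path]) -> None:
    """Write the records with GluonTS's ArrowWriter (needs numpy and gluonts)."""
    # Imported here so build_entries stays usable without these packages.
    import numpy as np
    from gluonts.dataset.arrow import ArrowWriter

    dataset = [
        {
            "start": np.datetime64(entry["start"], "s"),
            "target": np.asarray(entry["target"], dtype=np.float32),
        }
        for entry in entries
    ]
    ArrowWriter(compression="lz4").write_to_file(dataset, path=Path(path))
```

Something like `write_arrow(build_entries(list_of_series), "my_series.arrow")` would then produce a file in which every record carries the 'start' key that the traceback complains about. Series of different lengths are fine here, since each one is stored as its own record.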
Closing due to inactivity. Please feel free to re-open if you have further questions.
I would also like to know specifically how to utilize the “train-00000-of-00001.parquet” file.
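One possible way to approach this, hedged heavily because the schema of that file is not shown in this thread: the sketch below assumes each row holds a list of timestamps and a list of values under the column names "timestamp" and "target". Those names are assumptions, so inspect the real schema first (for example with `pq.read_table(path).schema`). `read_parquet_rows` and `rows_to_entries` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, Iterator, List, Mapping


def read_parquet_rows(path: str) -> List[Dict[str, Any]]:
    """Load a Parquet file as a list of row dicts (needs pyarrow)."""
    # Imported here so rows_to_entries stays usable without pyarrow.
    import pyarrow.parquet as pq

    return pq.read_table(path).to_pylist()


def rows_to_entries(
    rows: Iterable[Mapping[str, Any]],
    timestamp_col: str = "timestamp",  # assumed column name
    target_col: str = "target",  # assumed column name
) -> Iterator[Dict[str, Any]]:
    """Turn rows into records with the 'start' and 'target' keys."""
    for row in rows:
        timestamps = row[timestamp_col]
        # The first timestamp of the series becomes its 'start'.
        yield {"start": timestamps[0], "target": list(row[target_col])}
```

The resulting records have the 'start' and 'target' keys and could then be written out in the Arrow format that the training script expects, for instance with GluonTS's `ArrowWriter`.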