autogluon
Getting ZeroDivisionError while fitting the Timeseries
Expected behavior
Should run successfully.
To Reproduce
- Download the dataset
- create new column df["id_col"] = "default"
- Set DataValue as target
- Set YearEnd as timestamp
- Set id_column as id_col
- Give frequency as Y
- Run
Logs
Traceback (most recent call last):
  File "autogluon/timeseries/autogluon_timeseries_trainer.py", line 406, in train_model
    self.model.fit(training_df, time_limit=trainingHyperparameters['time_limit'], presets="medium_quality",
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/core/utils/decorators.py", line 31, in _call
    return f(*gargs, **gkwargs)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/predictor.py", line 742, in fit
    self._learner.fit(
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/learner.py", line 64, in fit
    return self._fit(
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/learner.py", line 87, in _fit
    train_data = self.feature_generator.fit_transform(train_data, data_frame_name="train_data")
  File "envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/utils/features.py", line 280, in fit_transform
    self.fit(data)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/utils/features.py", line 169, in fit
    past_covariates_df = self.past_covariates_pipeline.fit_transform(data[self.past_covariates_names])
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/utils/features.py", line 106, in fit_transform
    transformed = self._convert_numerical_columns_to_float(super().fit_transform(X, *args, **kwargs))
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/features/generators/pipeline.py", line 69, in fit_transform
    self._compute_post_memory_usage(X_out)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/features/generators/pipeline.py", line 130, in _compute_post_memory_usage
    self.post_memory_usage = get_approximate_df_mem_usage(X, sample_ratio=0.2).sum()
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/common/utils/pandas_utils.py", line 23, in inner
    return func(*args, **kwargs)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/common/utils/pandas_utils.py", line 49, in get_approximate_df_mem_usage
    sample_ratio_cat = num_categories_sample / num_categories
ZeroDivisionError: division by zero
Installed Versions
autogluon==1.1.0
pandas==2.0.3
python3.8
@shchur sir please reply on this
Hi @AhangarAamir, can you please provide the code that reproduces this issue as well as the logs generated by the predictor?
I just encountered this error. I had an id column that was unique for each row when using the TimeSeriesPredictor. It was fixed by dropping that column: train = train.drop("id", axis=1).
@jakobhuss ok good, but removing id is not the solution
Hi @AhangarAamir, sorry for the slow response. I just had a look at the data and it seems that there is a problem with how the input TimeSeriesDataFrame is constructed in your example. When you set the id_col to the same value for all rows in the dataset, you are essentially telling AutoGluon that your data contains a single time series - even though the dataset contains many time series of different types. For the TimeSeriesPredictor to work correctly, you need to make sure that the value in the id_col is unique for each unique time series in your dataset.
I had a closer look at the data and it seems that there are only a few observations available in each time series (at most 8 observations). Here are the value counts for the YearEnd column:
| YearEnd | count |
|---|---|
| 2013-01-01 | 88244 |
| 2014-01-01 | 74702 |
| 2012-01-01 | 33960 |
| 2010-01-01 | 22765 |
| 2011-01-01 | 18023 |
| 2008-01-01 | 108 |
| 2001-01-01 | 104 |
| 2007-01-01 | 55 |
In such a case, when only limited data is available, I would recommend using the TabularPredictor from autogluon.tabular, since time series forecasting models likely won't be able to work effectively with very short time series.
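The two points in the reply above (one id per distinct time series, and checking how many observations each series has) can be sketched with plain pandas. This is only an illustration on toy data: the columns "Location" and "Topic" are hypothetical stand-ins for whichever columns actually identify a series in the real dataset.

```python
import pandas as pd

# Toy long-format data. "Location" and "Topic" are hypothetical column names,
# not taken from the dataset discussed in this issue.
df = pd.DataFrame(
    {
        "Location": ["AL", "AL", "AL", "TX", "TX"],
        "Topic": ["Asthma", "Asthma", "Diabetes", "Asthma", "Asthma"],
        "YearEnd": pd.to_datetime(
            ["2012-01-01", "2013-01-01", "2013-01-01", "2012-01-01", "2013-01-01"]
        ),
        "DataValue": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Build one id per distinct time series instead of a constant "default" id.
df["id_col"] = df["Location"] + "_" + df["Topic"]

# Count the observations available for each series; very short series are a
# hint that a time series model may not be the right tool.
series_lengths = df.groupby("id_col").size()
print(series_lengths)
```

The resulting frame could then be passed to TimeSeriesDataFrame with id_column="id_col" and timestamp_column="YearEnd", as in the snippets elsewhere in this thread.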
@shchur Thank you for your reply, but TimeSeriesPredictor should not fail with a ZeroDivisionError. It should raise a proper exception with a proper message.
Thank you, that's a good point. Can you please share more details on how to reproduce the problem? I tried running the following code but the predictor trained normally:
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
df = pd.read_csv("US_disease_cleaned.csv")
df["id"] = "default"
tsdf = TimeSeriesDataFrame(df, timestamp_column="YearEnd", id_column="id")
predictor = TimeSeriesPredictor(target="DataValue", freq="Y", eval_metric="RMSE").fit(tsdf)
I've found an MWE for this issue
import pandas as pd
from autogluon.timeseries.utils.features import ContinuousAndCategoricalFeatureGenerator, TimeSeriesFeatureGenerator
df = pd.DataFrame(
{"item_id": ["A", "B", "C"], "feat": float("nan")}
)
df["feat"] = df["feat"].astype("category")
static_feature_pipeline = ContinuousAndCategoricalFeatureGenerator(minimum_cat_count=1)
static_feature_pipeline.fit_transform(df)
or using the public API
import pandas as pd
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame
data = TimeSeriesDataFrame("https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly_subset/train.csv")
data.static_features = pd.DataFrame({"item_id": data.item_ids, "feat": None}).astype({"feat": "category"})
predictor = TimeSeriesPredictor().fit(data, hyperparameters={"Naive": {}})
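The problem seems to lie in the get_approximate_df_mem_usage function. A categorical column that contains only missing values has zero categories, so num_categories=0 is currently possible and the division fails. We need to make sure that the number of categories is >= 1 before dividing.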
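A minimal sketch of such a guard, written as a standalone function rather than as a patch to AutoGluon. The helper name safe_sample_ratio is hypothetical; only the variable names num_categories_sample and num_categories come from the traceback. Returning 1.0 for the empty case is an assumption (with no categories there is nothing to extrapolate from the sample), not necessarily what the library does.

```python
def safe_sample_ratio(num_categories_sample: int, num_categories: int) -> float:
    """Hypothetical helper: ratio of sampled categories to all categories.

    Falls back to 1.0 when the column has no categories at all, which is
    the case that triggers ZeroDivisionError in the traceback above.
    """
    if num_categories == 0:
        # Assumed fallback: nothing to scale, so use a neutral ratio.
        return 1.0
    return num_categories_sample / num_categories


# The all-missing categorical column from the MWE has zero categories.
print(safe_sample_ratio(0, 0))
# A regular column: 2 of 10 categories appear in the sample.
print(safe_sample_ratio(2, 10))
```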